From nobody Thu Dec 18 06:32:00 2025
From: David Stevens
To: Sean Christopherson
Cc: Yu Zhang, Isaku Yamahata, Marc Zyngier, Michael Ellerman, Peter Xu,
 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
 linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 kvm@vger.kernel.org
Subject: [PATCH v8 1/8] KVM: Assert that a page's refcount is elevated when
 marking accessed/dirty
Date: Thu, 24 Aug 2023 17:04:01 +0900
Message-ID: <20230824080408.2933205-2-stevensd@google.com>
In-Reply-To: <20230824080408.2933205-1-stevensd@google.com>
References: <20230824080408.2933205-1-stevensd@google.com>

From: Sean Christopherson

Assert that a page's refcount is elevated, i.e. that _something_ holds a
reference to the page, when KVM marks a page as accessed and/or dirty.
KVM typically doesn't hold a reference to pages that are mapped into the
guest, e.g.
to allow page migration, compaction, swap, etc., and instead relies on
mmu_notifiers to react to changes in the primary MMU.

Incorrect handling of mmu_notifier events (or similar mechanisms) can
result in KVM keeping a mapping beyond the lifetime of the backing page,
i.e. can (and often does) result in use-after-free. Yelling if KVM marks
a freed page as accessed/dirty doesn't prevent badness as KVM usually
only does A/D updates when unmapping memory from the guest, i.e. the
assertion fires well after an underlying bug has occurred, but yelling
does help detect, triage, and debug use-after-free bugs.

Note, the assertion must use page_count(), NOT page_ref_count()! For
hugepages, the returned struct page may be a tailpage and thus not have
its own refcount.

Signed-off-by: Sean Christopherson
---
 virt/kvm/kvm_main.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5bbb5612b207..1e4586aaa6cb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2888,6 +2888,19 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_unmap);
 
 static bool kvm_is_ad_tracked_page(struct page *page)
 {
+	/*
+	 * Assert that KVM isn't attempting to mark a freed page as Accessed or
+	 * Dirty, i.e. that KVM's MMU doesn't have a use-after-free bug. KVM
+	 * (typically) doesn't pin pages that are mapped in KVM's MMU, and
+	 * instead relies on mmu_notifiers to know when a mapping needs to be
+	 * zapped/invalidated. Unmapping from KVM's MMU must happen _before_
+	 * KVM returns from its mmu_notifier, i.e. the page should have an
+	 * elevated refcount at this point even though KVM doesn't hold a
+	 * reference of its own.
+	 */
+	if (WARN_ON_ONCE(!page_count(page)))
+		return false;
+
 	/*
 	 * Per page-flags.h, pages tagged PG_reserved "should in general not be
 	 * touched (e.g. set dirty) except by its owner".
-- 
2.42.0.rc1.204.g551eb34607-goog

From nobody Thu Dec 18 06:32:00 2025
From: David Stevens
To: Sean Christopherson
Cc: Yu Zhang, Isaku Yamahata, Marc Zyngier, Michael Ellerman, Peter Xu,
 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
 linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 kvm@vger.kernel.org, David Stevens
Subject: [PATCH v8 2/8] KVM: mmu: Introduce __kvm_follow_pfn function
Date: Thu, 24 Aug 2023 17:04:02 +0900
Message-ID: <20230824080408.2933205-3-stevensd@google.com>
In-Reply-To: <20230824080408.2933205-1-stevensd@google.com>
References: <20230824080408.2933205-1-stevensd@google.com>

From: David Stevens

Introduce __kvm_follow_pfn, which will replace __gfn_to_pfn_memslot.
__kvm_follow_pfn refactors the old API's arguments into a struct and,
where possible, combines the boolean arguments into a single flags
argument.
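
For illustration only, and not part of this patch: a minimal user-space
sketch of how the old boolean arguments might fold into one struct with a
flags word. Every demo_/DEMO_ name and flag value here is a hypothetical
stand-in, not the kernel's definition; the real conversion is the
__gfn_to_pfn_memslot wrapper in the diff.

```c
/* User-space illustration only; demo_/DEMO_ names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* Stand-in flag bits; the values are NOT the kernel's FOLL_* values. */
#define DEMO_FOLL_WRITE         0x1u
#define DEMO_FOLL_NOWAIT        0x2u
#define DEMO_FOLL_INTERRUPTIBLE 0x4u

struct demo_follow_pfn {
	unsigned int flags;     /* replaces write_fault, async, interruptible */
	bool atomic;
	bool try_map_writable;  /* replaces the "writable != NULL" test */
};

/* Pack the old-style boolean arguments into the struct form. */
static struct demo_follow_pfn demo_pack_args(bool atomic, bool interruptible,
					     bool want_async, bool write_fault,
					     bool want_writable)
{
	struct demo_follow_pfn foll = {
		.flags = 0,
		.atomic = atomic,
		.try_map_writable = want_writable,
	};

	/* Atomic and asynchronous lookups are mutually exclusive. */
	assert(!(atomic && want_async));

	if (write_fault)
		foll.flags |= DEMO_FOLL_WRITE;
	if (want_async)
		foll.flags |= DEMO_FOLL_NOWAIT;
	if (interruptible)
		foll.flags |= DEMO_FOLL_INTERRUPTIBLE;

	return foll;
}
```

A caller that previously passed a non-NULL async pointer would, under
these assumptions, end up with DEMO_FOLL_NOWAIT set in flags instead.
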

Signed-off-by: David Stevens
---
 include/linux/kvm_host.h |  16 ++++
 virt/kvm/kvm_main.c      | 171 ++++++++++++++++++++++-----------------
 virt/kvm/kvm_mm.h        |   3 +-
 virt/kvm/pfncache.c      |  10 ++-
 4 files changed, 123 insertions(+), 77 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9d3ac7720da9..59d9b5e5db33 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -97,6 +97,7 @@
 #define KVM_PFN_ERR_HWPOISON	(KVM_PFN_ERR_MASK + 1)
 #define KVM_PFN_ERR_RO_FAULT	(KVM_PFN_ERR_MASK + 2)
 #define KVM_PFN_ERR_SIGPENDING	(KVM_PFN_ERR_MASK + 3)
+#define KVM_PFN_ERR_NEEDS_IO	(KVM_PFN_ERR_MASK + 4)
 
 /*
  * error pfns indicate that the gfn is in slot but faild to
@@ -1156,6 +1157,21 @@ unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot, gfn_t gfn,
 void kvm_release_page_clean(struct page *page);
 void kvm_release_page_dirty(struct page *page);
 
+struct kvm_follow_pfn {
+	const struct kvm_memory_slot *slot;
+	gfn_t gfn;
+	unsigned int flags;
+	bool atomic;
+	/* Try to create a writable mapping even for a read fault */
+	bool try_map_writable;
+
+	/* Outputs of __kvm_follow_pfn */
+	hva_t hva;
+	bool writable;
+};
+
+kvm_pfn_t __kvm_follow_pfn(struct kvm_follow_pfn *foll);
+
 kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
 kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
 		      bool *writable);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1e4586aaa6cb..5fde46f05117 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2486,8 +2486,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
  * true indicates success, otherwise false is returned. It's also the
  * only part that runs if we can in atomic context.
  */
-static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
-			    bool *writable, kvm_pfn_t *pfn)
+static bool hva_to_pfn_fast(struct kvm_follow_pfn *foll, kvm_pfn_t *pfn)
 {
 	struct page *page[1];
 
@@ -2496,14 +2495,12 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
 	 * or the caller allows to map a writable pfn for a read fault
 	 * request.
 	 */
-	if (!(write_fault || writable))
+	if (!((foll->flags & FOLL_WRITE) || foll->try_map_writable))
 		return false;
 
-	if (get_user_page_fast_only(addr, FOLL_WRITE, page)) {
+	if (get_user_page_fast_only(foll->hva, FOLL_WRITE, page)) {
 		*pfn = page_to_pfn(page[0]);
-
-		if (writable)
-			*writable = true;
+		foll->writable = true;
 		return true;
 	}
 
@@ -2514,35 +2511,26 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
  * The slow path to get the pfn of the specified host virtual address,
  * 1 indicates success, -errno is returned if error is detected.
  */
-static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
-			   bool interruptible, bool *writable, kvm_pfn_t *pfn)
+static int hva_to_pfn_slow(struct kvm_follow_pfn *foll, kvm_pfn_t *pfn)
 {
-	unsigned int flags = FOLL_HWPOISON;
+	unsigned int flags = FOLL_HWPOISON | foll->flags;
 	struct page *page;
 	int npages;
 
 	might_sleep();
 
-	if (writable)
-		*writable = write_fault;
-
-	if (write_fault)
-		flags |= FOLL_WRITE;
-	if (async)
-		flags |= FOLL_NOWAIT;
-	if (interruptible)
-		flags |= FOLL_INTERRUPTIBLE;
-
-	npages = get_user_pages_unlocked(addr, 1, &page, flags);
+	npages = get_user_pages_unlocked(foll->hva, 1, &page, flags);
 	if (npages != 1)
 		return npages;
 
-	/* map read fault as writable if possible */
-	if (unlikely(!write_fault) && writable) {
+	if (foll->flags & FOLL_WRITE) {
+		foll->writable = true;
+	} else if (foll->try_map_writable) {
 		struct page *wpage;
 
-		if (get_user_page_fast_only(addr, FOLL_WRITE, &wpage)) {
-			*writable = true;
+		/* map read fault as writable if possible */
+		if (get_user_page_fast_only(foll->hva, FOLL_WRITE, &wpage)) {
+			foll->writable = true;
 			put_page(page);
 			page = wpage;
 		}
@@ -2573,23 +2561,23 @@ static int kvm_try_get_pfn(kvm_pfn_t pfn)
 }
 
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
-			       unsigned long addr, bool write_fault,
-			       bool *writable, kvm_pfn_t *p_pfn)
+			       struct kvm_follow_pfn *foll, kvm_pfn_t *p_pfn)
 {
 	kvm_pfn_t pfn;
 	pte_t *ptep;
 	pte_t pte;
 	spinlock_t *ptl;
+	bool write_fault = foll->flags & FOLL_WRITE;
 	int r;
 
-	r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
+	r = follow_pte(vma->vm_mm, foll->hva, &ptep, &ptl);
 	if (r) {
 		/*
 		 * get_user_pages fails for VM_IO and VM_PFNMAP vmas and does
 		 * not call the fault handler, so do it here.
 		 */
 		bool unlocked = false;
-		r = fixup_user_fault(current->mm, addr,
+		r = fixup_user_fault(current->mm, foll->hva,
 				     (write_fault ? FAULT_FLAG_WRITE : 0),
 				     &unlocked);
 		if (unlocked)
@@ -2597,7 +2585,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 		if (r)
 			return r;
 
-		r = follow_pte(vma->vm_mm, addr, &ptep, &ptl);
+		r = follow_pte(vma->vm_mm, foll->hva, &ptep, &ptl);
 		if (r)
 			return r;
 	}
@@ -2609,8 +2597,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 		goto out;
 	}
 
-	if (writable)
-		*writable = pte_write(pte);
+	foll->writable = pte_write(*ptep);
 	pfn = pte_pfn(pte);
 
 	/*
@@ -2655,24 +2642,22 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
  * 2): @write_fault = false && @writable, @writable will tell the caller
  *     whether the mapping is writable.
  */
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
-		     bool *async, bool write_fault, bool *writable)
+kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *foll)
 {
 	struct vm_area_struct *vma;
 	kvm_pfn_t pfn;
 	int npages, r;
 
 	/* we can do it either atomically or asynchronously, not both */
-	BUG_ON(atomic && async);
+	BUG_ON(foll->atomic && (foll->flags & FOLL_NOWAIT));
 
-	if (hva_to_pfn_fast(addr, write_fault, writable, &pfn))
+	if (hva_to_pfn_fast(foll, &pfn))
 		return pfn;
 
-	if (atomic)
+	if (foll->atomic)
 		return KVM_PFN_ERR_FAULT;
 
-	npages = hva_to_pfn_slow(addr, async, write_fault, interruptible,
-				 writable, &pfn);
+	npages = hva_to_pfn_slow(foll, &pfn);
 	if (npages == 1)
 		return pfn;
 	if (npages == -EINTR)
@@ -2680,83 +2665,123 @@ kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
 
 	mmap_read_lock(current->mm);
 	if (npages == -EHWPOISON ||
-	      (!async && check_user_page_hwpoison(addr))) {
+	      (!(foll->flags & FOLL_NOWAIT) && check_user_page_hwpoison(foll->hva))) {
 		pfn = KVM_PFN_ERR_HWPOISON;
 		goto exit;
 	}
 
 retry:
-	vma = vma_lookup(current->mm, addr);
+	vma = vma_lookup(current->mm, foll->hva);
 
 	if (vma == NULL)
 		pfn = KVM_PFN_ERR_FAULT;
 	else if (vma->vm_flags & (VM_IO | VM_PFNMAP)) {
-		r = hva_to_pfn_remapped(vma, addr, write_fault, writable, &pfn);
+		r = hva_to_pfn_remapped(vma, foll, &pfn);
 		if (r == -EAGAIN)
 			goto retry;
 		if (r < 0)
 			pfn = KVM_PFN_ERR_FAULT;
 	} else {
-		if (async && vma_is_valid(vma, write_fault))
-			*async = true;
-		pfn = KVM_PFN_ERR_FAULT;
+		if ((foll->flags & FOLL_NOWAIT) &&
+		    vma_is_valid(vma, foll->flags & FOLL_WRITE))
+			pfn = KVM_PFN_ERR_NEEDS_IO;
+		else
+			pfn = KVM_PFN_ERR_FAULT;
 	}
 exit:
 	mmap_read_unlock(current->mm);
 	return pfn;
 }
 
-kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
-			       bool atomic, bool interruptible, bool *async,
-			       bool write_fault, bool *writable, hva_t *hva)
+kvm_pfn_t __kvm_follow_pfn(struct kvm_follow_pfn *foll)
 {
-	unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
+	foll->writable = false;
+	foll->hva = __gfn_to_hva_many(foll->slot, foll->gfn, NULL,
+				      foll->flags & FOLL_WRITE);
 
-	if (hva)
-		*hva = addr;
-
-	if (addr == KVM_HVA_ERR_RO_BAD) {
-		if (writable)
-			*writable = false;
+	if (foll->hva == KVM_HVA_ERR_RO_BAD)
 		return KVM_PFN_ERR_RO_FAULT;
-	}
 
-	if (kvm_is_error_hva(addr)) {
-		if (writable)
-			*writable = false;
+	if (kvm_is_error_hva(foll->hva))
 		return KVM_PFN_NOSLOT;
-	}
 
-	/* Do not map writable pfn in the readonly memslot. */
-	if (writable && memslot_is_readonly(slot)) {
-		*writable = false;
-		writable = NULL;
-	}
+	if (memslot_is_readonly(foll->slot))
+		foll->try_map_writable = false;
 
-	return hva_to_pfn(addr, atomic, interruptible, async, write_fault,
-			  writable);
+	return hva_to_pfn(foll);
+}
+EXPORT_SYMBOL_GPL(__kvm_follow_pfn);
+
+kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
+			       bool atomic, bool interruptible, bool *async,
+			       bool write_fault, bool *writable, hva_t *hva)
+{
+	kvm_pfn_t pfn;
+	struct kvm_follow_pfn foll = {
+		.slot = slot,
+		.gfn = gfn,
+		.flags = 0,
+		.atomic = atomic,
+		.try_map_writable = !!writable,
+	};
+
+	if (write_fault)
+		foll.flags |= FOLL_WRITE;
+	if (async)
+		foll.flags |= FOLL_NOWAIT;
+	if (interruptible)
+		foll.flags |= FOLL_INTERRUPTIBLE;
+
+	pfn = __kvm_follow_pfn(&foll);
+	if (pfn == KVM_PFN_ERR_NEEDS_IO) {
+		*async = true;
+		pfn = KVM_PFN_ERR_FAULT;
+	}
+	if (hva)
+		*hva = foll.hva;
+	if (writable)
+		*writable = foll.writable;
+	return pfn;
 }
 EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
 
 kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
 		      bool *writable)
 {
-	return __gfn_to_pfn_memslot(gfn_to_memslot(kvm, gfn), gfn, false, false,
-				    NULL, write_fault, writable, NULL);
+	kvm_pfn_t pfn;
+	struct kvm_follow_pfn foll = {
+		.slot = gfn_to_memslot(kvm, gfn),
+		.gfn = gfn,
+		.flags = write_fault ? FOLL_WRITE : 0,
+		.try_map_writable = !!writable,
+	};
+	pfn = __kvm_follow_pfn(&foll);
+	if (writable)
+		*writable = foll.writable;
+	return pfn;
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);
 
 kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	return __gfn_to_pfn_memslot(slot, gfn, false, false, NULL, true,
-				    NULL, NULL);
+	struct kvm_follow_pfn foll = {
+		.slot = slot,
+		.gfn = gfn,
+		.flags = FOLL_WRITE,
+	};
+	return __kvm_follow_pfn(&foll);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot);
 
 kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	return __gfn_to_pfn_memslot(slot, gfn, true, false, NULL, true,
-				    NULL, NULL);
+	struct kvm_follow_pfn foll = {
+		.slot = slot,
+		.gfn = gfn,
+		.flags = FOLL_WRITE,
+		.atomic = true,
+	};
+	return __kvm_follow_pfn(&foll);
 }
 EXPORT_SYMBOL_GPL(gfn_to_pfn_memslot_atomic);
 
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 180f1a09e6ba..ed896aee5396 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -20,8 +20,7 @@
 #define KVM_MMU_UNLOCK(kvm)		spin_unlock(&(kvm)->mmu_lock)
 #endif /* KVM_HAVE_MMU_RWLOCK */
 
-kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool interruptible,
-		     bool *async, bool write_fault, bool *writable);
+kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *foll);
 
 #ifdef CONFIG_HAVE_KVM_PFNCACHE
 void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 2d6aba677830..86cd40acad11 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -144,6 +144,12 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 	kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
 	void *new_khva = NULL;
 	unsigned long mmu_seq;
+	struct kvm_follow_pfn foll = {
+		.slot = gpc->memslot,
+		.gfn = gpa_to_gfn(gpc->gpa),
+		.flags = FOLL_WRITE,
+		.hva = gpc->uhva,
+	};
 
 	lockdep_assert_held(&gpc->refresh_lock);
 
@@ -182,8 +188,8 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 			cond_resched();
 		}
 
-		/* We always request a writeable mapping */
-		new_pfn = hva_to_pfn(gpc->uhva, false, false, NULL, true, NULL);
+		/* We always request a writable mapping */
+		new_pfn = hva_to_pfn(&foll);
 		if (is_error_noslot_pfn(new_pfn))
 			goto out_error;
 
-- 
2.42.0.rc1.204.g551eb34607-goog

From nobody Thu Dec 18 06:32:00 2025
From: David Stevens
To: Sean Christopherson
Cc: Yu Zhang, Isaku Yamahata, Marc Zyngier, Michael Ellerman, Peter Xu,
 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
 linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 kvm@vger.kernel.org, David Stevens
Subject: [PATCH v8 3/8] KVM: mmu: Make __kvm_follow_pfn not imply FOLL_GET
Date: Thu, 24 Aug 2023 17:04:03 +0900
Message-ID: <20230824080408.2933205-4-stevensd@google.com>
In-Reply-To: <20230824080408.2933205-1-stevensd@google.com>
References: <20230824080408.2933205-1-stevensd@google.com>

From: David Stevens

Make it so that __kvm_follow_pfn does not imply FOLL_GET. This allows
callers to resolve a gfn when the associated pfn has a valid struct page
that isn't being actively refcounted (e.g. tail pages of non-compound
higher order pages). For a caller to safely omit FOLL_GET, all usages of
the returned pfn must be guarded by a mmu notifier.

This also adds an is_refcounted_page out parameter to kvm_follow_pfn that
is set when the returned pfn has an associated struct page with a valid
refcount. Callers that don't pass FOLL_GET should remember this value and
use it to avoid places like kvm_is_ad_tracked_page that assume a non-zero
refcount.

Signed-off-by: David Stevens
---
 include/linux/kvm_host.h |  7 ++++
 virt/kvm/kvm_main.c      | 84 ++++++++++++++++++++++++----------------
 virt/kvm/pfncache.c      |  2 +-
 3 files changed, 58 insertions(+), 35 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 59d9b5e5db33..713fc2d91f95 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1164,10 +1164,17 @@ struct kvm_follow_pfn {
 	bool atomic;
 	/* Try to create a writable mapping even for a read fault */
 	bool try_map_writable;
+	/*
+	 * Usage of the returned pfn will be guarded by a mmu notifier. Must
+	 * be true if FOLL_GET is not set.
+	 */
+	bool guarded_by_mmu_notifier;
 
 	/* Outputs of __kvm_follow_pfn */
 	hva_t hva;
 	bool writable;
+	/* True if the returned pfn is for a page with a valid refcount. */
+	bool is_refcounted_page;
 };
 
 kvm_pfn_t __kvm_follow_pfn(struct kvm_follow_pfn *foll);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5fde46f05117..963b96cd8ff9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2481,6 +2481,25 @@ static inline int check_user_page_hwpoison(unsigned long addr)
 	return rc == -EHWPOISON;
 }
 
+static kvm_pfn_t kvm_follow_refcounted_pfn(struct kvm_follow_pfn *foll,
+					   struct page *page)
+{
+	kvm_pfn_t pfn = page_to_pfn(page);
+
+	foll->is_refcounted_page = true;
+
+	/*
+	 * FIXME: Ideally, KVM wouldn't pass FOLL_GET to gup() when the caller
+	 * doesn't want to grab a reference, but gup() doesn't support getting
+	 * just the pfn, i.e. FOLL_GET is effectively mandatory. If that ever
+	 * changes, drop this and simply don't pass FOLL_GET to gup().
+	 */
+	if (!(foll->flags & FOLL_GET))
+		put_page(page);
+
+	return pfn;
+}
+
 /*
  * The fast path to get the writable pfn which will be stored in @pfn,
  * true indicates success, otherwise false is returned. It's also the
@@ -2499,8 +2518,8 @@ static bool hva_to_pfn_fast(struct kvm_follow_pfn *foll, kvm_pfn_t *pfn)
 		return false;
 
 	if (get_user_page_fast_only(foll->hva, FOLL_WRITE, page)) {
-		*pfn = page_to_pfn(page[0]);
 		foll->writable = true;
+		*pfn = kvm_follow_refcounted_pfn(foll, page[0]);
 		return true;
 	}
 
@@ -2513,7 +2532,7 @@ static bool hva_to_pfn_fast(struct kvm_follow_pfn *foll, kvm_pfn_t *pfn)
  */
 static int hva_to_pfn_slow(struct kvm_follow_pfn *foll, kvm_pfn_t *pfn)
 {
-	unsigned int flags = FOLL_HWPOISON | foll->flags;
+	unsigned int flags = FOLL_HWPOISON | FOLL_GET | foll->flags;
 	struct page *page;
 	int npages;
 
@@ -2535,7 +2554,7 @@ static int hva_to_pfn_slow(struct kvm_follow_pfn *foll, kvm_pfn_t *pfn)
 			page = wpage;
 		}
 	}
-	*pfn = page_to_pfn(page);
+	*pfn = kvm_follow_refcounted_pfn(foll, page);
 	return npages;
 }
 
@@ -2550,16 +2569,6 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 	return true;
 }
 
-static int kvm_try_get_pfn(kvm_pfn_t pfn)
-{
-	struct page *page = kvm_pfn_to_refcounted_page(pfn);
-
-	if (!page)
-		return 1;
-
-	return get_page_unless_zero(page);
-}
-
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       struct kvm_follow_pfn *foll, kvm_pfn_t *p_pfn)
 {
@@ -2568,6 +2577,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	pte_t pte;
 	spinlock_t *ptl;
 	bool write_fault = foll->flags & FOLL_WRITE;
+	struct page *page;
 	int r;
 
 	r = follow_pte(vma->vm_mm, foll->hva, &ptep, &ptl);
@@ -2601,28 +2611,29 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	pfn = pte_pfn(pte);
 
 	/*
-	 * Get a reference here because callers of *hva_to_pfn* and
-	 * *gfn_to_pfn* ultimately call kvm_release_pfn_clean on the
-	 * returned pfn. This is only needed if the VMA has VM_MIXEDMAP
-	 * set, but the kvm_try_get_pfn/kvm_release_pfn_clean pair will
-	 * simply do nothing for reserved pfns.
-	 *
-	 * Whoever called remap_pfn_range is also going to call e.g.
-	 * unmap_mapping_range before the underlying pages are freed,
-	 * causing a call to our MMU notifier.
+	 * Now deal with reference counting. If kvm_pfn_to_refcounted_page
+	 * returns NULL, then there's no refcount to worry about.
 	 *
-	 * Certain IO or PFNMAP mappings can be backed with valid
-	 * struct pages, but be allocated without refcounting e.g.,
-	 * tail pages of non-compound higher order allocations, which
-	 * would then underflow the refcount when the caller does the
-	 * required put_page. Don't allow those pages here.
+	 * Otherwise, certain IO or PFNMAP mappings can be backed with valid
+	 * struct pages but be allocated without refcounting e.g., tail pages of
+	 * non-compound higher order allocations. If FOLL_GET is set and we
+	 * increment such a refcount, then when that pfn is eventually passed to
+	 * kvm_release_pfn_clean, its refcount would hit zero and be incorrectly
+	 * freed. Therefore don't allow those pages here when FOLL_GET is set.
 	 */
-	if (!kvm_try_get_pfn(pfn))
-		r = -EFAULT;
+	page = kvm_pfn_to_refcounted_page(pfn);
+	if (!page)
+		goto out;
+
+	if (get_page_unless_zero(page))
+		WARN_ON_ONCE(kvm_follow_refcounted_pfn(foll, page) != pfn);
 
 out:
 	pte_unmap_unlock(ptep, ptl);
-	*p_pfn = pfn;
+	if (!foll->is_refcounted_page && !foll->guarded_by_mmu_notifier)
+		r = -EFAULT;
+	else
+		*p_pfn = pfn;
 
 	return r;
 }
@@ -2696,6 +2707,11 @@ kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *foll)
 kvm_pfn_t __kvm_follow_pfn(struct kvm_follow_pfn *foll)
 {
 	foll->writable = false;
+	foll->is_refcounted_page = false;
+
+	if (WARN_ON_ONCE(!(foll->flags & FOLL_GET) && !foll->guarded_by_mmu_notifier))
+		return KVM_PFN_ERR_FAULT;
+
 	foll->hva = __gfn_to_hva_many(foll->slot, foll->gfn, NULL,
 				      foll->flags & FOLL_WRITE);
 
@@ -2720,7 +2736,7 @@ kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
 	struct kvm_follow_pfn foll = {
 		.slot = slot,
 		.gfn = gfn,
-		.flags = 0,
+		.flags = FOLL_GET,
 		.atomic = atomic,
 		.try_map_writable = !!writable,
 	};
@@ -2752,7 +2768,7 @@ kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
 	struct kvm_follow_pfn foll = {
 		.slot = gfn_to_memslot(kvm, gfn),
 		.gfn = gfn,
-		.flags = write_fault ? FOLL_WRITE : 0,
+		.flags = FOLL_GET | (write_fault ? FOLL_WRITE : 0),
 		.try_map_writable = !!writable,
 	};
 	pfn = __kvm_follow_pfn(&foll);
@@ -2767,7 +2783,7 @@ kvm_pfn_t gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
 	struct kvm_follow_pfn foll = {
 		.slot = slot,
 		.gfn = gfn,
-		.flags = FOLL_WRITE,
+		.flags = FOLL_GET | FOLL_WRITE,
 	};
 	return __kvm_follow_pfn(&foll);
 }
@@ -2778,7 +2794,7 @@ kvm_pfn_t gfn_to_pfn_memslot_atomic(const struct kvm_memory_slot *slot, gfn_t gf
 	struct kvm_follow_pfn foll = {
 		.slot = slot,
 		.gfn = gfn,
-		.flags = FOLL_WRITE,
+		.flags = FOLL_GET | FOLL_WRITE,
 		.atomic = true,
 	};
 	return __kvm_follow_pfn(&foll);
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 86cd40acad11..c558f510ab51 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -147,7 +147,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
 	struct kvm_follow_pfn foll = {
 		.slot = gpc->memslot,
 		.gfn = gpa_to_gfn(gpc->gpa),
-		.flags = FOLL_WRITE,
+		.flags = FOLL_WRITE | FOLL_GET,
 		.hva = gpc->uhva,
 	};
 
-- 
2.42.0.rc1.204.g551eb34607-goog

From nobody Thu Dec 18 06:32:00 2025
(ORCPT ); Thu, 24 Aug 2023 04:06:21 -0400 Received: from mail-pl1-x634.google.com (mail-pl1-x634.google.com [IPv6:2607:f8b0:4864:20::634]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B51741BD6 for ; Thu, 24 Aug 2023 01:05:35 -0700 (PDT) Received: by mail-pl1-x634.google.com with SMTP id d9443c01a7336-1bda9207132so51602655ad.0 for ; Thu, 24 Aug 2023 01:05:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1692864308; x=1693469108; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ZhPhAEYXOjd7Sn6E5mFHIsVfVenAXGYibiKvIzNeJhI=; b=G+ioXOlMhzzSsFeOo98ApC0O6Bz+xTN7o7NvHt6zUSzMtMjnoOCwssUrtpM3Ea/f4q QzZWnac70CX9Tsu8jkszmzYyq03LndVS91OTCpw8PH2TyoJt/jfWUy1VBPEHCSH5NHQI lVh4ksRnv/2AHsWJlta0CGXZwJ81ZZyP1PIsU= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692864308; x=1693469108; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ZhPhAEYXOjd7Sn6E5mFHIsVfVenAXGYibiKvIzNeJhI=; b=F+6taLKPKNOYX3C9bK3wAOKQW0xuCI/ZHRgI9eT6l5d9NC1+3/6Dvx0bIqhzktiuBS SHQKE1nQ1c7UiONt3xZ5XLpIiMziaj/dM9AgqM/6NAKhPfxOSEjdoCqcbiKkrDkAs5Q/ v1P+PHttfX4tAImgldhgKXfsfw9PT9+pEZFxCJGHMb5kTMuFHvrg5h8TS88oMEEvU6P0 S/4bD+WmoRLLwsoUXX3vtOpD9DskkrMKeEMrMrqimofMr0ANnmsVBxx9YWbS+naN8a3B N8MD+XhRcO2G7BsPdHBOcEk82yQdhoZHFI0qatEc7gsaZqvvcCJPyLbzRVRVrjnAQVPF Y19g== X-Gm-Message-State: AOJu0YyjJWV5taybtMHs2ZRNgbg099Az9RSOmM4TlEOUws4nvBR3v14e okXDFxOC/L0CXvdb8hSYHYxnIA== X-Google-Smtp-Source: AGHT+IFkCGH2GwA3ilX5jB05HUrrf1vUBIbmMJaFJx2swg78OUGHa276ImTXLpzo2mY30kNcsT6wbw== X-Received: by 2002:a17:902:c947:b0:1c0:7bac:13d4 with SMTP id i7-20020a170902c94700b001c07bac13d4mr10820347pla.65.1692864308600; Thu, 24 Aug 2023 01:05:08 -0700 (PDT) Received: from localhost ([2401:fa00:8f:203:515:8b2a:90c3:b79e]) by 
smtp.gmail.com with UTF8SMTPSA id q6-20020a170902a3c600b001bf095dfb76sm12370611plb.237.2023.08.24.01.05.05 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 24 Aug 2023 01:05:08 -0700 (PDT) From: David Stevens X-Google-Original-From: David Stevens To: Sean Christopherson Cc: Yu Zhang , Isaku Yamahata , Marc Zyngier , Michael Ellerman , Peter Xu , linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org, David Stevens Subject: [PATCH v8 4/8] KVM: x86/mmu: Migrate to __kvm_follow_pfn Date: Thu, 24 Aug 2023 17:04:04 +0900 Message-ID: <20230824080408.2933205-5-stevensd@google.com> X-Mailer: git-send-email 2.42.0.rc1.204.g551eb34607-goog In-Reply-To: <20230824080408.2933205-1-stevensd@google.com> References: <20230824080408.2933205-1-stevensd@google.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: David Stevens Migrate from __gfn_to_pfn_memslot to __kvm_follow_pfn. Most arguments map directly to the new API. The largest change is replacing the async in/out parameter with the FOLL_NOWAIT flag and the KVM_PFN_ERR_NEEDS_IO return value. Signed-off-by: David Stevens --- arch/x86/kvm/mmu/mmu.c | 41 +++++++++++++++++++++++++++++++---------- 1 file changed, 31 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index ec169f5c7dce..dabae67f198b 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4296,7 +4296,12 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu= , struct kvm_async_pf *work) static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault = *fault) { struct kvm_memory_slot *slot =3D fault->slot; - bool async; + struct kvm_follow_pfn foll =3D { + .slot =3D slot, + .gfn =3D fault->gfn, + .flags =3D FOLL_GET | (fault->write ?
FOLL_WRITE : 0), + .try_map_writable =3D true, + }; =20 /* * Retry the page fault if the gfn hit a memslot that is being deleted @@ -4325,12 +4330,20 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu,= struct kvm_page_fault *fault return RET_PF_EMULATE; } =20 - async =3D false; - fault->pfn =3D __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &asyn= c, - fault->write, &fault->map_writable, - &fault->hva); - if (!async) - return RET_PF_CONTINUE; /* *pfn has correct page already */ + foll.flags |=3D FOLL_NOWAIT; + fault->pfn =3D __kvm_follow_pfn(&foll); + + if (!is_error_noslot_pfn(fault->pfn)) + goto success; + + /* + * If __kvm_follow_pfn() failed because I/O is needed to fault in the + * page, then either set up an asynchronous #PF to do the I/O, or if + * doing an async #PF isn't possible, retry __kvm_follow_pfn() with + * I/O allowed. All other failures are fatal, i.e. retrying won't help. + */ + if (fault->pfn !=3D KVM_PFN_ERR_NEEDS_IO) + return RET_PF_CONTINUE; =20 if (!fault->prefetch && kvm_can_do_async_pf(vcpu)) { trace_kvm_try_async_get_page(fault->addr, fault->gfn); @@ -4348,9 +4361,17 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, = struct kvm_page_fault *fault * to wait for IO. Note, gup always bails if it is unable to quickly * get a page and a fatal signal, i.e. SIGKILL, is pending. 
*/ - fault->pfn =3D __gfn_to_pfn_memslot(slot, fault->gfn, false, true, NULL, - fault->write, &fault->map_writable, - &fault->hva); + foll.flags |=3D FOLL_INTERRUPTIBLE; + foll.flags &=3D ~FOLL_NOWAIT; + fault->pfn =3D __kvm_follow_pfn(&foll); + + if (!is_error_noslot_pfn(fault->pfn)) + goto success; + + return RET_PF_CONTINUE; +success: + fault->hva =3D foll.hva; + fault->map_writable =3D foll.writable; return RET_PF_CONTINUE; } =20 --=20 2.42.0.rc1.204.g551eb34607-goog From nobody Thu Dec 18 06:32:00 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D5DE0C88CB9 for ; Thu, 24 Aug 2023 08:07:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240500AbjHXIGs (ORCPT ); Thu, 24 Aug 2023 04:06:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41904 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240472AbjHXIGW (ORCPT ); Thu, 24 Aug 2023 04:06:22 -0400 Received: from mail-pf1-x431.google.com (mail-pf1-x431.google.com [IPv6:2607:f8b0:4864:20::431]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C34611BD8 for ; Thu, 24 Aug 2023 01:05:36 -0700 (PDT) Received: by mail-pf1-x431.google.com with SMTP id d2e1a72fcca58-68a41035828so3289683b3a.1 for ; Thu, 24 Aug 2023 01:05:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1692864314; x=1693469114; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=R3N2qLIV1dOCTfZGrxfezfZCNqk3aw3I6ITB122p148=; b=KmMnpYLR9IuJ5tdXymssdKkrVpHH6nCmiyCa3qQW4Rilr0JQFELA5m87ZWYfut3pmA fysrOxyV+ru/71qrasjIWXzkfsAP2+sXyOGC5ezD+7SC7LGvnaruJNY8kw15hsYmQHfX 8IQzokZvhpXo049qUnE0rm8sLA2ng20KMTZBM= X-Google-DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692864314; x=1693469114; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=R3N2qLIV1dOCTfZGrxfezfZCNqk3aw3I6ITB122p148=; b=ZBqz8uFA6cKwnqMWrs9YoKSrpRp5mp+LU7NotpVE/+vmLYX4c36dn49YqJedJTHO38 Sej9NkZBJsM03D+DdJSnwInj+jiiYuF4sWeAU+NZcJBAiLvmosodPptqY4EhrnjX/0m1 bKVPTp1zJmH24NN1Y49o8Gl69pQCi32kRRZTdqiZ9eke1qXHc7LNFBbOUrdd6qsPAzqs HgS2vO4ZMsxnfX902sj3dDt7G/06YVPs2pjQOEOOZxqT0TjGE4wr2HT6y4jK1yZ/HL5h 76pU4RfTT35CRvnoCDdxOCM3pql7udZy8jKYDoDeyI1OKrtLepOSLHVhlUziMfaziPM0 ugnA== X-Gm-Message-State: AOJu0YzKbew5b575GgexPviwFk5v/Pxrl866JNhEkRGub1fHYPAlIOZU LjpOR371jzOnBNiberod+r2bxQ== X-Google-Smtp-Source: AGHT+IFQuzhiw+VNrMdIrBf0Ryoivf1q43SH2TuxYQh6Y66LpJ63BZQmwO6ZvOpnuBz8ux1MkDgcBA== X-Received: by 2002:a05:6a21:778a:b0:14b:83f7:ef4f with SMTP id bd10-20020a056a21778a00b0014b83f7ef4fmr3998192pzc.34.1692864313823; Thu, 24 Aug 2023 01:05:13 -0700 (PDT) Received: from localhost ([2401:fa00:8f:203:515:8b2a:90c3:b79e]) by smtp.gmail.com with UTF8SMTPSA id m14-20020aa7900e000000b0068a3f861b24sm7002699pfo.195.2023.08.24.01.05.11 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 24 Aug 2023 01:05:13 -0700 (PDT) From: David Stevens X-Google-Original-From: David Stevens To: Sean Christopherson Cc: Yu Zhang , Isaku Yamahata , Marc Zyngier , Michael Ellerman , Peter Xu , linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org, David Stevens Subject: [PATCH v8 5/8] KVM: x86/mmu: Don't pass FOLL_GET to __kvm_follow_pfn Date: Thu, 24 Aug 2023 17:04:05 +0900 Message-ID: <20230824080408.2933205-6-stevensd@google.com> X-Mailer: git-send-email 2.42.0.rc1.204.g551eb34607-goog In-Reply-To: <20230824080408.2933205-1-stevensd@google.com> References: <20230824080408.2933205-1-stevensd@google.com> MIME-Version: 1.0 
Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: David Stevens Stop passing FOLL_GET to __kvm_follow_pfn. This allows the host to map memory into the guest that is backed by un-refcounted struct pages - for example, the tail pages of higher order non-compound pages allocated by the amdgpu driver via ttm_pool_alloc_page. The bulk of this change is tracking the is_refcounted_page flag so that non-refcounted pages don't trigger page_count() =3D=3D 0 warnings. This is done by storing the flag in an unused bit in the SPTEs. This bit is not available in PAE SPTEs, so FOLL_GET is only omitted for TDP on x86-64. Signed-off-by: David Stevens --- arch/x86/kvm/mmu/mmu.c | 55 +++++++++++++++++++++++---------- arch/x86/kvm/mmu/mmu_internal.h | 1 + arch/x86/kvm/mmu/paging_tmpl.h | 8 +++-- arch/x86/kvm/mmu/spte.c | 4 ++- arch/x86/kvm/mmu/spte.h | 12 ++++++- arch/x86/kvm/mmu/tdp_mmu.c | 22 +++++++------ include/linux/kvm_host.h | 3 ++ virt/kvm/kvm_main.c | 6 ++-- 8 files changed, 79 insertions(+), 32 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index dabae67f198b..4f5d33e95c6e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -553,12 +553,14 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte) =20 if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte)) { flush =3D true; - kvm_set_pfn_accessed(spte_to_pfn(old_spte)); + if (is_refcounted_page_pte(old_spte)) + kvm_set_page_accessed(pfn_to_page(spte_to_pfn(old_spte))); } =20 if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte)) { flush =3D true; - kvm_set_pfn_dirty(spte_to_pfn(old_spte)); + if (is_refcounted_page_pte(old_spte)) + kvm_set_page_dirty(pfn_to_page(spte_to_pfn(old_spte))); } =20 return flush; @@ -596,14 +598,18 @@ static u64 mmu_spte_clear_track_bits(struct kvm *kvm,= u64 *sptep) * before they are reclaimed.
Sanity check that, if the pfn is backed * by a refcounted page, the refcount is elevated. */ - page =3D kvm_pfn_to_refcounted_page(pfn); - WARN_ON(page && !page_count(page)); + if (is_refcounted_page_pte(old_spte)) { + page =3D kvm_pfn_to_refcounted_page(pfn); + WARN_ON(!page || !page_count(page)); + } =20 - if (is_accessed_spte(old_spte)) - kvm_set_pfn_accessed(pfn); + if (is_refcounted_page_pte(old_spte)) { + if (is_accessed_spte(old_spte)) + kvm_set_page_accessed(pfn_to_page(pfn)); =20 - if (is_dirty_spte(old_spte)) - kvm_set_pfn_dirty(pfn); + if (is_dirty_spte(old_spte)) + kvm_set_page_dirty(pfn_to_page(pfn)); + } =20 return old_spte; } @@ -639,8 +645,8 @@ static bool mmu_spte_age(u64 *sptep) * Capture the dirty status of the page, so that it doesn't get * lost when the SPTE is marked for access tracking. */ - if (is_writable_pte(spte)) - kvm_set_pfn_dirty(spte_to_pfn(spte)); + if (is_writable_pte(spte) && is_refcounted_page_pte(spte)) + kvm_set_page_dirty(pfn_to_page(spte_to_pfn(spte))); =20 spte =3D mark_spte_for_access_track(spte); mmu_spte_update_no_track(sptep, spte); @@ -1278,8 +1284,8 @@ static bool spte_wrprot_for_clear_dirty(u64 *sptep) { bool was_writable =3D test_and_clear_bit(PT_WRITABLE_SHIFT, (unsigned long *)sptep); - if (was_writable && !spte_ad_enabled(*sptep)) - kvm_set_pfn_dirty(spte_to_pfn(*sptep)); + if (was_writable && !spte_ad_enabled(*sptep) && is_refcounted_page_pte(*s= ptep)) + kvm_set_page_dirty(pfn_to_page(spte_to_pfn(*sptep))); =20 return was_writable; } @@ -2937,6 +2943,11 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struc= t kvm_memory_slot *slot, bool host_writable =3D !fault || fault->map_writable; bool prefetch =3D !fault || fault->prefetch; bool write_fault =3D fault && fault->write; + /* + * Prefetching uses gfn_to_page_many_atomic, which never gets + * non-refcounted pages. 
+ */ + bool is_refcounted =3D !fault || fault->is_refcounted_page; =20 pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__, *sptep, write_fault, gfn); @@ -2969,7 +2980,7 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct= kvm_memory_slot *slot, } =20 wrprot =3D make_spte(vcpu, sp, slot, pte_access, gfn, pfn, *sptep, prefet= ch, - true, host_writable, &spte); + true, host_writable, is_refcounted, &spte); =20 if (*sptep =3D=3D spte) { ret =3D RET_PF_SPURIOUS; @@ -4296,11 +4307,19 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcp= u, struct kvm_async_pf *work) static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault = *fault) { struct kvm_memory_slot *slot =3D fault->slot; + /* + * We only allow non-refcounted pages if we can track whether or not + * pages are refcounted via an SPTE bit. This bit is not available + * in PAE SPTEs, so pass FOLL_GET if we may have to deal with those. + */ + unsigned int get_flag =3D (tdp_enabled && IS_ENABLED(CONFIG_X86_64)) ? + 0 : FOLL_GET; struct kvm_follow_pfn foll =3D { .slot =3D slot, .gfn =3D fault->gfn, - .flags =3D FOLL_GET | (fault->write ? FOLL_WRITE : 0), + .flags =3D (fault->write ? 
FOLL_WRITE : 0) | get_flag, .try_map_writable =3D true, + .guarded_by_mmu_notifier =3D true, }; =20 /* @@ -4317,6 +4336,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, s= truct kvm_page_fault *fault fault->slot =3D NULL; fault->pfn =3D KVM_PFN_NOSLOT; fault->map_writable =3D false; + fault->is_refcounted_page =3D false; return RET_PF_CONTINUE; } /* @@ -4372,6 +4392,7 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, s= truct kvm_page_fault *fault success: fault->hva =3D foll.hva; fault->map_writable =3D foll.writable; + fault->is_refcounted_page =3D foll.is_refcounted_page; return RET_PF_CONTINUE; } =20 @@ -4456,8 +4477,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, s= truct kvm_page_fault *fault r =3D direct_map(vcpu, fault); =20 out_unlock: + if (fault->is_refcounted_page) + kvm_set_page_accessed(pfn_to_page(fault->pfn)); write_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(fault->pfn); return r; } =20 @@ -4534,8 +4556,9 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vc= pu, r =3D kvm_tdp_mmu_map(vcpu, fault); =20 out_unlock: + if (fault->is_refcounted_page) + kvm_set_page_accessed(pfn_to_page(fault->pfn)); read_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(fault->pfn); return r; } #endif diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index d39af5639ce9..55790085884f 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -240,6 +240,7 @@ struct kvm_page_fault { kvm_pfn_t pfn; hva_t hva; bool map_writable; + bool is_refcounted_page; =20 /* * Indicates the guest is trying to write a gfn that contains one or diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 0662e0278e70..4ffcebc0c3ce 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -828,8 +828,9 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, str= uct kvm_page_fault *fault r =3D FNAME(fetch)(vcpu, fault, &walker); =20 out_unlock: + 
if (fault->is_refcounted_page) + kvm_set_page_accessed(pfn_to_page(fault->pfn)); write_unlock(&vcpu->kvm->mmu_lock); - kvm_release_pfn_clean(fault->pfn); return r; } =20 @@ -883,7 +884,7 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, s= truct kvm_mmu *mmu, */ static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp= , int i) { - bool host_writable; + bool host_writable, is_refcounted; gpa_t first_pte_gpa; u64 *sptep, spte; struct kvm_memory_slot *slot; @@ -940,10 +941,11 @@ static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, st= ruct kvm_mmu_page *sp, int sptep =3D &sp->spt[i]; spte =3D *sptep; host_writable =3D spte & shadow_host_writable_mask; + is_refcounted =3D spte & SPTE_MMU_PAGE_REFCOUNTED; slot =3D kvm_vcpu_gfn_to_memslot(vcpu, gfn); make_spte(vcpu, sp, slot, pte_access, gfn, spte_to_pfn(spte), spte, true, false, - host_writable, &spte); + host_writable, is_refcounted, &spte); =20 return mmu_spte_update(sptep, spte); } diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index cf2c6426a6fc..5b13d9143d56 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -138,7 +138,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_pa= ge *sp, const struct kvm_memory_slot *slot, unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch, bool can_unsync, - bool host_writable, u64 *new_spte) + bool host_writable, bool is_refcounted, u64 *new_spte) { int level =3D sp->role.level; u64 spte =3D SPTE_MMU_PRESENT_MASK; @@ -188,6 +188,8 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_pa= ge *sp, =20 if (level > PG_LEVEL_4K) spte |=3D PT_PAGE_SIZE_MASK; + if (is_refcounted) + spte |=3D SPTE_MMU_PAGE_REFCOUNTED; =20 if (shadow_memtype_mask) spte |=3D static_call(kvm_x86_get_mt_mask)(vcpu, gfn, diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index 1279db2eab44..be93dd061ae3 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -95,6 +95,11 @@ 
static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK= _SAVED_MASK)); /* Defined only to keep the above static asserts readable. */ #undef SHADOW_ACC_TRACK_SAVED_MASK =20 +/* + * Indicates that the SPTE refers to a page with a valid refcount. + */ +#define SPTE_MMU_PAGE_REFCOUNTED BIT_ULL(59) + /* * Due to limited space in PTEs, the MMIO generation is a 19 bit subset of * the memslots generation and is derived as follows: @@ -332,6 +337,11 @@ static inline bool is_dirty_spte(u64 spte) return dirty_mask ? spte & dirty_mask : spte & PT_WRITABLE_MASK; } =20 +static inline bool is_refcounted_page_pte(u64 spte) +{ + return spte & SPTE_MMU_PAGE_REFCOUNTED; +} + static inline u64 get_rsvd_bits(struct rsvd_bits_validate *rsvd_check, u64= pte, int level) { @@ -462,7 +472,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_pa= ge *sp, const struct kvm_memory_slot *slot, unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch, bool can_unsync, - bool host_writable, u64 *new_spte); + bool host_writable, bool is_refcounted, u64 *new_spte); u64 make_huge_page_split_spte(struct kvm *kvm, u64 huge_spte, union kvm_mmu_page_role role, int index); u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled); diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 512163d52194..a9b1b14d2e26 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -474,6 +474,7 @@ static void handle_changed_spte(struct kvm *kvm, int as= _id, gfn_t gfn, bool was_leaf =3D was_present && is_last_spte(old_spte, level); bool is_leaf =3D is_present && is_last_spte(new_spte, level); bool pfn_changed =3D spte_to_pfn(old_spte) !=3D spte_to_pfn(new_spte); + bool is_refcounted =3D is_refcounted_page_pte(old_spte); =20 WARN_ON(level > PT64_ROOT_MAX_LEVEL); WARN_ON(level < PG_LEVEL_4K); @@ -538,9 +539,9 @@ static void handle_changed_spte(struct kvm *kvm, int as= _id, gfn_t gfn, if (is_leaf !=3D was_leaf) kvm_update_page_stats(kvm, level, is_leaf ? 
1 : -1); =20 - if (was_leaf && is_dirty_spte(old_spte) && + if (was_leaf && is_dirty_spte(old_spte) && is_refcounted && (!is_present || !is_dirty_spte(new_spte) || pfn_changed)) - kvm_set_pfn_dirty(spte_to_pfn(old_spte)); + kvm_set_page_dirty(pfn_to_page(spte_to_pfn(old_spte))); =20 /* * Recursively handle child PTs if the change removed a subtree from @@ -552,9 +553,9 @@ static void handle_changed_spte(struct kvm *kvm, int as= _id, gfn_t gfn, (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared); =20 - if (was_leaf && is_accessed_spte(old_spte) && + if (was_leaf && is_accessed_spte(old_spte) && is_refcounted && (!is_present || !is_accessed_spte(new_spte) || pfn_changed)) - kvm_set_pfn_accessed(spte_to_pfn(old_spte)); + kvm_set_page_accessed(pfn_to_page(spte_to_pfn(old_spte))); } =20 /* @@ -988,8 +989,9 @@ static int tdp_mmu_map_handle_target_level(struct kvm_v= cpu *vcpu, new_spte =3D make_mmio_spte(vcpu, iter->gfn, ACC_ALL); else wrprot =3D make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn, - fault->pfn, iter->old_spte, fault->prefetch, true, - fault->map_writable, &new_spte); + fault->pfn, iter->old_spte, fault->prefetch, true, + fault->map_writable, fault->is_refcounted_page, + &new_spte); =20 if (new_spte =3D=3D iter->old_spte) ret =3D RET_PF_SPURIOUS; @@ -1205,8 +1207,9 @@ static bool age_gfn_range(struct kvm *kvm, struct tdp= _iter *iter, * Capture the dirty status of the page, so that it doesn't get * lost when the SPTE is marked for access tracking. 
*/ - if (is_writable_pte(iter->old_spte)) - kvm_set_pfn_dirty(spte_to_pfn(iter->old_spte)); + if (is_writable_pte(iter->old_spte) && + is_refcounted_page_pte(iter->old_spte)) + kvm_set_page_dirty(pfn_to_page(spte_to_pfn(iter->old_spte))); =20 new_spte =3D mark_spte_for_access_track(iter->old_spte); iter->old_spte =3D kvm_tdp_mmu_write_spte(iter->sptep, @@ -1626,7 +1629,8 @@ static void clear_dirty_pt_masked(struct kvm *kvm, st= ruct kvm_mmu_page *root, trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level, iter.old_spte, iter.old_spte & ~dbit); - kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte)); + if (is_refcounted_page_pte(iter.old_spte)) + kvm_set_page_dirty(pfn_to_page(spte_to_pfn(iter.old_spte))); } =20 rcu_read_unlock(); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 713fc2d91f95..292701339198 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1157,6 +1157,9 @@ unsigned long gfn_to_hva_memslot_prot(struct kvm_memo= ry_slot *slot, gfn_t gfn, void kvm_release_page_clean(struct page *page); void kvm_release_page_dirty(struct page *page); =20 +void kvm_set_page_accessed(struct page *page); +void kvm_set_page_dirty(struct page *page); + struct kvm_follow_pfn { const struct kvm_memory_slot *slot; gfn_t gfn; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 963b96cd8ff9..fa1848c6c84f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2949,17 +2949,19 @@ static bool kvm_is_ad_tracked_page(struct page *pag= e) return !PageReserved(page); } =20 -static void kvm_set_page_dirty(struct page *page) +void kvm_set_page_dirty(struct page *page) { if (kvm_is_ad_tracked_page(page)) SetPageDirty(page); } +EXPORT_SYMBOL_GPL(kvm_set_page_dirty); =20 -static void kvm_set_page_accessed(struct page *page) +void kvm_set_page_accessed(struct page *page) { if (kvm_is_ad_tracked_page(page)) mark_page_accessed(page); } +EXPORT_SYMBOL_GPL(kvm_set_page_accessed); =20 void kvm_release_page_clean(struct page *page) { 
--=20 2.42.0.rc1.204.g551eb34607-goog From nobody Thu Dec 18 06:32:00 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B32EFC88CB2 for ; Thu, 24 Aug 2023 08:07:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240512AbjHXIGm (ORCPT ); Thu, 24 Aug 2023 04:06:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41886 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240474AbjHXIGW (ORCPT ); Thu, 24 Aug 2023 04:06:22 -0400 Received: from mail-pf1-x42e.google.com (mail-pf1-x42e.google.com [IPv6:2607:f8b0:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5FBBC198C for ; Thu, 24 Aug 2023 01:05:38 -0700 (PDT) Received: by mail-pf1-x42e.google.com with SMTP id d2e1a72fcca58-68bed28818fso518246b3a.2 for ; Thu, 24 Aug 2023 01:05:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1692864318; x=1693469118; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=g56uEk2MwfqiIQL+ErCF8LkgNo1fm+/UVdRtKRVa5jM=; b=ZmfC2H94Cfq7qeuTdK+I523W7JgWVtf0NoVYhxiVkl1g+AhID2EBdD+chl2WsbO4/5 b91CNyacOnMHohdGF0QhiuUgzViFJIFKuZruOBD5q27lVACYtj3C0Tb/Ezhbnft5wCAA HAvNrd3jOIzXBx/YwgBF4lQqE3BpOcApbhbdo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692864318; x=1693469118; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=g56uEk2MwfqiIQL+ErCF8LkgNo1fm+/UVdRtKRVa5jM=; b=XMR0AS7xhE7wESj8flXxAaaSBGX90gpy+Ryhc2oMJKKw+nL91QobBbJNbPcTDyCZi8 GxEbuKiZ4YEumSyuFHYb1YlU86O3i87T9/eac57frtbNmzIE5Y82I4Lr5JaWM9k+OwDK 
YGFfsPJpRTVd5TJJi3riZwySKWlXJpG70D4WMouXycG1SCzLYSdgnFeVb49VS0C3OeZa WggCV773ja+eQRmMVXPjpTwJYi5P73V4xkDJzp4t4BVT6vjLpyFuzewZulrfuYluKlLT yLT/GrMXcymiq3wxh/khmjWTEBdLvA1WSRbG5UvST6hFA7r4qpp6jaHNC5X5iLsGmAA9 6WHw== X-Gm-Message-State: AOJu0Yy5oNuS/cueVAQ2rkWUR3cB/Kvl8zv8yqkQmEUNaBAO89NO99hx fzvXmgbQ9iX1b+k5ft/LiDyXQQ== X-Google-Smtp-Source: AGHT+IG5DARchqgjlrYacKLWZkG/gje35tz5DbS9iEiEUj/zyN5dTp+yXFg5KHD5Ko9bddQ/hKa17g== X-Received: by 2002:a05:6a21:3390:b0:14b:7c16:a590 with SMTP id yy16-20020a056a21339000b0014b7c16a590mr4964459pzb.60.1692864318384; Thu, 24 Aug 2023 01:05:18 -0700 (PDT) Received: from localhost ([2401:fa00:8f:203:515:8b2a:90c3:b79e]) by smtp.gmail.com with UTF8SMTPSA id j24-20020aa79298000000b0068a077847c7sm8928630pfa.135.2023.08.24.01.05.15 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 24 Aug 2023 01:05:17 -0700 (PDT) From: David Stevens X-Google-Original-From: David Stevens To: Sean Christopherson Cc: Yu Zhang , Isaku Yamahata , Marc Zyngier , Michael Ellerman , Peter Xu , linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org, David Stevens Subject: [PATCH v8 6/8] KVM: arm64: Migrate to __kvm_follow_pfn Date: Thu, 24 Aug 2023 17:04:06 +0900 Message-ID: <20230824080408.2933205-7-stevensd@google.com> X-Mailer: git-send-email 2.42.0.rc1.204.g551eb34607-goog In-Reply-To: <20230824080408.2933205-1-stevensd@google.com> References: <20230824080408.2933205-1-stevensd@google.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" From: David Stevens Migrate from __gfn_to_pfn_memslot to __kvm_follow_pfn. 
Signed-off-by: David Stevens --- arch/arm64/kvm/mmu.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index d3b4feed460c..e4abc2a57d7d 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -1334,7 +1334,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys= _addr_t fault_ipa, unsigned long fault_status) { int ret =3D 0; - bool write_fault, writable, force_pte =3D false; + bool write_fault =3D kvm_is_write_fault(vcpu); + bool force_pte =3D false; bool exec_fault, mte_allowed; bool device =3D false; unsigned long mmu_seq; @@ -1342,16 +1343,19 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, ph= ys_addr_t fault_ipa, struct kvm_mmu_memory_cache *memcache =3D &vcpu->arch.mmu_page_cache; struct vm_area_struct *vma; short vma_shift; - gfn_t gfn; kvm_pfn_t pfn; bool logging_active =3D memslot_is_logging(memslot); unsigned long fault_level =3D kvm_vcpu_trap_get_fault_level(vcpu); long vma_pagesize, fault_granule; enum kvm_pgtable_prot prot =3D KVM_PGTABLE_PROT_R; struct kvm_pgtable *pgt; + struct kvm_follow_pfn foll =3D { + .slot =3D memslot, + .flags =3D FOLL_GET | (write_fault ? 
FOLL_WRITE : 0), + .try_map_writable =3D true, + }; =20 fault_granule =3D 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level); - write_fault =3D kvm_is_write_fault(vcpu); exec_fault =3D kvm_vcpu_trap_is_exec_fault(vcpu); VM_BUG_ON(write_fault && exec_fault); =20 @@ -1425,7 +1429,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys= _addr_t fault_ipa, if (vma_pagesize =3D=3D PMD_SIZE || vma_pagesize =3D=3D PUD_SIZE) fault_ipa &=3D ~(vma_pagesize - 1); =20 - gfn =3D fault_ipa >> PAGE_SHIFT; + foll.gfn =3D fault_ipa >> PAGE_SHIFT; mte_allowed =3D kvm_vma_mte_allowed(vma); =20 /* Don't use the VMA after the unlock -- it may have vanished */ @@ -1433,7 +1437,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys= _addr_t fault_ipa, =20 /* * Read mmu_invalidate_seq so that KVM can detect if the results of - * vma_lookup() or __gfn_to_pfn_memslot() become stale prior to + * vma_lookup() or __kvm_follow_pfn() become stale prior to * acquiring kvm->mmu_lock. * * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs @@ -1442,8 +1446,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys= _addr_t fault_ipa, mmu_seq =3D vcpu->kvm->mmu_invalidate_seq; mmap_read_unlock(current->mm); =20 - pfn =3D __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL, - write_fault, &writable, NULL); + pfn =3D __kvm_follow_pfn(&foll); if (pfn =3D=3D KVM_PFN_ERR_HWPOISON) { kvm_send_hwpoison_signal(hva, vma_shift); return 0; @@ -1468,7 +1471,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys= _addr_t fault_ipa, * Only actually map the page as writable if this was a write * fault. 
*/ - writable =3D false; + foll.writable =3D false; } =20 if (exec_fault && device) @@ -1508,7 +1511,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys= _addr_t fault_ipa, } } =20 - if (writable) + if (foll.writable) prot |=3D KVM_PGTABLE_PROT_W; =20 if (exec_fault) @@ -1534,9 +1537,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys= _addr_t fault_ipa, KVM_PGTABLE_WALK_SHARED); =20 /* Mark the page dirty only if the fault is handled successfully */ - if (writable && !ret) { + if (foll.writable && !ret) { kvm_set_pfn_dirty(pfn); - mark_page_dirty_in_slot(kvm, memslot, gfn); + mark_page_dirty_in_slot(kvm, memslot, foll.gfn); } =20 out_unlock: --=20 2.42.0.rc1.204.g551eb34607-goog From nobody Thu Dec 18 06:32:00 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 216ECEE49AD for ; Thu, 24 Aug 2023 08:07:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240574AbjHXIGy (ORCPT ); Thu, 24 Aug 2023 04:06:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45022 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240491AbjHXIGW (ORCPT ); Thu, 24 Aug 2023 04:06:22 -0400 Received: from mail-pg1-x532.google.com (mail-pg1-x532.google.com [IPv6:2607:f8b0:4864:20::532]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AA286198E for ; Thu, 24 Aug 2023 01:05:42 -0700 (PDT) Received: by mail-pg1-x532.google.com with SMTP id 41be03b00d2f7-56c2e840e70so2674234a12.3 for ; Thu, 24 Aug 2023 01:05:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1692864323; x=1693469123; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; 
        bh=8pInsN0R9Jrj5G4DiNtaAqgrOTWh2xmTbeohTUBqU+0=;
        b=SNN3TBN+2El1oNqHuodXqkb+5HS1GQGpVa9ijD5Zr72npYE0GqK9C0GS+tal51GSM3
         QWaEWMWVPJ8vzhQWVs+fRBUmp5Fk5aZOK4tK1T4gtNEHXiyaELrNaoB8zkOIbXIz3F/0
         3ft/zWiNEmq8UdzMxAIvMNOTURLYSy8oxiORo=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1692864323; x=1693469123;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=8pInsN0R9Jrj5G4DiNtaAqgrOTWh2xmTbeohTUBqU+0=;
        b=dqyciKw1uWL+KPD/9SroVEi5c7MlPJztlMf07j1MWxUw1qlp8ExGGCTUpUTAJeoYLU
         HvrBGjgKiOQyZy6rDMXluzNsK11ERlfGLXsPiojk/mInAyCb4lbukpMD/T8Wsvs1Ee/6
         tHkX2f7HOYkYQ6CtWXsFVXpOL4KcadAzfES8GwwyPG3xLG7jNfNVXUEEIZbKecUMGgxN
         ZgQz4DoKslhw2zCV/lVvrsoCerW22WoDc2mZTJqq1+JuT8eI6CS5xh29Og2hgosZTB5H
         APGbCqFmSuokO1/l2NLyGKL9GIkNDKTMOkDQs10Wf/nmQNegcwdXnMMwGnUTXnz7+uKF
         ffsw==
X-Gm-Message-State: AOJu0YwSpZX9ycAgfyZwjMQXxQ5PHt0SjiPIFX1h5XfS/k6IQ10f0ETr
        RjpGtwFP2n39HM47qo6HpjTDwg==
X-Google-Smtp-Source: AGHT+IGdSjMwqhsbz44+KiD8gA7voTpvuuFsAPy+9CTgub6hQrFAaqcxjKO/IO30vietUV2rKmrNqA==
X-Received: by 2002:a17:90a:34c9:b0:268:3ea0:7160 with SMTP id m9-20020a17090a34c900b002683ea07160mr13435610pjf.0.1692864323554;
        Thu, 24 Aug 2023 01:05:23 -0700 (PDT)
Received: from localhost ([2401:fa00:8f:203:515:8b2a:90c3:b79e])
        by smtp.gmail.com with UTF8SMTPSA id n14-20020a17090ac68e00b002636dfcc6f5sm966246pjt.3.2023.08.24.01.05.20
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Thu, 24 Aug 2023 01:05:23 -0700 (PDT)
From: David Stevens
X-Google-Original-From: David Stevens
To: Sean Christopherson
Cc: Yu Zhang, Isaku Yamahata, Marc Zyngier, Michael Ellerman, Peter Xu,
        linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
        linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
        kvm@vger.kernel.org, David Stevens
Subject: [PATCH v8 7/8] KVM: PPC: Migrate to __kvm_follow_pfn
Date: Thu, 24 Aug 2023 17:04:07 +0900
Message-ID: <20230824080408.2933205-8-stevensd@google.com>
X-Mailer: git-send-email 2.42.0.rc1.204.g551eb34607-goog
In-Reply-To: <20230824080408.2933205-1-stevensd@google.com>
References: <20230824080408.2933205-1-stevensd@google.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: David Stevens

Migrate from __gfn_to_pfn_memslot to __kvm_follow_pfn. As part of the
refactoring, remove the redundant calls to get_user_page_fast_only,
since the check for !async && !atomic was removed from the KVM generic
code in b9b33da2aa74. Also, remove the kvm_ro parameter because the KVM
generic code handles RO memslots.

Signed-off-by: David Stevens
---
 arch/powerpc/include/asm/kvm_book3s.h  |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c    | 38 +++++++++-----------
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 50 +++++++++++---------------
 arch/powerpc/kvm/book3s_hv_nested.c    |  4 +--
 4 files changed, 38 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index bbf5e2c5fe09..bf48c511e700 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -202,7 +202,7 @@ extern bool kvmppc_hv_handle_set_rc(struct kvm *kvm, bool nested,
 extern int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
 				unsigned long gpa,
 				struct kvm_memory_slot *memslot,
-				bool writing, bool kvm_ro,
+				bool writing,
 				pte_t *inserted_pte, unsigned int *levelp);
 extern int kvmppc_init_vm_radix(struct kvm *kvm);
 extern void kvmppc_free_radix(struct kvm *kvm);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 7f765d5ad436..4688046626af 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -523,6 +523,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
 	unsigned long rcbits;
 	long mmio_update;
 	pte_t pte, *ptep;
+	struct kvm_follow_pfn foll = {
+		.try_map_writable = true,
+	};
 
 	if (kvm_is_radix(kvm))
 		return kvmppc_book3s_radix_page_fault(vcpu, ea, dsisr);
@@ -599,29 +602,20 @@ int kvmppc_book3s_hv_page_fault(struct kvm_vcpu *vcpu,
 	page = NULL;
 	writing = (dsisr & DSISR_ISSTORE) != 0;
 	/* If writing != 0, then the HPTE must allow writing, if we get here */
-	write_ok = writing;
-	hva = gfn_to_hva_memslot(memslot, gfn);
 
-	/*
-	 * Do a fast check first, since __gfn_to_pfn_memslot doesn't
-	 * do it with !atomic && !async, which is how we call it.
-	 * We always ask for write permission since the common case
-	 * is that the page is writable.
-	 */
-	if (get_user_page_fast_only(hva, FOLL_WRITE, &page)) {
-		write_ok = true;
-	} else {
-		/* Call KVM generic code to do the slow-path check */
-		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
-					   writing, &write_ok, NULL);
-		if (is_error_noslot_pfn(pfn))
-			return -EFAULT;
-		page = NULL;
-		if (pfn_valid(pfn)) {
-			page = pfn_to_page(pfn);
-			if (PageReserved(page))
-				page = NULL;
-		}
+	foll.slot = memslot;
+	foll.gfn = gfn;
+	foll.flags = FOLL_GET | (writing ? FOLL_WRITE : 0);
+	pfn = __kvm_follow_pfn(&foll);
+	if (is_error_noslot_pfn(pfn))
+		return -EFAULT;
+	page = NULL;
+	write_ok = foll.writable;
+	hva = foll.hva;
+	if (pfn_valid(pfn)) {
+		page = pfn_to_page(pfn);
+		if (PageReserved(page))
+			page = NULL;
 	}
 
 	/*
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 572707858d65..498f89128c3a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -815,47 +815,39 @@ bool kvmppc_hv_handle_set_rc(struct kvm *kvm, bool nested, bool writing,
 int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
 				   unsigned long gpa,
 				   struct kvm_memory_slot *memslot,
-				   bool writing, bool kvm_ro,
+				   bool writing,
 				   pte_t *inserted_pte, unsigned int *levelp)
 {
 	struct kvm *kvm = vcpu->kvm;
 	struct page *page = NULL;
 	unsigned long mmu_seq;
-	unsigned long hva, gfn = gpa >> PAGE_SHIFT;
-	bool upgrade_write = false;
-	bool *upgrade_p = &upgrade_write;
+	unsigned long hva, pfn, gfn = gpa >> PAGE_SHIFT;
+	bool upgrade_write;
 	pte_t pte, *ptep;
 	unsigned int shift, level;
 	int ret;
 	bool large_enable;
+	struct kvm_follow_pfn foll = {
+		.slot = memslot,
+		.gfn = gfn,
+		.flags = FOLL_GET | (writing ? FOLL_WRITE : 0),
+		.try_map_writable = true,
+	};
 
 	/* used to check for invalidations in progress */
 	mmu_seq = kvm->mmu_invalidate_seq;
 	smp_rmb();
 
-	/*
-	 * Do a fast check first, since __gfn_to_pfn_memslot doesn't
-	 * do it with !atomic && !async, which is how we call it.
-	 * We always ask for write permission since the common case
-	 * is that the page is writable.
-	 */
-	hva = gfn_to_hva_memslot(memslot, gfn);
-	if (!kvm_ro && get_user_page_fast_only(hva, FOLL_WRITE, &page)) {
-		upgrade_write = true;
-	} else {
-		unsigned long pfn;
-
-		/* Call KVM generic code to do the slow-path check */
-		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
-					   writing, upgrade_p, NULL);
-		if (is_error_noslot_pfn(pfn))
-			return -EFAULT;
-		page = NULL;
-		if (pfn_valid(pfn)) {
-			page = pfn_to_page(pfn);
-			if (PageReserved(page))
-				page = NULL;
-		}
+	pfn = __kvm_follow_pfn(&foll);
+	if (is_error_noslot_pfn(pfn))
+		return -EFAULT;
+	page = NULL;
+	hva = foll.hva;
+	upgrade_write = foll.writable;
+	if (pfn_valid(pfn)) {
+		page = pfn_to_page(pfn);
+		if (PageReserved(page))
+			page = NULL;
 	}
 
 	/*
@@ -944,7 +936,6 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
 	struct kvm_memory_slot *memslot;
 	long ret;
 	bool writing = !!(dsisr & DSISR_ISSTORE);
-	bool kvm_ro = false;
 
 	/* Check for unusual errors */
 	if (dsisr & DSISR_UNSUPP_MMU) {
@@ -997,7 +988,6 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
 					ea, DSISR_ISSTORE | DSISR_PROTFAULT);
 			return RESUME_GUEST;
 		}
-		kvm_ro = true;
 	}
 
 	/* Failed to set the reference/change bits */
@@ -1015,7 +1005,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
 
 	/* Try to insert a pte */
 	ret = kvmppc_book3s_instantiate_page(vcpu, gpa, memslot, writing,
-					     kvm_ro, NULL, NULL);
+					     NULL, NULL);
 
 	if (ret == 0 || ret == -EAGAIN)
 		ret = RESUME_GUEST;
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 377d0b4a05ee..6d531051df04 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1497,7 +1497,6 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
 	unsigned long n_gpa, gpa, gfn, perm = 0UL;
 	unsigned int shift, l1_shift, level;
 	bool writing = !!(dsisr & DSISR_ISSTORE);
-	bool kvm_ro = false;
 	long int ret;
 
 	if (!gp->l1_gr_to_hr) {
@@ -1577,7 +1576,6 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
 					ea, DSISR_ISSTORE | DSISR_PROTFAULT);
 			return RESUME_GUEST;
 		}
-		kvm_ro = true;
 	}
 
 	/* 2. Find the host pte for this L1 guest real address */
@@ -1599,7 +1597,7 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
 	if (!pte_present(pte) || (writing && !(pte_val(pte) & _PAGE_WRITE))) {
 		/* No suitable pte found -> try to insert a mapping */
 		ret = kvmppc_book3s_instantiate_page(vcpu, gpa, memslot,
-					writing, kvm_ro, &pte, &level);
+					writing, &pte, &level);
 		if (ret == -EAGAIN)
 			return RESUME_GUEST;
 		else if (ret)
-- 
2.42.0.rc1.204.g551eb34607-goog

From nobody Thu Dec 18 06:32:00 2025
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
        aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
        by smtp.lore.kernel.org (Postfix) with ESMTP id 0B870EE49A5
        for ; Thu, 24 Aug 2023 08:07:07 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S240388AbjHXIGu (ORCPT ); Thu, 24 Aug 2023 04:06:50 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44958 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S240492AbjHXIGW (ORCPT );
        Thu, 24 Aug 2023 04:06:22 -0400
Received: from mail-pj1-x102c.google.com (mail-pj1-x102c.google.com [IPv6:2607:f8b0:4864:20::102c])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4D9721994
        for ; Thu, 24 Aug 2023 01:05:43 -0700 (PDT)
Received: by mail-pj1-x102c.google.com with SMTP id 98e67ed59e1d1-26d60f27491so3037325a91.1
        for ; Thu, 24 Aug 2023 01:05:43 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=chromium.org; s=google; t=1692864329; x=1693469129;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:from:to:cc:subject:date
         :message-id:reply-to;
        bh=hI/ttp/wxMkRYZ+oyQ2NmS2HxwoQO77IewL5mtwNCh0=;
        b=Vrd5G9upmFX23Dc2l1UOiVR7mwQuxLdJALTe566riReLawye0Iez9RByru+Lw1LGHs
         +bLoUARoikkBnCqoL1flcVwqreaLS334iX1gCgyZpEjqUf218Tus3zznhtCtfSXcIQ+Z
         YpZi1+Q5+CVa8tvqM3unr1KQ1huvUG9hP9dD4=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1692864329; x=1693469129;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=hI/ttp/wxMkRYZ+oyQ2NmS2HxwoQO77IewL5mtwNCh0=;
        b=fuoizUsAmbNwPoFdMeMHgTEL3ozgdgQ4qlIYhMiv9istmzqf2rnMek8h20CGfzqPGt
         gWOta4qs4LN5oC8vOKkpxuyULUM/FzTRcFcUT0Vd6M1Eu+oFRs/R40JnqHwyV0djDKdH
         +qCdtuB9guW/hWW0e3ZMkiVbjG0pzStYdH6EXQKOr9lMHSrIkRYjxM/cYCpobJV/zPut
         mCvgCbWIojuvhkJCy7Vg6sN4HAWSW8xULjsWfr5oqSfWpoFobYgh6qHLXBSFLclQ2COx
         BcYRU3vBlAErXhtg5rOeiKJmeQw4mFkavBpDBRnOXvNPsCVZGPqHZi/Lx4wvrSNMHGGK
         vx+g==
X-Gm-Message-State: AOJu0YwzvIn1GEzRBf9CIaBT1PRNH6jIyh4WsMIBrjDQi5jW8HGb4g15
        5BEzYTEkAafueTUnWRnGV/vVlw==
X-Google-Smtp-Source: AGHT+IGr4o4edi/X3Imw/dOG9514f6ovvLqJxfkblt2y6pWdfKpxCplBDdHZn5gaAedd/iurfPc6oA==
X-Received: by 2002:a17:90b:3590:b0:267:f094:afcf with SMTP id mm16-20020a17090b359000b00267f094afcfmr11065174pjb.12.1692864328780;
        Thu, 24 Aug 2023 01:05:28 -0700 (PDT)
Received: from localhost ([2401:fa00:8f:203:515:8b2a:90c3:b79e])
        by smtp.gmail.com with UTF8SMTPSA id g2-20020a17090adac200b0026d034f6badsm967988pjx.38.2023.08.24.01.05.25
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Thu, 24 Aug 2023 01:05:28 -0700 (PDT)
From: David Stevens
X-Google-Original-From: David Stevens
To: Sean Christopherson
Cc: Yu Zhang, Isaku Yamahata, Marc Zyngier, Michael Ellerman, Peter Xu,
        linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
        linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
        kvm@vger.kernel.org, David Stevens
Subject: [PATCH v8 8/8] KVM: mmu: remove __gfn_to_pfn_memslot
Date: Thu, 24 Aug 2023 17:04:08 +0900
Message-ID: <20230824080408.2933205-9-stevensd@google.com>
X-Mailer:
        git-send-email 2.42.0.rc1.204.g551eb34607-goog
In-Reply-To: <20230824080408.2933205-1-stevensd@google.com>
References: <20230824080408.2933205-1-stevensd@google.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

From: David Stevens

All callers have been migrated to __kvm_follow_pfn.

Signed-off-by: David Stevens
---
 virt/kvm/kvm_main.c | 33 ---------------------------------
 1 file changed, 33 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fa1848c6c84f..aebaf4a7340e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2728,39 +2728,6 @@ kvm_pfn_t __kvm_follow_pfn(struct kvm_follow_pfn *foll)
 }
 EXPORT_SYMBOL_GPL(__kvm_follow_pfn);
 
-kvm_pfn_t __gfn_to_pfn_memslot(const struct kvm_memory_slot *slot, gfn_t gfn,
-			       bool atomic, bool interruptible, bool *async,
-			       bool write_fault, bool *writable, hva_t *hva)
-{
-	kvm_pfn_t pfn;
-	struct kvm_follow_pfn foll = {
-		.slot = slot,
-		.gfn = gfn,
-		.flags = FOLL_GET,
-		.atomic = atomic,
-		.try_map_writable = !!writable,
-	};
-
-	if (write_fault)
-		foll.flags |= FOLL_WRITE;
-	if (async)
-		foll.flags |= FOLL_NOWAIT;
-	if (interruptible)
-		foll.flags |= FOLL_INTERRUPTIBLE;
-
-	pfn = __kvm_follow_pfn(&foll);
-	if (pfn == KVM_PFN_ERR_NEEDS_IO) {
-		*async = true;
-		pfn = KVM_PFN_ERR_FAULT;
-	}
-	if (hva)
-		*hva = foll.hva;
-	if (writable)
-		*writable = foll.writable;
-	return pfn;
-}
-EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);
-
 kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
 			  bool *writable)
 {
-- 
2.42.0.rc1.204.g551eb34607-goog