From nobody Sun May 10 19:14:14 2026
Date: Wed, 27 Apr 2022 01:39:57 +0000
Message-Id: <20220427014004.1992589-2-seanjc@google.com>
In-Reply-To: <20220427014004.1992589-1-seanjc@google.com>
References: <20220427014004.1992589-1-seanjc@google.com>
Subject: [PATCH v2 1/8] Revert "KVM: Do not speculatively mark pfn cache valid to "fix" race"
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 David Woodhouse, Mingwei Zhang, Maxim Levitsky

This reverts commit 55111927df1cd140aa7b7ea3f33f524b87776381.
Signed-off-by: Sean Christopherson
---
 virt/kvm/pfncache.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 72eee096a7cd..71c84a43024c 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -81,8 +81,6 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 {
 	struct kvm_memslots *slots = kvm_memslots(kvm);
 
-	lockdep_assert_held_read(&gpc->lock);
-
 	if ((gpa & ~PAGE_MASK) + len > PAGE_SIZE)
 		return false;
 
@@ -228,6 +226,11 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	if (!old_valid || old_uhva != gpc->uhva) {
 		void *new_khva = NULL;
 
+		/* Placeholders for "hva is valid but not yet mapped" */
+		gpc->pfn = KVM_PFN_ERR_FAULT;
+		gpc->khva = NULL;
+		gpc->valid = true;
+
 		new_pfn = hva_to_pfn_retry(kvm, gpc);
 		if (is_error_noslot_pfn(new_pfn)) {
 			ret = -EFAULT;
@@ -256,7 +259,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 		gpc->pfn = KVM_PFN_ERR_FAULT;
 		gpc->khva = NULL;
 	} else {
-		gpc->valid = true;
+		/* At this point, gpc->valid may already have been cleared */
 		gpc->pfn = new_pfn;
 		gpc->khva = new_khva;
 	}
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog

From nobody Sun May 10 19:14:14 2026
Date: Wed, 27 Apr 2022 01:39:58 +0000
Message-Id: <20220427014004.1992589-3-seanjc@google.com>
In-Reply-To: <20220427014004.1992589-1-seanjc@google.com>
References: <20220427014004.1992589-1-seanjc@google.com>
Subject: [PATCH v2 2/8] Revert "KVM: Fix race between mmu_notifier invalidation and pfncache refresh"
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 David Woodhouse, Mingwei Zhang, Maxim Levitsky

This reverts commit c496097d2c0bdc229f82d72b4b1e55d64974c316.

Signed-off-by: Sean Christopherson
---
 virt/kvm/kvm_main.c |  9 ------
 virt/kvm/pfncache.c | 70 ++++++++++++++-------------------------------
 2 files changed, 21 insertions(+), 58 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0848430f36c6..dfb7dabdbc63 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -705,15 +705,6 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	kvm->mn_active_invalidate_count++;
 	spin_unlock(&kvm->mn_invalidate_lock);
 
-	/*
-	 * Invalidate pfn caches _before_ invalidating the secondary MMUs, i.e.
-	 * before acquiring mmu_lock, to avoid holding mmu_lock while acquiring
-	 * each cache's lock.  There are relatively few caches in existence at
-	 * any given time, and the caches themselves can check for hva overlap,
-	 * i.e. don't need to rely on memslot overlap checks for performance.
-	 * Because this runs without holding mmu_lock, the pfn caches must use
-	 * mn_active_invalidate_count (see above) instead of mmu_notifier_count.
-	 */
 	gfn_to_pfn_cache_invalidate_start(kvm, range->start, range->end,
 					  hva_range.may_block);
 
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 71c84a43024c..dd84676615f1 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -112,63 +112,29 @@ static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, void *khva, gpa_t gpa)
 	}
 }
 
-static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
+static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva)
 {
-	bool first_attempt = true;
 	unsigned long mmu_seq;
 	kvm_pfn_t new_pfn;
+	int retry;
 
-	lockdep_assert_held_write(&gpc->lock);
-
-	for (;;) {
+	do {
 		mmu_seq = kvm->mmu_notifier_seq;
 		smp_rmb();
 
-		write_unlock_irq(&gpc->lock);
-
-		/* Opportunistically check for resched while the lock isn't held. */
-		if (!first_attempt)
-			cond_resched();
-
 		/* We always request a writeable mapping */
-		new_pfn = hva_to_pfn(gpc->uhva, false, NULL, true, NULL);
-
-		write_lock_irq(&gpc->lock);
-
+		new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL);
 		if (is_error_noslot_pfn(new_pfn))
 			break;
 
-		first_attempt = false;
-
-		/*
-		 * Wait for mn_active_invalidate_count, not mmu_notifier_count,
-		 * to go away, as the invalidation in the mmu_notifier event
-		 * occurs _before_ mmu_notifier_count is elevated.
-		 *
-		 * Note, mn_active_invalidate_count can change at any time as
-		 * it's not protected by gpc->lock.  But, it is guaranteed to
-		 * be elevated before the mmu_notifier acquires gpc->lock, and
-		 * isn't dropped until after mmu_notifier_seq is updated.  So,
-		 * this task may get a false positive of sorts, i.e. see an
-		 * elevated count and wait even though it's technically safe to
-		 * proceed (becase the mmu_notifier will invalidate the cache
-		 * _after_ it's refreshed here), but the cache will never be
-		 * refreshed with stale data, i.e. won't get false negatives.
-		 */
-		if (kvm->mn_active_invalidate_count)
-			continue;
-
-		/*
-		 * Ensure mn_active_invalidate_count is read before
-		 * mmu_notifier_seq.  This pairs with the smp_wmb() in
-		 * mmu_notifier_invalidate_range_end() to guarantee either the
-		 * old (non-zero) value of mn_active_invalidate_count or the
-		 * new (incremented) value of mmu_notifier_seq is observed.
-		 */
-		smp_rmb();
-		if (kvm->mmu_notifier_seq == mmu_seq)
+		KVM_MMU_READ_LOCK(kvm);
+		retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva);
+		KVM_MMU_READ_UNLOCK(kvm);
+		if (!retry)
 			break;
-	}
+
+		cond_resched();
+	} while (1);
 
 	return new_pfn;
 }
@@ -224,6 +190,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	 * drop the lock and do the HVA to PFN lookup again.
 	 */
 	if (!old_valid || old_uhva != gpc->uhva) {
+		unsigned long uhva = gpc->uhva;
 		void *new_khva = NULL;
 
 		/* Placeholders for "hva is valid but not yet mapped" */
@@ -231,10 +198,15 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 		gpc->khva = NULL;
 		gpc->valid = true;
 
-		new_pfn = hva_to_pfn_retry(kvm, gpc);
+		write_unlock_irq(&gpc->lock);
+
+		new_pfn = hva_to_pfn_retry(kvm, uhva);
 		if (is_error_noslot_pfn(new_pfn)) {
 			ret = -EFAULT;
-		} else if (gpc->usage & KVM_HOST_USES_PFN) {
+			goto map_done;
+		}
+
+		if (gpc->usage & KVM_HOST_USES_PFN) {
 			if (new_pfn == old_pfn) {
 				new_khva = old_khva;
 				old_pfn = KVM_PFN_ERR_FAULT;
@@ -250,10 +222,10 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 			new_khva += page_offset;
 		else
 			ret = -EFAULT;
-	} else {
-		/* Nothing more to do, the pfn is consumed only by the guest. */
 	}
 
+map_done:
+	write_lock_irq(&gpc->lock);
 	if (ret) {
 		gpc->valid = false;
 		gpc->pfn = KVM_PFN_ERR_FAULT;
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog

From nobody Sun May 10 19:14:14 2026
Date: Wed, 27 Apr 2022 01:39:59 +0000
Message-Id: <20220427014004.1992589-4-seanjc@google.com>
In-Reply-To: <20220427014004.1992589-1-seanjc@google.com>
References: <20220427014004.1992589-1-seanjc@google.com>
Subject: [PATCH v2 3/8] KVM: Drop unused @gpa param from gfn=>pfn cache's __release_gpc() helper
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 David Woodhouse, Mingwei Zhang, Maxim Levitsky

Drop the @gpa param from __release_gpc() and rename the helper to make
it more obvious that the cache itself is not being released.  The helper
will be reused by a future commit to release a pfn+khva combination that
is _never_ associated with the cache, at which point the current name
would go from slightly misleading to blatantly wrong.
No functional change intended.

Signed-off-by: Sean Christopherson
---
 virt/kvm/pfncache.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index dd84676615f1..e05a6a1b8eff 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -95,7 +95,7 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 }
 EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_check);
 
-static void __release_gpc(struct kvm *kvm, kvm_pfn_t pfn, void *khva, gpa_t gpa)
+static void gpc_release_pfn_and_khva(struct kvm *kvm, kvm_pfn_t pfn, void *khva)
 {
 	/* Unmap the old page if it was mapped before, and release it */
 	if (!is_error_noslot_pfn(pfn)) {
@@ -146,7 +146,6 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	unsigned long page_offset = gpa & ~PAGE_MASK;
 	kvm_pfn_t old_pfn, new_pfn;
 	unsigned long old_uhva;
-	gpa_t old_gpa;
 	void *old_khva;
 	bool old_valid;
 	int ret = 0;
@@ -160,7 +159,6 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 
 	write_lock_irq(&gpc->lock);
 
-	old_gpa = gpc->gpa;
 	old_pfn = gpc->pfn;
 	old_khva = gpc->khva - offset_in_page(gpc->khva);
 	old_uhva = gpc->uhva;
@@ -244,7 +242,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 out:
 	write_unlock_irq(&gpc->lock);
 
-	__release_gpc(kvm, old_pfn, old_khva, old_gpa);
+	gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
 
 	return ret;
 }
@@ -254,14 +252,12 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 {
 	void *old_khva;
 	kvm_pfn_t old_pfn;
-	gpa_t old_gpa;
 
 	write_lock_irq(&gpc->lock);
 
 	gpc->valid = false;
 
 	old_khva = gpc->khva - offset_in_page(gpc->khva);
-	old_gpa = gpc->gpa;
 	old_pfn = gpc->pfn;
 
 	/*
@@ -273,7 +269,7 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 
 	write_unlock_irq(&gpc->lock);
 
-	__release_gpc(kvm, old_pfn, old_khva, old_gpa);
+	gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
 }
 EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_unmap);
 
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog

From nobody Sun May 10 19:14:14 2026
Date: Wed, 27 Apr 2022 01:40:00 +0000
Message-Id: <20220427014004.1992589-5-seanjc@google.com>
In-Reply-To: <20220427014004.1992589-1-seanjc@google.com>
References: <20220427014004.1992589-1-seanjc@google.com>
Subject: [PATCH v2 4/8] KVM: Put the extra pfn reference when reusing a pfn in the gpc cache
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 David Woodhouse, Mingwei Zhang, Maxim Levitsky

Put the struct page reference to pfn acquired by hva_to_pfn() when the
old and new pfns for a gfn=>pfn cache match.  The cache already has a
reference via the old/current pfn, and will only put one reference when
the cache is done with the pfn.
Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
Signed-off-by: Sean Christopherson
---
 virt/kvm/pfncache.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index e05a6a1b8eff..40cbe90d52e0 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -206,6 +206,14 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 
 		if (gpc->usage & KVM_HOST_USES_PFN) {
 			if (new_pfn == old_pfn) {
+				/*
+				 * Reuse the existing pfn and khva, but put the
+				 * reference acquired by hva_to_pfn_retry(); the
+				 * cache still holds a reference to the pfn
+				 * from the previous refresh.
+				 */
+				gpc_release_pfn_and_khva(kvm, new_pfn, NULL);
+
 				new_khva = old_khva;
 				old_pfn = KVM_PFN_ERR_FAULT;
 				old_khva = NULL;
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog

From nobody Sun May 10 19:14:14 2026
Date: Wed, 27 Apr 2022 01:40:01 +0000
Message-Id: <20220427014004.1992589-6-seanjc@google.com>
In-Reply-To: <20220427014004.1992589-1-seanjc@google.com>
References: <20220427014004.1992589-1-seanjc@google.com>
Subject: [PATCH v2 5/8] KVM: Do not incorporate page offset into gfn=>pfn cache user address
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 David Woodhouse, Mingwei Zhang, Maxim Levitsky

Don't adjust the userspace address in the gfn=>pfn cache by the page
offset from the gpa.  KVM should never use the user address directly,
and all KVM operations that translate a user address to something else
require the user address to be page aligned.  Ignoring the offset will
allow the cache to reuse a gfn=>hva translation in the unlikely event
that the page offset of the gpa changes, but the gfn does not.

Signed-off-by: Sean Christopherson
---
 virt/kvm/pfncache.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 40cbe90d52e0..05cb0bcbf662 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -179,8 +179,6 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 			ret = -EFAULT;
 			goto out;
 		}
-
-		gpc->uhva += page_offset;
 	}
 
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog

From nobody Sun May 10 19:14:14 2026
Date: Wed, 27 Apr 2022 01:40:02 +0000
Message-Id: <20220427014004.1992589-7-seanjc@google.com>
In-Reply-To: <20220427014004.1992589-1-seanjc@google.com>
References: <20220427014004.1992589-1-seanjc@google.com>
Subject: [PATCH v2 6/8] KVM: Fix multiple races in gfn=>pfn cache refresh
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 David Woodhouse, Mingwei Zhang, Maxim Levitsky

Rework the gfn=>pfn cache (gpc) refresh logic to address multiple races
between the cache itself, and between the cache and mmu_notifier events.

The existing refresh code attempts to guard against races with the
mmu_notifier by speculatively marking the cache valid, and then marking
it invalid if a mmu_notifier invalidation occurs.  That handles the case
where an invalidation occurs between dropping and re-acquiring gpc->lock,
but it doesn't handle the scenario where the cache is refreshed after the
cache was invalidated by the notifier, but before the notifier elevates
mmu_notifier_count.  The gpc refresh can't use the "retry" helper as its
invalidation occurs _before_ mmu_notifier_count is elevated and before
mmu_notifier_range_start is set/updated.
  CPU0                                     CPU1
  ----                                     ----

  gfn_to_pfn_cache_invalidate_start()
  |
  -> gpc->valid = false;
                                           kvm_gfn_to_pfn_cache_refresh()
                                           |
                                           |-> gpc->valid = true;

                                           hva_to_pfn_retry()
                                           |
                                           -> acquire kvm->mmu_lock
                                              kvm->mmu_notifier_count == 0
                                              mmu_seq == kvm->mmu_notifier_seq
                                              drop kvm->mmu_lock
                                              return pfn 'X'
  acquire kvm->mmu_lock
  kvm_inc_notifier_count()
  drop kvm->mmu_lock()
  kernel frees pfn 'X'
                                           kvm_gfn_to_pfn_cache_check()
                                           |
                                           |-> gpc->valid == true

                                           caller accesses freed pfn 'X'

Key off of mn_active_invalidate_count to detect that a pfncache refresh needs to wait for an in-progress mmu_notifier invalidation.  While mn_active_invalidate_count is not guaranteed to be stable, it is guaranteed to be elevated prior to an invalidation acquiring gpc->lock, so either the refresh will see an active invalidation and wait, or the invalidation will run after the refresh completes.

Speculatively marking the cache valid is itself flawed, as a concurrent kvm_gfn_to_pfn_cache_check() would see a valid cache with stale pfn/khva values.  The KVM Xen use case explicitly allows/wants multiple users; even though the caches are allocated per vCPU, __kvm_xen_has_interrupt() can read a different vCPU (or vCPUs).  Address this race by invalidating the cache prior to dropping gpc->lock (this is made possible by fixing the above mmu_notifier race).

Finally, the refresh logic doesn't protect against concurrent refreshes with different GPAs (which may or may not be a desired use case, but it's allowed in the code), nor does it protect against a false negative on the memslot generation.  If the first refresh sees a stale memslot generation, it will refresh the hva and generation before moving on to the hva=>pfn translation.  If it then drops gpc->lock, a different user can come along, acquire gpc->lock, see that the memslot generation is fresh, and skip the hva=>pfn update due to the userspace address also matching (because it too was updated).
Address this race by adding an "in-progress" flag so that the refresh that acquires gpc->lock first runs to completion before other users can start their refresh.  Complicating all of this is the fact that both the hva=>pfn resolution and mapping of the kernel address can sleep, i.e. must be done outside of gpc->lock.

Fix the above races in one fell swoop; trying to fix each individual race in a sane manner is, for all intents and purposes, impossible.

Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
Cc: David Woodhouse
Cc: Mingwei Zhang
Cc: Maxim Levitsky
Signed-off-by: Sean Christopherson
---
 include/linux/kvm_types.h |   1 +
 virt/kvm/kvm_main.c       |   9 ++
 virt/kvm/pfncache.c       | 209 +++++++++++++++++++++++++-------------
 3 files changed, 148 insertions(+), 71 deletions(-)

diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index ac1ebb37a0ff..83dcb97dddf1 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -74,6 +74,7 @@ struct gfn_to_pfn_cache {
 	enum pfn_cache_usage usage;
 	bool active;
 	bool valid;
+	bool refresh_in_progress;
 };
 
 #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index dfb7dabdbc63..0848430f36c6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -705,6 +705,15 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	kvm->mn_active_invalidate_count++;
 	spin_unlock(&kvm->mn_invalidate_lock);
 
+	/*
+	 * Invalidate pfn caches _before_ invalidating the secondary MMUs, i.e.
+	 * before acquiring mmu_lock, to avoid holding mmu_lock while acquiring
+	 * each cache's lock.  There are relatively few caches in existence at
+	 * any given time, and the caches themselves can check for hva overlap,
+	 * i.e. don't need to rely on memslot overlap checks for performance.
+	 * Because this runs without holding mmu_lock, the pfn caches must use
+	 * mn_active_invalidate_count (see above) instead of mmu_notifier_count.
+	 */
 	gfn_to_pfn_cache_invalidate_start(kvm, range->start, range->end,
 					  hva_range.may_block);
 

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 05cb0bcbf662..b1665d0e6c32 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -112,31 +112,122 @@ static void gpc_release_pfn_and_khva(struct kvm *kvm, kvm_pfn_t pfn, void *khva)
 	}
 }
 
-static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva)
+static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 {
+	/* Note, the new page offset may be different than the old! */
+	void *old_khva = gpc->khva - offset_in_page(gpc->khva);
+	kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
+	void *new_khva = NULL;
 	unsigned long mmu_seq;
-	kvm_pfn_t new_pfn;
-	int retry;
 
-	do {
+	lockdep_assert_held_write(&gpc->lock);
+
+	/*
+	 * Invalidate the cache prior to dropping gpc->lock, gpc->uhva has
+	 * already been updated and so a concurrent refresh from a different
+	 * task will not detect that gpa/uhva changed.
+	 */
+	gpc->valid = false;
+
+	for (;;) {
 		mmu_seq = kvm->mmu_notifier_seq;
 		smp_rmb();
 
+		write_unlock_irq(&gpc->lock);
+
+		/*
+		 * If the previous iteration "failed" due to an mmu_notifier
+		 * event, release the pfn and unmap the kernel virtual address
+		 * from the previous attempt.  Unmapping might sleep, so this
+		 * needs to be done after dropping the lock.  Opportunistically
+		 * check for resched while the lock isn't held.
+		 */
+		if (new_pfn != KVM_PFN_ERR_FAULT) {
+			/*
+			 * Keep the mapping if the previous iteration reused
+			 * the existing mapping and didn't create a new one.
+			 */
+			if (new_khva == old_khva)
+				new_khva = NULL;
+
+			gpc_release_pfn_and_khva(kvm, new_pfn, new_khva);
+
+			cond_resched();
+		}
+
 		/* We always request a writeable mapping */
-		new_pfn = hva_to_pfn(uhva, false, NULL, true, NULL);
+		new_pfn = hva_to_pfn(gpc->uhva, false, NULL, true, NULL);
 		if (is_error_noslot_pfn(new_pfn))
-			break;
+			goto out_error;
+
+		/*
+		 * Obtain a new kernel mapping if KVM itself will access the
+		 * pfn.  Note, kmap() and memremap() can both sleep, so this
+		 * too must be done outside of gpc->lock!
+		 */
+		if (gpc->usage & KVM_HOST_USES_PFN) {
+			if (new_pfn == gpc->pfn) {
+				new_khva = old_khva;
+			} else if (pfn_valid(new_pfn)) {
+				new_khva = kmap(pfn_to_page(new_pfn));
+#ifdef CONFIG_HAS_IOMEM
+			} else {
+				new_khva = memremap(pfn_to_hpa(new_pfn), PAGE_SIZE, MEMREMAP_WB);
+#endif
+			}
+			if (!new_khva) {
+				kvm_release_pfn_clean(new_pfn);
+				goto out_error;
+			}
+		}
+
+		write_lock_irq(&gpc->lock);
 
-		KVM_MMU_READ_LOCK(kvm);
-		retry = mmu_notifier_retry_hva(kvm, mmu_seq, uhva);
-		KVM_MMU_READ_UNLOCK(kvm);
-		if (!retry)
+		/*
+		 * Other tasks must wait for _this_ refresh to complete before
+		 * attempting to refresh.
+		 */
+		WARN_ON_ONCE(gpc->valid);
+
+		/*
+		 * Wait for mn_active_invalidate_count, not mmu_notifier_count,
+		 * to go away, as the invalidation in the mmu_notifier event
+		 * occurs _before_ mmu_notifier_count is elevated.
+		 *
+		 * Note, mn_active_invalidate_count can change at any time as
+		 * it's not protected by gpc->lock.  But, it is guaranteed to
+		 * be elevated before the mmu_notifier acquires gpc->lock, and
+		 * isn't dropped until after mmu_notifier_seq is updated.  So,
+		 * this task may get a false positive of sorts, i.e. see an
+		 * elevated count and wait even though it's technically safe to
+		 * proceed (because the mmu_notifier will invalidate the cache
+		 * _after_ it's refreshed here), but the cache will never be
+		 * refreshed with stale data, i.e. won't get false negatives.
+		 */
+		if (kvm->mn_active_invalidate_count)
+			continue;
+
+		/*
+		 * Ensure mn_active_invalidate_count is read before
+		 * mmu_notifier_seq.  This pairs with the smp_wmb() in
+		 * mmu_notifier_invalidate_range_end() to guarantee either the
+		 * old (non-zero) value of mn_active_invalidate_count or the
+		 * new (incremented) value of mmu_notifier_seq is observed.
+		 */
+		smp_rmb();
+		if (kvm->mmu_notifier_seq == mmu_seq)
 			break;
+	}
+
+	gpc->valid = true;
+	gpc->pfn = new_pfn;
+	gpc->khva = new_khva + (gpc->gpa & ~PAGE_MASK);
+	return 0;
 
-		cond_resched();
-	} while (1);
+out_error:
+	write_lock_irq(&gpc->lock);
 
-	return new_pfn;
+	return -EFAULT;
 }
 
 int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
@@ -147,7 +238,6 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	kvm_pfn_t old_pfn, new_pfn;
 	unsigned long old_uhva;
 	void *old_khva;
-	bool old_valid;
 	int ret = 0;
 
 	/*
@@ -159,10 +249,23 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 
 	write_lock_irq(&gpc->lock);
 
+	/*
+	 * If another task is refreshing the cache, wait for it to complete.
+	 * There is no guarantee that concurrent refreshes will see the same
+	 * gpa, memslots generation, etc..., so they must be fully serialized.
+	 */
+	while (gpc->refresh_in_progress) {
+		write_unlock_irq(&gpc->lock);
+
+		cond_resched();
+
+		write_lock_irq(&gpc->lock);
+	}
+	gpc->refresh_in_progress = true;
+
 	old_pfn = gpc->pfn;
 	old_khva = gpc->khva - offset_in_page(gpc->khva);
 	old_uhva = gpc->uhva;
-	old_valid = gpc->valid;
 
 	/* If the userspace HVA is invalid, refresh that first */
 	if (gpc->gpa != gpa || gpc->generation != slots->generation ||
@@ -175,7 +278,6 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 		gpc->uhva = gfn_to_hva_memslot(gpc->memslot, gfn);
 
 		if (kvm_is_error_hva(gpc->uhva)) {
-			gpc->pfn = KVM_PFN_ERR_FAULT;
 			ret = -EFAULT;
 			goto out;
 		}
@@ -185,60 +287,8 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	 * If the userspace HVA changed or the PFN was already invalid,
 	 * drop the lock and do the HVA to PFN lookup again.
 	 */
-	if (!old_valid || old_uhva != gpc->uhva) {
-		unsigned long uhva = gpc->uhva;
-		void *new_khva = NULL;
-
-		/* Placeholders for "hva is valid but not yet mapped" */
-		gpc->pfn = KVM_PFN_ERR_FAULT;
-		gpc->khva = NULL;
-		gpc->valid = true;
-
-		write_unlock_irq(&gpc->lock);
-
-		new_pfn = hva_to_pfn_retry(kvm, uhva);
-		if (is_error_noslot_pfn(new_pfn)) {
-			ret = -EFAULT;
-			goto map_done;
-		}
-
-		if (gpc->usage & KVM_HOST_USES_PFN) {
-			if (new_pfn == old_pfn) {
-				/*
-				 * Reuse the existing pfn and khva, but put the
-				 * reference acquired hva_to_pfn_retry(); the
-				 * cache still holds a reference to the pfn
-				 * from the previous refresh.
-				 */
-				gpc_release_pfn_and_khva(kvm, new_pfn, NULL);
-
-				new_khva = old_khva;
-				old_pfn = KVM_PFN_ERR_FAULT;
-				old_khva = NULL;
-			} else if (pfn_valid(new_pfn)) {
-				new_khva = kmap(pfn_to_page(new_pfn));
-#ifdef CONFIG_HAS_IOMEM
-			} else {
-				new_khva = memremap(pfn_to_hpa(new_pfn), PAGE_SIZE, MEMREMAP_WB);
-#endif
-			}
-			if (new_khva)
-				new_khva += page_offset;
-			else
-				ret = -EFAULT;
-		}
-
-	map_done:
-		write_lock_irq(&gpc->lock);
-		if (ret) {
-			gpc->valid = false;
-			gpc->pfn = KVM_PFN_ERR_FAULT;
-			gpc->khva = NULL;
-		} else {
-			/* At this point, gpc->valid may already have been cleared */
-			gpc->pfn = new_pfn;
-			gpc->khva = new_khva;
-		}
+	if (!gpc->valid || old_uhva != gpc->uhva) {
+		ret = hva_to_pfn_retry(kvm, gpc);
 	} else {
 		/* If the HVA→PFN mapping was already valid, don't unmap it. */
 		old_pfn = KVM_PFN_ERR_FAULT;
@@ -246,9 +296,26 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	}
 
 out:
+	/*
+	 * Invalidate the cache and purge the pfn/khva if the refresh failed.
+	 * Some/all of the uhva, gpa, and memslot generation info may still be
+	 * valid, leave it as is.
+	 */
+	if (ret) {
+		gpc->valid = false;
+		gpc->pfn = KVM_PFN_ERR_FAULT;
+		gpc->khva = NULL;
+	}
+
+	gpc->refresh_in_progress = false;
+
+	/*
+	 * Snapshot the new pfn before dropping the lock!
+	 */
+	new_pfn = gpc->pfn;
+
 	write_unlock_irq(&gpc->lock);
 
-	gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
+	if (old_pfn != new_pfn)
+		gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
 
 	return ret;
 }
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog

From: Sean Christopherson
Date: Wed, 27 Apr 2022 01:40:03 +0000
Subject: [PATCH v2 7/8] KVM: Do not pin pages tracked by gfn=>pfn caches
Message-Id: <20220427014004.1992589-8-seanjc@google.com>
In-Reply-To: <20220427014004.1992589-1-seanjc@google.com>

Put the reference to any struct page mapped/tracked by a gfn=>pfn cache upon inserting the pfn into its associated cache, as opposed to putting the reference only when the cache is done using the pfn.  In other words, don't pin pages while they're in the cache.
One of the major roles of the gfn=>pfn cache is to play nicely with invalidation events, i.e. it exists in large part so that KVM doesn't rely on pinning pages.

Signed-off-by: Sean Christopherson
---
 virt/kvm/pfncache.c | 36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index b1665d0e6c32..3cb439b505b4 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -95,20 +95,16 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 }
 EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_check);
 
-static void gpc_release_pfn_and_khva(struct kvm *kvm, kvm_pfn_t pfn, void *khva)
+static void gpc_unmap_khva(struct kvm *kvm, kvm_pfn_t pfn, void *khva)
 {
-	/* Unmap the old page if it was mapped before, and release it */
-	if (!is_error_noslot_pfn(pfn)) {
-		if (khva) {
-			if (pfn_valid(pfn))
-				kunmap(pfn_to_page(pfn));
+	/* Unmap the old pfn/page if it was mapped before. */
+	if (!is_error_noslot_pfn(pfn) && khva) {
+		if (pfn_valid(pfn))
+			kunmap(pfn_to_page(pfn));
 #ifdef CONFIG_HAS_IOMEM
-			else
-				memunmap(khva);
+		else
+			memunmap(khva);
 #endif
-		}
-
-		kvm_release_pfn(pfn, false);
 	}
 }
 
@@ -147,10 +143,10 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 			 * Keep the mapping if the previous iteration reused
 			 * the existing mapping and didn't create a new one.
 			 */
-			if (new_khva == old_khva)
-				new_khva = NULL;
+			if (new_khva != old_khva)
+				gpc_unmap_khva(kvm, new_pfn, new_khva);
 
-			gpc_release_pfn_and_khva(kvm, new_pfn, new_khva);
+			kvm_release_pfn_clean(new_pfn);
 
 			cond_resched();
 		}
@@ -222,6 +218,14 @@ static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 	gpc->valid = true;
 	gpc->pfn = new_pfn;
 	gpc->khva = new_khva + (gpc->gpa & ~PAGE_MASK);
+
+	/*
+	 * Put the reference to the _new_ pfn.  The pfn is now tracked by the
+	 * cache and can be safely migrated, swapped, etc... as the cache will
+	 * invalidate any mappings in response to relevant mmu_notifier events.
+	 */
+	kvm_release_pfn_clean(new_pfn);
+
 	return 0;
 
 out_error:
@@ -315,7 +319,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 	write_unlock_irq(&gpc->lock);
 
 	if (old_pfn != new_pfn)
-		gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
+		gpc_unmap_khva(kvm, old_pfn, old_khva);
 
 	return ret;
 }
@@ -342,7 +346,7 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc)
 
 	write_unlock_irq(&gpc->lock);
 
-	gpc_release_pfn_and_khva(kvm, old_pfn, old_khva);
+	gpc_unmap_khva(kvm, old_pfn, old_khva);
 }
 EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_unmap);
 
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog

From: Sean Christopherson
Date: Wed, 27 Apr 2022 01:40:04 +0000
Subject: [PATCH v2 8/8] DO NOT MERGE: Hack-a-test to verify gpc invalidation+refresh
Message-Id: <20220427014004.1992589-9-seanjc@google.com>
In-Reply-To: <20220427014004.1992589-1-seanjc@google.com>

Add a VM-wide gfn=>pfn cache and a fake MSR to let userspace control the cache.  On writes, reflect the value of the MSR into the backing page of a gfn=>pfn cache so that userspace can detect if a value was written to the wrong page, i.e. to a stale mapping.

Spin up 16 vCPUs (arbitrary) to use/refresh the cache, and another thread to trigger mmu_notifier events and memslot updates.

Not-signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c                     |  30 ++++
 include/linux/kvm_host.h               |   2 +
 tools/testing/selftests/kvm/.gitignore |   1 +
 tools/testing/selftests/kvm/Makefile   |   2 +
 tools/testing/selftests/kvm/gpc_test.c | 217 +++++++++++++++++++++++++
 virt/kvm/pfncache.c                    |   2 +
 6 files changed, 254 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/gpc_test.c

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 951d0a78ccda..7afdb7f39821 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3473,6 +3473,20 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		return kvm_xen_write_hypercall_page(vcpu, data);
 
 	switch (msr) {
+	case 0xdeadbeefu: {
+		struct gfn_to_pfn_cache *gpc = &vcpu->kvm->test_cache;
+		unsigned long flags;
+
+		if (kvm_gfn_to_pfn_cache_refresh(vcpu->kvm, gpc, data, 8))
+			break;
+
+		read_lock_irqsave(&gpc->lock, flags);
+		if (kvm_gfn_to_pfn_cache_check(vcpu->kvm, gpc, data, 8))
+			*(u64 *)(gpc->khva) = data;
+		read_unlock_irqrestore(&gpc->lock, flags);
+		break;
+	}
+
 	case MSR_AMD64_NB_CFG:
 	case MSR_IA32_UCODE_WRITE:
 	case MSR_VM_HSAVE_PA:
@@ -3825,6 +3839,19 @@ static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
 int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	switch (msr_info->index) {
+	case 0xdeadbeefu: {
+		struct gfn_to_pfn_cache *gpc = &vcpu->kvm->test_cache;
+		unsigned long flags;
+
+		read_lock_irqsave(&gpc->lock, flags);
+		if (kvm_gfn_to_pfn_cache_check(vcpu->kvm, gpc, gpc->gpa, 8))
+			msr_info->data = gpc->gpa;
+		else
+			msr_info->data = 0xdeadbeefu;
+		read_unlock_irqrestore(&gpc->lock, flags);
+		return 0;
+	}
+
 	case MSR_IA32_PLATFORM_ID:
 	case MSR_IA32_EBL_CR_POWERON:
 	case MSR_IA32_LASTBRANCHFROMIP:
@@ -11794,6 +11821,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm_hv_init_vm(kvm);
 	kvm_xen_init_vm(kvm);
 
+	kvm_gfn_to_pfn_cache_init(kvm, &kvm->test_cache, NULL,
+				  KVM_HOST_USES_PFN, 0, 0);
+
 	return static_call(kvm_x86_vm_init)(kvm);
 
 out_page_track:

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 252ee4a61b58..88ed76ad8bc7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -718,6 +718,8 @@ struct kvm {
 	spinlock_t gpc_lock;
 	struct list_head gpc_list;
 
+	struct gfn_to_pfn_cache test_cache;
+
 	/*
 	 * created_vcpus is protected by kvm->lock, and is incremented
 	 * at the beginning of KVM_CREATE_VCPU.  online_vcpus is only

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 56140068b763..0310a57a1a4f 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -70,3 +70,4 @@
 /steal_time
 /kvm_binary_stats_test
 /system_counter_offset_test
+/gpc_test

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index af582d168621..0adc9ac954d1 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -104,6 +104,8 @@ TEST_GEN_PROGS_x86_64 += steal_time
 TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
 TEST_GEN_PROGS_x86_64 += system_counter_offset_test
 
+TEST_GEN_PROGS_x86_64 += gpc_test
+
 TEST_GEN_PROGS_aarch64 += aarch64/arch_timer
 TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list

diff --git a/tools/testing/selftests/kvm/gpc_test.c b/tools/testing/selftests/kvm/gpc_test.c
new file mode 100644
index 000000000000..5c509e7bb4da
--- /dev/null
+++ b/tools/testing/selftests/kvm/gpc_test.c
@@ -0,0 +1,217 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "kvm_util.h"
+#include "processor.h"
+#include "test_util.h"
+
+#define NR_VCPUS	16
+
+#define NR_ITERATIONS	1000
+
+#define PAGE_SIZE	4096
+
+#ifndef MAP_FIXED_NOREPLACE
+#define MAP_FIXED_NOREPLACE	0x100000
+#endif
+
+static const uint64_t gpa_base = (4ull * (1 << 30));
+
+static struct kvm_vm *vm;
+
+static pthread_t memory_thread;
+static pthread_t vcpu_threads[NR_VCPUS];
+
+static bool fight;
+
+static uint64_t per_vcpu_gpa_aligned(int vcpu_id)
+{
+	return gpa_base + (vcpu_id * PAGE_SIZE);
+}
+
+static uint64_t per_vcpu_gpa(int vcpu_id)
+{
+	return per_vcpu_gpa_aligned(vcpu_id) + vcpu_id;
+}
+
+static void guest_code(int vcpu_id)
+{
+	uint64_t this_vcpu_gpa;
+	int i;
+
+	this_vcpu_gpa = per_vcpu_gpa(vcpu_id);
+
+	for (i = 0; i < NR_ITERATIONS; i++)
+		wrmsr(0xdeadbeefu, this_vcpu_gpa);
+	GUEST_SYNC(0);
+}
+
+static void *memory_worker(void *ign)
+{
+	int i, x, r, k;
+	uint64_t *hva;
+	uint64_t gpa;
+	void *mem;
+
+	while (!READ_ONCE(fight))
+		cpu_relax();
+
+	for (k = 0; k < 50; k++) {
+		i = (unsigned int)random() % NR_VCPUS;
+
+		gpa = per_vcpu_gpa_aligned(i);
+		hva = (void *)gpa;
+
+		x = (unsigned int)random() % 5;
+		switch (x) {
+		case 0:
+			r = munmap(hva, PAGE_SIZE);
+			TEST_ASSERT(!r, "Failed to mumap (hva = %lx), errno = %d (%s)",
+				    (unsigned long)hva, errno, strerror(errno));
+
+			mem = mmap(hva, PAGE_SIZE, PROT_READ | PROT_WRITE,
+				   MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+			TEST_ASSERT(mem != MAP_FAILED || mem != hva,
+				    "Failed to mmap (hva = %lx), errno = %d (%s)",
+				    (unsigned long)hva, errno, strerror(errno));
+			break;
+		case 1:
+			vm_set_user_memory_region(vm, i + 1, KVM_MEM_LOG_DIRTY_PAGES,
+						  gpa, PAGE_SIZE, hva);
+			vm_set_user_memory_region(vm, i + 1, 0, gpa, PAGE_SIZE, hva);
+			break;
+		case 2:
+			r = mprotect(hva, PAGE_SIZE, PROT_NONE);
+			TEST_ASSERT(!r, "Failed to mprotect (hva = %lx), errno = %d (%s)",
+				    (unsigned long)hva, errno, strerror(errno));
+
+			r = mprotect(hva, PAGE_SIZE, PROT_READ | PROT_WRITE);
+			TEST_ASSERT(!r, "Failed to mprotect (hva = %lx), errno = %d (%s)",
+				    (unsigned long)hva, errno, strerror(errno));
+			break;
+		case 3:
+			r = mprotect(hva, PAGE_SIZE, PROT_READ);
+			TEST_ASSERT(!r, "Failed to mprotect (hva = %lx), errno = %d (%s)",
+				    (unsigned long)hva, errno, strerror(errno));
+
+			r = mprotect(hva, PAGE_SIZE, PROT_READ | PROT_WRITE);
+			TEST_ASSERT(!r, "Failed to mprotect (hva = %lx), errno = %d (%s)",
+				    (unsigned long)hva, errno, strerror(errno));
+			break;
+		case 4:
+			vm_set_user_memory_region(vm, i + 1, 0, gpa, 0, 0);
+			vm_set_user_memory_region(vm, i + 1, 0, gpa, PAGE_SIZE,
+						  (void *)per_vcpu_gpa_aligned(NR_VCPUS));
+			vm_set_user_memory_region(vm, i + 1, 0, gpa, 0, 0);
+			vm_set_user_memory_region(vm, i + 1, 0, gpa, PAGE_SIZE, hva);
+			break;
+		}
+	}
+	return NULL;
+}
+
+static void sync_guest(int vcpu_id)
+{
+	struct ucall uc;
+
+	switch (get_ucall(vm, vcpu_id, &uc)) {
+	case UCALL_SYNC:
+		TEST_ASSERT(uc.args[1] == 0,
+			    "Unexpected sync ucall, got %lx", uc.args[1]);
+		break;
+	case UCALL_ABORT:
+		TEST_FAIL("%s at %s:%ld\n\tvalues: %#lx, %#lx",
+			  (const char *)uc.args[0],
+			  __FILE__, uc.args[1], uc.args[2], uc.args[3]);
+		break;
+	default:
+		TEST_FAIL("Unexpected userspace exit, reason = %s\n",
+			  exit_reason_str(vcpu_state(vm, vcpu_id)->exit_reason));
+		break;
+	}
+}
+
+static void *vcpu_worker(void *data)
+{
+	int vcpu_id = (unsigned long)data;
+
+	vcpu_args_set(vm, vcpu_id, 1, vcpu_id);
+
+	while (!READ_ONCE(fight))
+		cpu_relax();
+
+	usleep(10);
+
+	vcpu_run(vm, vcpu_id);
+
+	sync_guest(vcpu_id);
+
+	return NULL;
+}
+
+int main(int argc, char *argv[])
+{
+	uint64_t *hva;
+	uint64_t gpa;
+	void *r;
+	int i;
+
+	srandom(time(0));
+
+	vm = vm_create_default_with_vcpus(NR_VCPUS, 0, 0, guest_code, NULL);
+	ucall_init(vm, NULL);
+
+	pthread_create(&memory_thread, NULL, memory_worker, 0);
+
+	for (i = 0; i < NR_VCPUS; i++) {
+		pthread_create(&vcpu_threads[i], NULL, vcpu_worker, (void *)(unsigned long)i);
+
+		gpa = per_vcpu_gpa_aligned(i);
+		hva = (void *)gpa;
+		r = mmap(hva, PAGE_SIZE, PROT_READ | PROT_WRITE,
+			 MAP_FIXED_NOREPLACE | MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+		TEST_ASSERT(r != MAP_FAILED, "mmap() '%lx' failed, errno = %d (%s)",
+			    gpa, errno, strerror(errno));
+
+		vm_set_user_memory_region(vm, i + 1, 0, gpa, PAGE_SIZE, hva);
+	}
+
+	WRITE_ONCE(fight, true);
+
+	for (i = 0; i < NR_VCPUS; i++)
+		pthread_join(vcpu_threads[i], NULL);
+
+	pthread_join(memory_thread, NULL);
+
+	for (i = 0; i < NR_VCPUS; i++) {
+		gpa = per_vcpu_gpa(i);
+		hva = (void *)gpa;
+
+		TEST_ASSERT(*hva == 0 || *hva == gpa,
+			    "Want '0' or '%lx', got '%lx'\n", gpa, *hva);
+	}
+
+	gpa = vcpu_get_msr(vm, 0, 0xdeadbeefu);
+	hva = (void *)gpa;
+	if (gpa != 0xdeadbeefu)
+		TEST_ASSERT(*hva == gpa, "Want '%lx', got '%lx'\n", gpa, *hva);
+
+	kvm_vm_free(vm);
+
+	return 0;
+}

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 3cb439b505b4..7881e6e6d91a 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -372,6 +372,8 @@ int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
 		list_add(&gpc->list, &kvm->gpc_list);
 		spin_unlock(&kvm->gpc_lock);
 	}
+	if (!len)
+		return -EINVAL;
 	return kvm_gfn_to_pfn_cache_refresh(kvm, gpc, gpa, len);
 }
 EXPORT_SYMBOL_GPL(kvm_gfn_to_pfn_cache_init);
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog