Reply-To: Mingwei Zhang
Date: Fri, 29 Apr 2022 03:17:57 +0000
Message-Id: <20220429031757.2042406-1-mizhang@google.com>
X-Mailer: git-send-email 2.36.0.464.gb9c8b46e94-goog
Subject: [PATCH] KVM: x86/mmu: fix potential races when walking host page table
From: Mingwei Zhang
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Mingwei Zhang
X-Mailing-List: linux-kernel@vger.kernel.org

Implement a KVM function for walking the host page table on x86 and stop
using lookup_address_in_mm(). lookup_address_in_mm() is basically
lookup_address_in_pgd() in mm. This function suffers from several issues:

 - No usage of READ_ONCE(*). This allows multiple dereferences of the
   same page table entry. The resulting TOCTOU problem may cause the
   kernel to incorrectly treat a newly generated leaf entry as a nonleaf
   one and dereference the content by using its pfn value.

 - Incorrect information returned. lookup_address_in_mm() returns a pte_t
   pointer and level regardless of the 'presentness' of the entry, i.e.,
   even if a PXE entry is 'non-present', as long as it is not 'none', the
   function still returns its level.
   In comparison, KVM needs the level information of only 'present'
   entries. This is a clear bug and may cause KVM to incorrectly regard a
   non-present PXE entry as a present large page mapping.

lookup_address_in_mm() and its relevant functions are generally helpful
only for walking kernel addresses that have mostly static mappings and
where no page table teardown would happen. Patching this function does
not help other callers, since its return value, a PTE pointer, is NEVER
safe to dereference after the function returns.

Cc: Sean Christopherson
Cc: Paolo Bonzini
Signed-off-by: Mingwei Zhang
---
 arch/x86/kvm/mmu/mmu.c |  8 +----
 arch/x86/kvm/x86.c     | 70 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h     |  2 ++
 3 files changed, 73 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 904f0faff2186..6db195e5eae94 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2822,8 +2822,6 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
 				  const struct kvm_memory_slot *slot)
 {
 	unsigned long hva;
-	pte_t *pte;
-	int level;
 
 	if (!PageCompound(pfn_to_page(pfn)) && !kvm_is_zone_device_pfn(pfn))
 		return PG_LEVEL_4K;
@@ -2838,11 +2836,7 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
 	 */
 	hva = __gfn_to_hva_memslot(slot, gfn);
 
-	pte = lookup_address_in_mm(kvm->mm, hva, &level);
-	if (unlikely(!pte))
-		return PG_LEVEL_4K;
-
-	return level;
+	return kvm_lookup_address_level_in_mm(kvm, hva);
 }
 
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 951d0a78ccdae..61406efe4ea7f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13044,6 +13044,76 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
 }
 EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
 
+/*
+ * Lookup the valid mapping level for a virtual address in the current mm.
+ * Return the level of the mapping if there is a present one. Otherwise, always
+ * return PG_LEVEL_NONE.
+ *
+ * Note: the information retrieved may be stale. Use it with caution.
+ */
+int kvm_lookup_address_level_in_mm(struct kvm *kvm, unsigned long address)
+{
+	pgd_t *pgdp, pgd;
+	p4d_t *p4dp, p4d;
+	pud_t *pudp, pud;
+	pmd_t *pmdp, pmd;
+	pte_t *ptep, pte;
+	unsigned long flags;
+	int level = PG_LEVEL_NONE;
+
+	/* Disable IRQs to prevent any tear down of page tables. */
+	local_irq_save(flags);
+
+	pgdp = pgd_offset(kvm->mm, address);
+	pgd = READ_ONCE(*pgdp);
+	if (pgd_none(pgd))
+		goto out;
+
+	p4dp = p4d_offset(pgdp, address);
+	p4d = READ_ONCE(*p4dp);
+	if (p4d_none(p4d) || !p4d_present(p4d))
+		goto out;
+
+	if (p4d_large(p4d)) {
+		level = PG_LEVEL_512G;
+		goto out;
+	}
+
+	pudp = pud_offset(p4dp, address);
+	pud = READ_ONCE(*pudp);
+	if (pud_none(pud) || !pud_present(pud))
+		goto out;
+
+	if (pud_large(pud)) {
+		level = PG_LEVEL_1G;
+		goto out;
+	}
+
+	pmdp = pmd_offset(pudp, address);
+	pmd = READ_ONCE(*pmdp);
+	if (pmd_none(pmd) || !pmd_present(pmd))
+		goto out;
+
+	if (pmd_large(pmd)) {
+		level = PG_LEVEL_2M;
+		goto out;
+	}
+
+	ptep = pte_offset_map(&pmd, address);
+	pte = ptep_get(ptep);
+	if (pte_present(pte)) {
+		pte_unmap(ptep);
+		level = PG_LEVEL_4K;
+		goto out;
+	}
+	pte_unmap(ptep);
+
+out:
+	local_irq_restore(flags);
+	return level;
+}
+EXPORT_SYMBOL_GPL(kvm_lookup_address_level_in_mm);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 588792f003345..f1cdcc8483bd0 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -454,4 +454,6 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
 			 unsigned int port, void *data,  unsigned int count,
 			 int in);
 
+int kvm_lookup_address_level_in_mm(struct kvm *kvm, unsigned long address);
+
 #endif

base-commit: 2a39d8b39bffdaf1a4223d0d22f07baee154c8f3
-- 
2.36.0.464.gb9c8b46e94-goog
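
The two problems named in the changelog (re-reading an entry after it
has been checked, and reporting a level for a non-present entry) can be
illustrated outside the kernel. The following is only a userspace sketch
under stated assumptions: the TOY_* bit layout, toy_read_once() and
toy_lookup_level() are hypothetical names invented for this example and
are not kernel APIs. It models a two-level walk in which each entry is
loaded exactly once and every later decision uses that local snapshot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy encoding, loosely inspired by x86; NOT the kernel's. */
#define TOY_PRESENT	(1ULL << 0)
#define TOY_LARGE	(1ULL << 7)

enum toy_level { TOY_LEVEL_NONE = 0, TOY_LEVEL_4K = 1, TOY_LEVEL_2M = 2 };

/* Userspace stand-in for READ_ONCE(): the volatile cast forces one load. */
static inline uint64_t toy_read_once(const uint64_t *p)
{
	return *(const volatile uint64_t *)p;
}

/*
 * Classify one upper-level entry and, if it is a present non-leaf entry,
 * one lowest-level entry. Each entry is loaded once; all checks use the
 * snapshot, so a concurrent writer cannot change the answer between the
 * "present" check and the "large" check.
 */
static enum toy_level toy_lookup_level(const uint64_t *upper,
				       const uint64_t *lower)
{
	uint64_t u, l;

	assert(upper != NULL && lower != NULL);

	u = toy_read_once(upper);
	if (!(u & TOY_PRESENT))
		return TOY_LEVEL_NONE;	/* non-present: report no level */
	if (u & TOY_LARGE)
		return TOY_LEVEL_2M;	/* present leaf at the upper level */

	l = toy_read_once(lower);
	return (l & TOY_PRESENT) ? TOY_LEVEL_4K : TOY_LEVEL_NONE;
}
```

In this toy model, an upper entry that is non-zero but has TOY_PRESENT
clear (for example 0x1000 | TOY_LARGE) yields TOY_LEVEL_NONE rather than
TOY_LEVEL_2M, which mirrors the behaviour the patch wants from
kvm_lookup_address_level_in_mm().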