From: Quentin Perret
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Catalin Marinas, Will Deacon
Cc: Fuad Tabba, Vincent Donnefort, Sebastian Ene,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org
Subject: [PATCH v4 04/18] KVM: arm64: Move host page ownership tracking to the hyp vmemmap
Date: Wed, 18 Dec 2024 19:40:45 +0000
Message-ID: <20241218194059.3670226-5-qperret@google.com>
In-Reply-To: <20241218194059.3670226-1-qperret@google.com>
References: <20241218194059.3670226-1-qperret@google.com>

We currently store part of the page-tracking state in PTE software bits
for the host, guests and the hypervisor. This is sub-optimal when e.g.
sharing pages, as it forces us to break up block mappings purely to
support this software tracking. This leads to an unnecessarily
fragmented stage-2 page-table for the host, in particular when it
shares pages with Secure, which can cause measurable regressions.
Moreover, having this state stored in the page-table forces us to do
multiple costly walks on the page transition path, adding overhead.

To work around these problems, move the host-side page-tracking logic
from SW bits in its stage-2 PTEs to the hypervisor's vmemmap.

Tested-by: Fuad Tabba
Reviewed-by: Fuad Tabba
Signed-off-by: Quentin Perret
---
 arch/arm64/kvm/hyp/include/nvhe/memory.h |  14 +++-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c    | 100 ++++++++++++++++-------
 arch/arm64/kvm/hyp/nvhe/setup.c          |   7 +-
 3 files changed, 84 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 8f2b42bcc8e1..2a5eabf4b753 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -8,7 +8,7 @@
 #include <linux/types.h>
 
 /*
- * SW bits 0-1 are reserved to track the memory ownership state of each page:
+ * Bits 0-1 are reserved to track the memory ownership state of each page:
  *   00: The page is owned exclusively by the page-table owner.
  *   01: The page is owned by the page-table owner, but is shared
  *       with another entity.
@@ -43,7 +43,9 @@ static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
 struct hyp_page {
 	u16 refcount;
 	u8 order;
-	u8 reserved;
+
+	/* Host (non-meta) state. Guarded by the host stage-2 lock. */
+	enum pkvm_page_state host_state : 8;
 };
 
 extern u64 __hyp_vmemmap;
@@ -63,7 +65,13 @@ static inline phys_addr_t hyp_virt_to_phys(void *addr)
 
 #define hyp_phys_to_pfn(phys)	((phys) >> PAGE_SHIFT)
 #define hyp_pfn_to_phys(pfn)	((phys_addr_t)((pfn) << PAGE_SHIFT))
-#define hyp_phys_to_page(phys)	(&hyp_vmemmap[hyp_phys_to_pfn(phys)])
+
+static inline struct hyp_page *hyp_phys_to_page(phys_addr_t phys)
+{
+	BUILD_BUG_ON(sizeof(struct hyp_page) != sizeof(u32));
+	return &hyp_vmemmap[hyp_phys_to_pfn(phys)];
+}
+
 #define hyp_virt_to_page(virt)	hyp_phys_to_page(__hyp_pa(virt))
 #define hyp_virt_to_pfn(virt)	hyp_phys_to_pfn(__hyp_pa(virt))
 
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index caba3e4bd09e..12bb5445fe47 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -201,8 +201,8 @@ static void *guest_s2_zalloc_page(void *mc)
 
 	memset(addr, 0, PAGE_SIZE);
 	p = hyp_virt_to_page(addr);
-	memset(p, 0, sizeof(*p));
 	p->refcount = 1;
+	p->order = 0;
 
 	return addr;
 }
@@ -268,6 +268,7 @@ int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
 
 void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
 {
+	struct hyp_page *page;
 	void *addr;
 
 	/* Dump all pgtable pages in the hyp_pool */
@@ -279,7 +280,9 @@ void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
 	/* Drain the hyp_pool into the memcache */
 	addr = hyp_alloc_pages(&vm->pool, 0);
 	while (addr) {
-		memset(hyp_virt_to_page(addr), 0, sizeof(struct hyp_page));
+		page = hyp_virt_to_page(addr);
+		page->refcount = 0;
+		page->order = 0;
 		push_hyp_memcache(mc, addr, hyp_virt_to_phys);
 		WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
 		addr = hyp_alloc_pages(&vm->pool, 0);
@@ -382,19 +385,28 @@ bool addr_is_memory(phys_addr_t phys)
 	return !!find_mem_range(phys, &range);
 }
 
-static bool addr_is_allowed_memory(phys_addr_t phys)
+static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
+{
+	return range->start <= addr && addr < range->end;
+}
+
+static int check_range_allowed_memory(u64 start, u64 end)
 {
 	struct memblock_region *reg;
 	struct kvm_mem_range range;
 
-	reg = find_mem_range(phys, &range);
+	/*
+	 * Callers can't check the state of a range that overlaps memory and
+	 * MMIO regions, so ensure [start, end[ is in the same kvm_mem_range.
+	 */
+	reg = find_mem_range(start, &range);
+	if (!is_in_mem_range(end - 1, &range))
+		return -EINVAL;
 
-	return reg && !(reg->flags & MEMBLOCK_NOMAP);
-}
+	if (!reg || reg->flags & MEMBLOCK_NOMAP)
+		return -EPERM;
 
-static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
-{
-	return range->start <= addr && addr < range->end;
+	return 0;
 }
 
 static bool range_is_memory(u64 start, u64 end)
@@ -454,8 +466,10 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
 	if (kvm_pte_valid(pte))
 		return -EAGAIN;
 
-	if (pte)
+	if (pte) {
+		WARN_ON(addr_is_memory(addr) && hyp_phys_to_page(addr)->host_state != PKVM_NOPAGE);
 		return -EPERM;
+	}
 
 	do {
 		u64 granule = kvm_granule_size(level);
@@ -477,10 +491,33 @@ int host_stage2_idmap_locked(phys_addr_t addr, u64 size,
 	return host_stage2_try(__host_stage2_idmap, addr, addr + size, prot);
 }
 
+static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_state state)
+{
+	phys_addr_t end = addr + size;
+
+	for (; addr < end; addr += PAGE_SIZE)
+		hyp_phys_to_page(addr)->host_state = state;
+}
+
 int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
 {
-	return host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
-			       addr, size, &host_s2_pool, owner_id);
+	int ret;
+
+	if (!addr_is_memory(addr))
+		return -EPERM;
+
+	ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
+			      addr, size, &host_s2_pool, owner_id);
+	if (ret)
+		return ret;
+
+	/* Don't forget to update the vmemmap tracking for the host */
+	if (owner_id == PKVM_ID_HOST)
+		__host_update_page_state(addr, size, PKVM_PAGE_OWNED);
+	else
+		__host_update_page_state(addr, size, PKVM_NOPAGE);
+
+	return 0;
 }
 
 static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
@@ -604,35 +641,38 @@ static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	return kvm_pgtable_walk(pgt, addr, size, &walker);
 }
 
-static enum pkvm_page_state host_get_page_state(kvm_pte_t pte, u64 addr)
-{
-	if (!addr_is_allowed_memory(addr))
-		return PKVM_NOPAGE;
-
-	if (!kvm_pte_valid(pte) && pte)
-		return PKVM_NOPAGE;
-
-	return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
-}
-
 static int __host_check_page_state_range(u64 addr, u64 size,
 					 enum pkvm_page_state state)
 {
-	struct check_walk_data d = {
-		.desired	= state,
-		.get_page_state	= host_get_page_state,
-	};
+	u64 end = addr + size;
+	int ret;
+
+	ret = check_range_allowed_memory(addr, end);
+	if (ret)
+		return ret;
 
 	hyp_assert_lock_held(&host_mmu.lock);
-	return check_page_state_range(&host_mmu.pgt, addr, size, &d);
+	for (; addr < end; addr += PAGE_SIZE) {
+		if (hyp_phys_to_page(addr)->host_state != state)
+			return -EPERM;
+	}
+
+	return 0;
 }
 
 static int __host_set_page_state_range(u64 addr, u64 size,
 				       enum pkvm_page_state state)
 {
-	enum kvm_pgtable_prot prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, state);
+	if (hyp_phys_to_page(addr)->host_state == PKVM_NOPAGE) {
+		int ret = host_stage2_idmap_locked(addr, size, PKVM_HOST_MEM_PROT);
 
-	return host_stage2_idmap_locked(addr, size, prot);
+		if (ret)
+			return ret;
+	}
+
+	__host_update_page_state(addr, size, state);
+
+	return 0;
 }
 
 static int host_request_owned_transition(u64 *completer_addr,
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index cbdd18cd3f98..7e04d1c2a03d 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -180,7 +180,6 @@ static void hpool_put_page(void *addr)
 static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
 				     enum kvm_pgtable_walk_flags visit)
 {
-	enum kvm_pgtable_prot prot;
 	enum pkvm_page_state state;
 	phys_addr_t phys;
 
@@ -203,16 +202,16 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	case PKVM_PAGE_OWNED:
 		return host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HYP);
 	case PKVM_PAGE_SHARED_OWNED:
-		prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_BORROWED);
+		hyp_phys_to_page(phys)->host_state = PKVM_PAGE_SHARED_BORROWED;
 		break;
 	case PKVM_PAGE_SHARED_BORROWED:
-		prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_OWNED);
+		hyp_phys_to_page(phys)->host_state = PKVM_PAGE_SHARED_OWNED;
 		break;
 	default:
 		return -EINVAL;
 	}
 
-	return host_stage2_idmap_locked(phys, PAGE_SIZE, prot);
+	return 0;
 }
 
 static int fix_hyp_pgtable_refcnt_walker(const struct kvm_pgtable_visit_ctx *ctx,
-- 
2.47.1.613.gc27f4b7a9f-goog
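
[Editor's illustration, not part of the patch: a minimal, self-contained
sketch of the scheme the commit message describes -- per-page ownership
state kept in a flat metadata array indexed by PFN, rather than encoded
in page-table software bits. All names, sizes and state values below are
hypothetical stand-ins, not the kernel's actual types.]

	/*
	 * Hypothetical sketch: one small metadata entry per physical
	 * page, indexed by page frame number (PFN).
	 */
	#include <stdint.h>
	#include <stddef.h>

	enum page_state {
		STATE_OWNED,		/* exclusively owned */
		STATE_SHARED_OWNED,	/* owned, shared with another entity */
		STATE_SHARED_BORROWED,	/* borrowed from another entity */
		STATE_NOPAGE,		/* not owned / not accessible */
	};

	struct page_meta {
		uint16_t refcount;
		uint8_t order;
		uint8_t state;		/* enum page_state, one byte per page */
	};

	#define EXAMPLE_PAGE_SHIFT	12
	#define EXAMPLE_PAGE_SIZE	(1UL << EXAMPLE_PAGE_SHIFT)
	#define EXAMPLE_NR_PAGES	(1UL << 18)	/* e.g. 1GiB tracked */

	static struct page_meta vmemmap[EXAMPLE_NR_PAGES];

	static struct page_meta *phys_to_meta(uint64_t phys)
	{
		return &vmemmap[phys >> EXAMPLE_PAGE_SHIFT];
	}

	/* Updating the state of a range is a linear walk over the array. */
	static void set_range_state(uint64_t phys, size_t size, enum page_state st)
	{
		uint64_t end = phys + size;

		for (; phys < end; phys += EXAMPLE_PAGE_SIZE)
			phys_to_meta(phys)->state = (uint8_t)st;
	}

	/* Checking a range succeeds only if every page matches. */
	static int check_range_state(uint64_t phys, size_t size, enum page_state st)
	{
		uint64_t end = phys + size;

		for (; phys < end; phys += EXAMPLE_PAGE_SIZE) {
			if (phys_to_meta(phys)->state != st)
				return -1;
		}
		return 0;
	}

With the state held in such an array, range checks and updates need no
page-table walk and place no constraint on the mapping granularity, which
is what lets the host keep its block mappings intact.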