From: Vitaly Kuznetsov
To: linux-hyperv@vger.kernel.org, Michael Kelley
Cc: "K. Y. Srinivasan", Haiyang Zhang, Wei Liu, Dexuan Cui,
    x86@kernel.org, linux-kernel@vger.kernel.org, Nuno Das Neves,
    Tianyu Lan, Li Tian, Philipp Rudo
Subject: [PATCH v3] x86/hyperv: Fix kdump on Azure CVMs
Date: Thu, 21 Aug 2025 17:16:55 +0200
Message-ID: <20250821151655.3051386-1-vkuznets@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Azure CVM instance types featuring a paravisor hang upon kdump. The
investigation shows that makedumpfile causes a hang when it steps on a page
which was previously shared with the host
(HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY). The new kernel has no
knowledge of these 'special' regions (which are VMBus connection pages,
GPADL buffers, ...). There are several ways to approach the issue:
- Convey the knowledge about these regions to the new kernel somehow.
- Unshare these regions before accessing in the new kernel (it is unclear
  if there's a way to query the status for a given GPA range).
- Unshare these regions before jumping to the new kernel (which this patch
  implements).

To make the procedure as robust as possible, store PFN ranges of shared
regions in a linked list instead of storing GVAs and re-using
hv_vtom_set_host_visibility(). This also avoids memory allocation on the
kdump/kexec path.

hv_list_enc_remove() also handles the corner case where a PFN in the middle
of a region is unshared: the region is split in two. Such requests are
unlikely to happen in practice.

Signed-off-by: Vitaly Kuznetsov
---
Changes since v2 [Michael Kelley]:
- Rebase to hyperv-next.
- Move hv_ivm_clear_host_access() call to hyperv_cleanup(). This also makes
  ARM stub unneeded.
- Implement the missing corner case in hv_list_enc_remove(). With this, the
  math should (hopefully!) always be correct so we don't rely on the
  idempotency of HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY hypercall.
  As the case is not something we see, I tested the code with a few
  synthetic tests.
- Fix the math in hv_list_enc_remove() (count -> ent->count).
- Typos.
---
 arch/x86/hyperv/hv_init.c       |   3 +
 arch/x86/hyperv/ivm.c           | 213 ++++++++++++++++++++++++++++++--
 arch/x86/include/asm/mshyperv.h |   2 +
 3 files changed, 210 insertions(+), 8 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 2979d15223cf..4bb1578237eb 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -596,6 +596,9 @@ void hyperv_cleanup(void)
 	union hv_x64_msr_hypercall_contents hypercall_msr;
 	union hv_reference_tsc_msr tsc_msr;
 
+	/* Retract host access to shared memory in case of isolation */
+	hv_ivm_clear_host_access();
+
 	/* Reset our OS id */
 	wrmsrq(HV_X64_MSR_GUEST_OS_ID, 0);
 	hv_ivm_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 3084ae8a3eed..0d74156ad6a7 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -462,6 +462,188 @@ void hv_ivm_msr_read(u64 msr, u64 *value)
 	hv_ghcb_msr_read(msr, value);
 }
 
+/*
+ * Keep track of the PFN regions which were shared with the host. The access
+ * must be revoked upon kexec/kdump (see hv_ivm_clear_host_access()).
+ */
+struct hv_enc_pfn_region {
+	struct list_head list;
+	u64 pfn;
+	int count;
+};
+
+static LIST_HEAD(hv_list_enc);
+static DEFINE_RAW_SPINLOCK(hv_list_enc_lock);
+
+static int hv_list_enc_add(const u64 *pfn_list, int count)
+{
+	struct hv_enc_pfn_region *ent;
+	unsigned long flags;
+	u64 pfn;
+	int i;
+
+	for (i = 0; i < count; i++) {
+		pfn = pfn_list[i];
+
+		raw_spin_lock_irqsave(&hv_list_enc_lock, flags);
+		/* Check if the PFN already exists in some region first */
+		list_for_each_entry(ent, &hv_list_enc, list) {
+			if ((ent->pfn <= pfn) && (ent->pfn + ent->count - 1 >= pfn))
+				/* Nothing to do - pfn is already in the list */
+				goto unlock_done;
+		}
+
+		/*
+		 * Check if the PFN is adjacent to an existing region. Growing
+		 * a region can make it adjacent to another one but merging is
+		 * not (yet) implemented for simplicity. A PFN cannot be added
+		 * to two regions to keep the logic in hv_list_enc_remove()
+		 * correct.
+		 */
+		list_for_each_entry(ent, &hv_list_enc, list) {
+			if (ent->pfn + ent->count == pfn) {
+				/* Grow existing region up */
+				ent->count++;
+				goto unlock_done;
+			} else if (pfn + 1 == ent->pfn) {
+				/* Grow existing region down */
+				ent->pfn--;
+				ent->count++;
+				goto unlock_done;
+			}
+		}
+		raw_spin_unlock_irqrestore(&hv_list_enc_lock, flags);
+
+		/* No adjacent region found -- create a new one */
+		ent = kzalloc(sizeof(struct hv_enc_pfn_region), GFP_KERNEL);
+		if (!ent)
+			return -ENOMEM;
+
+		ent->pfn = pfn;
+		ent->count = 1;
+
+		raw_spin_lock_irqsave(&hv_list_enc_lock, flags);
+		list_add(&ent->list, &hv_list_enc);
+
+unlock_done:
+		raw_spin_unlock_irqrestore(&hv_list_enc_lock, flags);
+	}
+
+	return 0;
+}
+
+static void hv_list_enc_remove(const u64 *pfn_list, int count)
+{
+	struct hv_enc_pfn_region *ent, *t;
+	struct hv_enc_pfn_region new_region;
+	unsigned long flags;
+	u64 pfn;
+	int i;
+
+	for (i = 0; i < count; i++) {
+		pfn = pfn_list[i];
+
+		raw_spin_lock_irqsave(&hv_list_enc_lock, flags);
+		list_for_each_entry_safe(ent, t, &hv_list_enc, list) {
+			if (pfn == ent->pfn + ent->count - 1) {
+				/* Removing tail pfn */
+				ent->count--;
+				if (!ent->count) {
+					list_del(&ent->list);
+					kfree(ent);
+				}
+				goto unlock_done;
+			} else if (pfn == ent->pfn) {
+				/* Removing head pfn */
+				ent->count--;
+				ent->pfn++;
+				if (!ent->count) {
+					list_del(&ent->list);
+					kfree(ent);
+				}
+				goto unlock_done;
+			} else if (pfn > ent->pfn && pfn < ent->pfn + ent->count - 1) {
+				/*
+				 * Removing a pfn in the middle. Cut off the tail
+				 * of the existing region and create a template for
+				 * the new one.
+				 */
+				new_region.pfn = pfn + 1;
+				new_region.count = ent->count - (pfn - ent->pfn + 1);
+				ent->count = pfn - ent->pfn;
+				goto unlock_split;
+			}
+
+		}
+unlock_done:
+		raw_spin_unlock_irqrestore(&hv_list_enc_lock, flags);
+		continue;
+
+unlock_split:
+		raw_spin_unlock_irqrestore(&hv_list_enc_lock, flags);
+
+		ent = kzalloc(sizeof(struct hv_enc_pfn_region), GFP_KERNEL);
+		/*
+		 * There is no apparent good way to recover from -ENOMEM
+		 * situation, the accounting is going to be wrong either way.
+		 * Proceed with the rest of the list to make it 'less wrong'.
+		 */
+		if (WARN_ON_ONCE(!ent))
+			continue;
+
+		ent->pfn = new_region.pfn;
+		ent->count = new_region.count;
+
+		raw_spin_lock_irqsave(&hv_list_enc_lock, flags);
+		list_add(&ent->list, &hv_list_enc);
+		raw_spin_unlock_irqrestore(&hv_list_enc_lock, flags);
+	}
+}
+
+void hv_ivm_clear_host_access(void)
+{
+	struct hv_gpa_range_for_visibility *input;
+	struct hv_enc_pfn_region *ent;
+	unsigned long flags;
+	u64 hv_status;
+	int batch_size, cur, i;
+
+	if (!hv_is_isolation_supported())
+		return;
+
+	raw_spin_lock_irqsave(&hv_list_enc_lock, flags);
+
+	batch_size = MIN(hv_setup_in_array(&input, sizeof(*input),
+					   sizeof(input->gpa_page_list[0])),
+			 HV_MAX_MODIFY_GPA_REP_COUNT);
+	if (unlikely(!input))
+		goto unlock;
+
+	list_for_each_entry(ent, &hv_list_enc, list) {
+		for (i = 0, cur = 0; i < ent->count; i++) {
+			input->gpa_page_list[cur] = ent->pfn + i;
+			cur++;
+
+			if (cur == batch_size || i == ent->count - 1) {
+				input->partition_id = HV_PARTITION_ID_SELF;
+				input->host_visibility = VMBUS_PAGE_NOT_VISIBLE;
+				input->reserved0 = 0;
+				input->reserved1 = 0;
+				hv_status = hv_do_rep_hypercall(
+					HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY,
+					cur, 0, input, NULL);
+				WARN_ON_ONCE(!hv_result_success(hv_status));
+				cur = 0;
+			}
+		}
+
+	};
+
+unlock:
+	raw_spin_unlock_irqrestore(&hv_list_enc_lock, flags);
+}
+EXPORT_SYMBOL_GPL(hv_ivm_clear_host_access);
+
 /*
  * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
  *
@@ -476,24 +658,33 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
 	u64 hv_status;
 	int batch_size;
 	unsigned long flags;
+	int ret;
 
 	/* no-op if partition isolation is not enabled */
 	if (!hv_is_isolation_supported())
 		return 0;
 
+	if (visibility == VMBUS_PAGE_NOT_VISIBLE) {
+		hv_list_enc_remove(pfn, count);
+	} else {
+		ret = hv_list_enc_add(pfn, count);
+		if (ret)
+			return ret;
+	}
+
 	local_irq_save(flags);
 	batch_size = hv_setup_in_array(&input, sizeof(*input),
 					sizeof(input->gpa_page_list[0]));
 	if (unlikely(!input)) {
-		local_irq_restore(flags);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto unlock;
 	}
 
 	if (count > batch_size) {
 		pr_err("Hyper-V: GPA count:%d exceeds supported:%u\n", count,
 		       batch_size);
-		local_irq_restore(flags);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto unlock;
 	}
 
 	input->partition_id = HV_PARTITION_ID_SELF;
@@ -502,12 +693,18 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
 	hv_status = hv_do_rep_hypercall(
 			HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
 			0, input, NULL);
-	local_irq_restore(flags);
-
 	if (hv_result_success(hv_status))
-		return 0;
+		ret = 0;
 	else
-		return -EFAULT;
+		ret = -EFAULT;
+
+unlock:
+	local_irq_restore(flags);
+
+	if (ret)
+		hv_list_enc_remove(pfn, count);
+
+	return ret;
 }
 
 /*
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index abc4659f5809..6a988001e46f 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -263,10 +263,12 @@ static inline int hv_snp_boot_ap(u32 apic_id, unsigned long start_ip,
 void hv_vtom_init(void);
 void hv_ivm_msr_write(u64 msr, u64 value);
 void hv_ivm_msr_read(u64 msr, u64 *value);
+void hv_ivm_clear_host_access(void);
 #else
 static inline void hv_vtom_init(void) {}
 static inline void hv_ivm_msr_write(u64 msr, u64 value) {}
 static inline void hv_ivm_msr_read(u64 msr, u64 *value) {}
+static inline void hv_ivm_clear_host_access(void) {}
 #endif
 
 static inline bool hv_is_synic_msr(unsigned int reg)
-- 
2.50.1
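
Note on the region arithmetic: the head/tail/middle cases in
hv_list_enc_remove() can be checked outside the kernel. The following is a
minimal userspace sketch, not part of the patch. The names pfn_region,
remove_result and region_remove() are hypothetical stand-ins introduced
here for illustration; list handling, locking and allocation are left out,
and only the PFN/count math for a single region is kept.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for struct hv_enc_pfn_region. */
struct pfn_region {
	uint64_t pfn;	/* first PFN of the region */
	int count;	/* number of consecutive PFNs */
};

enum remove_result { NOT_FOUND, TRIMMED, SPLIT };

/*
 * Remove a single PFN from region r. On SPLIT, r keeps the part before
 * the removed PFN and *tail receives the part after it. On TRIMMED the
 * caller is expected to drop r when r->count reaches zero.
 */
static enum remove_result region_remove(struct pfn_region *r, uint64_t pfn,
					struct pfn_region *tail)
{
	if (r->count <= 0 || pfn < r->pfn || pfn >= r->pfn + r->count)
		return NOT_FOUND;

	if (pfn == r->pfn + r->count - 1) {
		/* Tail PFN: shrink the region from the end. */
		r->count--;
		return TRIMMED;
	}

	if (pfn == r->pfn) {
		/* Head PFN: move the start of the region up by one. */
		r->pfn++;
		r->count--;
		return TRIMMED;
	}

	/* Middle PFN: same arithmetic as the split case in the patch. */
	tail->pfn = pfn + 1;
	tail->count = r->count - (int)(pfn - r->pfn + 1);
	r->count = (int)(pfn - r->pfn);
	return SPLIT;
}
```

For example, removing PFN 104 from a region covering PFNs 100..109 should
leave 100..103 in place and produce a new region 105..109, so the two
counts add up to the original count minus one.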