From nobody Wed Jun 17 02:49:30 2026
Received: from mrdev.corp.microsoft.com (192-184-212-33.fiber.dynamic.sonic.net
 [192.184.212.33]) by linux.microsoft.com (Postfix) with ESMTPSA id A8D0A20B6F08;
 Tue, 21 Apr 2026 19:33:37 -0700 (PDT)
From: Mukesh R
To: hpa@zytor.com, robin.murphy@arm.com, robh@kernel.org, wei.liu@kernel.org,
 mrathor@linux.microsoft.com, mhklinux@outlook.com, muislam@microsoft.com,
 namjain@linux.microsoft.com, magnuskulke@linux.microsoft.com,
 anbelski@linux.microsoft.com, linux-kernel@vger.kernel.org,
 linux-hyperv@vger.kernel.org, iommu@lists.linux.dev,
 linux-pci@vger.kernel.org, linux-arch@vger.kernel.org
Cc: kys@microsoft.com, haiyangz@microsoft.com, decui@microsoft.com,
 longli@microsoft.com, tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, x86@kernel.org, joro@8bytes.org,
 will@kernel.org, lpieralisi@kernel.org, kwilczynski@kernel.org,
 bhelgaas@google.com, arnd@arndb.de
Subject: [PATCH V1 01/13] iommu/hyperv: rename hyperv-iommu.c to hyperv-irq.c
Date: Tue, 21 Apr 2026 19:32:27 -0700
Message-ID: <20260422023239.1171963-2-mrathor@linux.microsoft.com>
In-Reply-To: <20260422023239.1171963-1-mrathor@linux.microsoft.com>
References: <20260422023239.1171963-1-mrathor@linux.microsoft.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This file actually implements irq remapping, so rename it to the more
appropriate hyperv-irq.c. A new file to implement hyperv iommu will be
introduced later. Also, it should be tied to CONFIG_HYPERV and
CONFIG_IRQ_REMAP rather than to CONFIG_HYPERV_IOMMU. The file already has
#ifdef CONFIG_IRQ_REMAP.

Signed-off-by: Mukesh R
Reviewed-by: Anirudh Rayabharam (Microsoft)
---
 MAINTAINERS                                    | 2 +-
 drivers/iommu/Makefile                         | 2 +-
 drivers/iommu/{hyperv-iommu.c => hyperv-irq.c} | 2 +-
 drivers/iommu/irq_remapping.c                  | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)
 rename drivers/iommu/{hyperv-iommu.c => hyperv-irq.c} (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index d1cc0e12fe1f..f803a6a38fee 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11914,7 +11914,7 @@ F:	drivers/clocksource/hyperv_timer.c
 F:	drivers/hid/hid-hyperv.c
 F:	drivers/hv/
 F:	drivers/input/serio/hyperv-keyboard.c
-F:	drivers/iommu/hyperv-iommu.c
+F:	drivers/iommu/hyperv-irq.c
 F:	drivers/net/ethernet/microsoft/
 F:	drivers/net/hyperv/
 F:	drivers/pci/controller/pci-hyperv-intf.c
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 0275821f4ef9..335ea77cced6 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -30,7 +30,7 @@ obj-$(CONFIG_TEGRA_IOMMU_SMMU) += tegra-smmu.o
 obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
 obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
-obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
+obj-$(CONFIG_HYPERV) += hyperv-irq.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
 obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
 obj-$(CONFIG_IOMMU_IOPF) += io-pgfault.o
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-irq.c
similarity index 99%
rename from drivers/iommu/hyperv-iommu.c
rename to drivers/iommu/hyperv-irq.c
index 479103261ae6..cc49c7cbc434 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-irq.c
@@ -331,4 +331,4 @@ static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
 	.free = hyperv_root_irq_remapping_free,
 };
 
-#endif
+#endif /* CONFIG_IRQ_REMAP */
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index c2443659812a..41bf65e4ea88 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -108,7 +108,7 @@ int __init irq_remapping_prepare(void)
 	else if (IS_ENABLED(CONFIG_AMD_IOMMU) &&
 		 amd_iommu_irq_ops.prepare() == 0)
 		remap_ops = &amd_iommu_irq_ops;
-	else if (IS_ENABLED(CONFIG_HYPERV_IOMMU) &&
+	else if (IS_ENABLED(CONFIG_HYPERV) &&
 		 hyperv_irq_remap_ops.prepare() == 0)
 		remap_ops = &hyperv_irq_remap_ops;
 	else
-- 
2.51.2.vfs.0.1

From nobody Wed Jun 17 02:49:30 2026
Received: from mrdev.corp.microsoft.com (192-184-212-33.fiber.dynamic.sonic.net
 [192.184.212.33]) by linux.microsoft.com (Postfix) with ESMTPSA id E897520B6F0C;
 Tue, 21 Apr 2026 19:33:38 -0700 (PDT)
From: Mukesh R
To: hpa@zytor.com, robin.murphy@arm.com, robh@kernel.org, wei.liu@kernel.org,
 mrathor@linux.microsoft.com, mhklinux@outlook.com, muislam@microsoft.com,
 namjain@linux.microsoft.com, magnuskulke@linux.microsoft.com,
 anbelski@linux.microsoft.com, linux-kernel@vger.kernel.org,
 linux-hyperv@vger.kernel.org, iommu@lists.linux.dev,
 linux-pci@vger.kernel.org, linux-arch@vger.kernel.org
Cc: kys@microsoft.com, haiyangz@microsoft.com, decui@microsoft.com,
 longli@microsoft.com, tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, x86@kernel.org, joro@8bytes.org,
 will@kernel.org, lpieralisi@kernel.org, kwilczynski@kernel.org,
 bhelgaas@google.com, arnd@arndb.de
Subject: [PATCH V1 02/13] x86/hyperv: cosmetic changes in irqdomain.c for readability
Date: Tue, 21 Apr 2026 19:32:28 -0700
Message-ID: <20260422023239.1171963-3-mrathor@linux.microsoft.com>
In-Reply-To: <20260422023239.1171963-1-mrathor@linux.microsoft.com>
References: <20260422023239.1171963-1-mrathor@linux.microsoft.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Make cosmetic changes:
 o Rename struct pci_dev *dev to *pdev since there are cases of
   struct device *dev in the file and all over the kernel
 o Rename hv_build_pci_dev_id to hv_build_devid_type_pci in anticipation
   of building different types of device ids
 o Fix checkpatch.pl issues with return and extraneous printk
 o Replace spaces with tabs
 o Rename struct hv_devid *xxx to struct hv_devid *hv_devid given code
   paths involve many types of device ids
 o Fix indentation in a large if block by using goto.

There are no functional changes.

Reviewed-by: Anirudh Rayabharam (Microsoft)
Signed-off-by: Mukesh R
---
 arch/x86/hyperv/irqdomain.c | 198 +++++++++++++++++++-----------------
 1 file changed, 104 insertions(+), 94 deletions(-)

diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
index 365e364268d9..b3ad50a874dc 100644
--- a/arch/x86/hyperv/irqdomain.c
+++ b/arch/x86/hyperv/irqdomain.c
@@ -1,5 +1,4 @@
 // SPDX-License-Identifier: GPL-2.0
-
 /*
  * Irqdomain for Linux to run as the root partition on Microsoft Hypervisor.
  *
@@ -14,8 +13,8 @@
 #include
 #include
 
-static int hv_map_interrupt(union hv_device_id device_id, bool level,
-		int cpu, int vector, struct hv_interrupt_entry *entry)
+static int hv_map_interrupt(union hv_device_id hv_devid, bool level,
+		int cpu, int vector, struct hv_interrupt_entry *ret_entry)
 {
 	struct hv_input_map_device_interrupt *input;
 	struct hv_output_map_device_interrupt *output;
@@ -32,7 +31,7 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
 	intr_desc = &input->interrupt_descriptor;
 	memset(input, 0, sizeof(*input));
 	input->partition_id = hv_current_partition_id;
-	input->device_id = device_id.as_uint64;
+	input->device_id = hv_devid.as_uint64;
 	intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
 	intr_desc->vector_count = 1;
 	intr_desc->target.vector = vector;
@@ -44,7 +43,7 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
 
 	intr_desc->target.vp_set.valid_bank_mask = 0;
 	intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
-	nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), cpumask_of(cpu));
+	nr_bank = cpumask_to_vpset(&intr_desc->target.vp_set, cpumask_of(cpu));
 	if (nr_bank < 0) {
 		local_irq_restore(flags);
 		pr_err("%s: unable to generate VP set\n", __func__);
@@ -61,7 +60,7 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
 
 	status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, var_size,
 			input, output);
-	*entry = output->interrupt_entry;
+	*ret_entry = output->interrupt_entry;
 
 	local_irq_restore(flags);
 
@@ -71,21 +70,19 @@ static int hv_map_interrupt(union hv_device_id device_id, bool level,
 	return hv_result_to_errno(status);
 }
 
-static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *old_entry)
+static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *irq_entry)
 {
 	unsigned long flags;
 	struct hv_input_unmap_device_interrupt *input;
-	struct hv_interrupt_entry *intr_entry;
 	u64 status;
 
 	local_irq_save(flags);
 	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
 
 	memset(input, 0, sizeof(*input));
-	intr_entry = &input->interrupt_entry;
 	input->partition_id = hv_current_partition_id;
 	input->device_id = id;
-	*intr_entry = *old_entry;
+	input->interrupt_entry = *irq_entry;
 
 	status = hv_do_hypercall(HVCALL_UNMAP_DEVICE_INTERRUPT, input, NULL);
 	local_irq_restore(flags);
@@ -115,67 +112,71 @@ static int get_rid_cb(struct pci_dev *pdev, u16 alias, void *data)
 	return 0;
 }
 
-static union hv_device_id hv_build_pci_dev_id(struct pci_dev *dev)
+static union hv_device_id hv_build_devid_type_pci(struct pci_dev *pdev)
 {
-	union hv_device_id dev_id;
+	int pos;
+	union hv_device_id hv_devid;
 	struct rid_data data = {
 		.bridge = NULL,
-		.rid = PCI_DEVID(dev->bus->number, dev->devfn)
+		.rid = PCI_DEVID(pdev->bus->number, pdev->devfn)
 	};
 
-	pci_for_each_dma_alias(dev, get_rid_cb, &data);
+	pci_for_each_dma_alias(pdev, get_rid_cb, &data);
 
-	dev_id.as_uint64 = 0;
-	dev_id.device_type = HV_DEVICE_TYPE_PCI;
-	dev_id.pci.segment = pci_domain_nr(dev->bus);
+	hv_devid.as_uint64 = 0;
+	hv_devid.device_type = HV_DEVICE_TYPE_PCI;
+	hv_devid.pci.segment = pci_domain_nr(pdev->bus);
 
-	dev_id.pci.bdf.bus = PCI_BUS_NUM(data.rid);
-	dev_id.pci.bdf.device = PCI_SLOT(data.rid);
-	dev_id.pci.bdf.function = PCI_FUNC(data.rid);
-	dev_id.pci.source_shadow = HV_SOURCE_SHADOW_NONE;
+	hv_devid.pci.bdf.bus = PCI_BUS_NUM(data.rid);
+	hv_devid.pci.bdf.device = PCI_SLOT(data.rid);
+	hv_devid.pci.bdf.function = PCI_FUNC(data.rid);
+	hv_devid.pci.source_shadow = HV_SOURCE_SHADOW_NONE;
 
-	if (data.bridge) {
-		int pos;
+	if (data.bridge == NULL)
+		goto out;
 
-		/*
-		 * Microsoft Hypervisor requires a bus range when the bridge is
-		 * running in PCI-X mode.
-		 *
-		 * To distinguish conventional vs PCI-X bridge, we can check
-		 * the bridge's PCI-X Secondary Status Register, Secondary Bus
-		 * Mode and Frequency bits. See PCI Express to PCI/PCI-X Bridge
-		 * Specification Revision 1.0 5.2.2.1.3.
-		 *
-		 * Value zero means it is in conventional mode, otherwise it is
-		 * in PCI-X mode.
-		 */
+	/*
+	 * Microsoft Hypervisor requires a bus range when the bridge is
+	 * running in PCI-X mode.
+	 *
+	 * To distinguish conventional vs PCI-X bridge, we can check
+	 * the bridge's PCI-X Secondary Status Register, Secondary Bus
+	 * Mode and Frequency bits. See PCI Express to PCI/PCI-X Bridge
+	 * Specification Revision 1.0 5.2.2.1.3.
+	 *
+	 * Value zero means it is in conventional mode, otherwise it is
+	 * in PCI-X mode.
+	 */
 
-		pos = pci_find_capability(data.bridge, PCI_CAP_ID_PCIX);
-		if (pos) {
-			u16 status;
+	pos = pci_find_capability(data.bridge, PCI_CAP_ID_PCIX);
+	if (pos) {
+		u16 status;
 
-			pci_read_config_word(data.bridge, pos +
-					PCI_X_BRIDGE_SSTATUS, &status);
+		pci_read_config_word(data.bridge, pos + PCI_X_BRIDGE_SSTATUS,
+				     &status);
 
-			if (status & PCI_X_SSTATUS_FREQ) {
-				/* Non-zero, PCI-X mode */
-				u8 sec_bus, sub_bus;
+		if (status & PCI_X_SSTATUS_FREQ) {
+			/* Non-zero, PCI-X mode */
+			u8 sec_bus, sub_bus;
 
-				dev_id.pci.source_shadow = HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE;
+			hv_devid.pci.source_shadow =
+				HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE;
 
-				pci_read_config_byte(data.bridge, PCI_SECONDARY_BUS, &sec_bus);
-				dev_id.pci.shadow_bus_range.secondary_bus = sec_bus;
-				pci_read_config_byte(data.bridge, PCI_SUBORDINATE_BUS, &sub_bus);
-				dev_id.pci.shadow_bus_range.subordinate_bus = sub_bus;
-			}
+			pci_read_config_byte(data.bridge, PCI_SECONDARY_BUS,
+					     &sec_bus);
+			hv_devid.pci.shadow_bus_range.secondary_bus = sec_bus;
+			pci_read_config_byte(data.bridge, PCI_SUBORDINATE_BUS,
+					     &sub_bus);
+			hv_devid.pci.shadow_bus_range.subordinate_bus = sub_bus;
 		}
 	}
 
-	return dev_id;
+out:
+	return hv_devid;
 }
 
-/**
- * hv_map_msi_interrupt() - "Map" the MSI IRQ in the hypervisor.
+/*
+ * hv_map_msi_interrupt() - Map the MSI IRQ in the hypervisor.
  * @data:      Describes the IRQ
  * @out_entry: Hypervisor (MSI) interrupt entry (can be NULL)
  *
@@ -188,22 +189,23 @@ int hv_map_msi_interrupt(struct irq_data *data,
 {
 	struct irq_cfg *cfg = irqd_cfg(data);
 	struct hv_interrupt_entry dummy;
-	union hv_device_id device_id;
+	union hv_device_id hv_devid;
 	struct msi_desc *msidesc;
-	struct pci_dev *dev;
+	struct pci_dev *pdev;
 	int cpu;
 
 	msidesc = irq_data_get_msi_desc(data);
-	dev = msi_desc_to_pci_dev(msidesc);
-	device_id = hv_build_pci_dev_id(dev);
+	pdev = msi_desc_to_pci_dev(msidesc);
+	hv_devid = hv_build_devid_type_pci(pdev);
 	cpu = cpumask_first(irq_data_get_effective_affinity_mask(data));
 
-	return hv_map_interrupt(device_id, false, cpu, cfg->vector,
+	return hv_map_interrupt(hv_devid, false, cpu, cfg->vector,
 				out_entry ? out_entry : &dummy);
 }
 EXPORT_SYMBOL_GPL(hv_map_msi_interrupt);
 
-static inline void entry_to_msi_msg(struct hv_interrupt_entry *entry, struct msi_msg *msg)
+static void entry_to_msi_msg(struct hv_interrupt_entry *entry,
+			     struct msi_msg *msg)
 {
 	/* High address is always 0 */
 	msg->address_hi = 0;
@@ -211,17 +213,19 @@ static inline void entry_to_msi_msg(struct hv_interrupt_entry *entry, struct msi
 	msg->data = entry->msi_entry.data.as_uint32;
 }
 
-static int hv_unmap_msi_interrupt(struct pci_dev *dev, struct hv_interrupt_entry *old_entry);
+static int hv_unmap_msi_interrupt(struct pci_dev *pdev,
+				  struct hv_interrupt_entry *irq_entry);
+
 static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 {
 	struct hv_interrupt_entry *stored_entry;
 	struct irq_cfg *cfg = irqd_cfg(data);
 	struct msi_desc *msidesc;
-	struct pci_dev *dev;
+	struct pci_dev *pdev;
 	int ret;
 
 	msidesc = irq_data_get_msi_desc(data);
-	dev = msi_desc_to_pci_dev(msidesc);
+	pdev = msi_desc_to_pci_dev(msidesc);
 
 	if (!cfg) {
 		pr_debug("%s: cfg is NULL", __func__);
@@ -240,7 +244,7 @@ static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 		stored_entry = data->chip_data;
 		data->chip_data = NULL;
 
-		ret = hv_unmap_msi_interrupt(dev, stored_entry);
+		ret = hv_unmap_msi_interrupt(pdev, stored_entry);
 
 		kfree(stored_entry);
 
@@ -249,10 +253,8 @@ static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	}
 
 	stored_entry = kzalloc_obj(*stored_entry, GFP_ATOMIC);
-	if (!stored_entry) {
-		pr_debug("%s: failed to allocate chip data\n", __func__);
+	if (!stored_entry)
 		return;
-	}
 
 	ret = hv_map_msi_interrupt(data, stored_entry);
 	if (ret) {
@@ -262,18 +264,21 @@ static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 
 	data->chip_data = stored_entry;
 	entry_to_msi_msg(data->chip_data, msg);
-
-	return;
 }
 
-static int hv_unmap_msi_interrupt(struct pci_dev *dev, struct hv_interrupt_entry *old_entry)
+static int hv_unmap_msi_interrupt(struct pci_dev *pdev,
+				  struct hv_interrupt_entry *irq_entry)
 {
-	return hv_unmap_interrupt(hv_build_pci_dev_id(dev).as_uint64, old_entry);
+	union hv_device_id hv_devid;
+
+	hv_devid = hv_build_devid_type_pci(pdev);
+	return hv_unmap_interrupt(hv_devid.as_uint64, irq_entry);
 }
 
-static void hv_teardown_msi_irq(struct pci_dev *dev, struct irq_data *irqd)
+/* NB: during map, hv_interrupt_entry is saved via data->chip_data */
+static void hv_teardown_msi_irq(struct pci_dev *pdev, struct irq_data *irqd)
 {
-	struct hv_interrupt_entry old_entry;
+	struct hv_interrupt_entry irq_entry;
 	struct msi_msg msg;
 
 	if (!irqd->chip_data) {
@@ -281,13 +286,13 @@ static void hv_teardown_msi_irq(struct pci_dev *dev, struct irq_data *irqd)
 		return;
 	}
 
-	old_entry = *(struct hv_interrupt_entry *)irqd->chip_data;
-	entry_to_msi_msg(&old_entry, &msg);
+	irq_entry = *(struct hv_interrupt_entry *)irqd->chip_data;
+	entry_to_msi_msg(&irq_entry, &msg);
 
 	kfree(irqd->chip_data);
 	irqd->chip_data = NULL;
 
-	(void)hv_unmap_msi_interrupt(dev, &old_entry);
+	(void)hv_unmap_msi_interrupt(pdev, &irq_entry);
 }
 
 /*
@@ -302,7 +307,8 @@ static struct irq_chip hv_pci_msi_controller = {
 };
 
 static bool hv_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
-				 struct irq_domain *real_parent, struct msi_domain_info *info)
+				 struct irq_domain *real_parent,
+				 struct msi_domain_info *info)
 {
 	struct irq_chip *chip = info->chip;
 
@@ -317,7 +323,8 @@ static bool hv_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
 }
 
 #define HV_MSI_FLAGS_SUPPORTED	(MSI_GENERIC_FLAGS_MASK | MSI_FLAG_PCI_MSIX)
-#define HV_MSI_FLAGS_REQUIRED	(MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS)
+#define HV_MSI_FLAGS_REQUIRED	(MSI_FLAG_USE_DEF_DOM_OPS | \
+				 MSI_FLAG_USE_DEF_CHIP_OPS)
 
 static struct msi_parent_ops hv_msi_parent_ops = {
 	.supported_flags	= HV_MSI_FLAGS_SUPPORTED,
@@ -329,14 +336,14 @@ static struct msi_parent_ops hv_msi_parent_ops = {
 	.init_dev_msi_info	= hv_init_dev_msi_info,
 };
 
-static int hv_msi_domain_alloc(struct irq_domain *d, unsigned int virq, unsigned int nr_irqs,
-			       void *arg)
+/* Allocate nr_irqs IRQs for the given irq domain */
+static int hv_msi_domain_alloc(struct irq_domain *d, unsigned int virq,
+			       unsigned int nr_irqs, void *arg)
 {
 	/*
-	 * TODO: The allocation bits of hv_irq_compose_msi_msg(), i.e. everything except
-	 * entry_to_msi_msg() should be in here.
+	 * TODO: The allocation bits of hv_irq_compose_msi_msg(), i.e.
+	 *	 everything except entry_to_msi_msg() should be in here.
 	 */
-
 	int ret;
 
 	ret = irq_domain_alloc_irqs_parent(d, virq, nr_irqs, arg);
@@ -344,13 +351,15 @@ static int hv_msi_domain_alloc(struct irq_domain *d, unsigned int virq, unsigned
 		return ret;
 
 	for (int i = 0; i < nr_irqs; ++i) {
-		irq_domain_set_info(d, virq + i, 0, &hv_pci_msi_controller, NULL,
-				    handle_edge_irq, NULL, "edge");
+		irq_domain_set_info(d, virq + i, 0, &hv_pci_msi_controller,
+				    NULL, handle_edge_irq, NULL, "edge");
 	}
+
 	return 0;
 }
 
-static void hv_msi_domain_free(struct irq_domain *d, unsigned int virq, unsigned int nr_irqs)
+static void hv_msi_domain_free(struct irq_domain *d, unsigned int virq,
+			       unsigned int nr_irqs)
 {
 	for (int i = 0; i < nr_irqs; ++i) {
 		struct irq_data *irqd = irq_domain_get_irq_data(d, virq);
@@ -362,6 +371,7 @@ static void hv_msi_domain_free(struct irq_domain *d, unsigned int virq, unsigned
 
 		hv_teardown_msi_irq(to_pci_dev(desc->dev), irqd);
 	}
+
 	irq_domain_free_irqs_top(d, virq, nr_irqs);
 }
 
@@ -394,25 +404,25 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
 
 int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry)
 {
-	union hv_device_id device_id;
+	union hv_device_id hv_devid;
 
-	device_id.as_uint64 = 0;
-	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
-	device_id.ioapic.ioapic_id = (u8)ioapic_id;
+	hv_devid.as_uint64 = 0;
+	hv_devid.device_type = HV_DEVICE_TYPE_IOAPIC;
+	hv_devid.ioapic.ioapic_id = (u8)ioapic_id;
 
-	return hv_unmap_interrupt(device_id.as_uint64, entry);
+	return hv_unmap_interrupt(hv_devid.as_uint64, entry);
 }
 EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
 
 int hv_map_ioapic_interrupt(int ioapic_id, bool level, int cpu, int vector,
 		struct hv_interrupt_entry *entry)
 {
-	union hv_device_id device_id;
+	union hv_device_id hv_devid;
 
-	device_id.as_uint64 = 0;
-	device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
-	device_id.ioapic.ioapic_id = (u8)ioapic_id;
+	hv_devid.as_uint64 = 0;
+	hv_devid.device_type = HV_DEVICE_TYPE_IOAPIC;
+	hv_devid.ioapic.ioapic_id = (u8)ioapic_id;
 
-	return hv_map_interrupt(device_id, level, cpu, vector, entry);
+	return hv_map_interrupt(hv_devid, level, cpu, vector, entry);
 }
 EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
-- 
2.51.2.vfs.0.1

From nobody Wed Jun 17 02:49:30 2026
Received: from mrdev.corp.microsoft.com (192-184-212-33.fiber.dynamic.sonic.net
[192.184.212.33]) by linux.microsoft.com (Postfix) with ESMTPSA id 8A90C20B6F12; Tue, 21 Apr 2026 19:33:40 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com 8A90C20B6F12 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1776825221; bh=ONiMZWsb8/8IhaBjkgi5kkshWDnWtbTkSXDdbFle5ZE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=UtQVZhp7fv0i6HpN1CrtxqNecqOwBHsibTl5ydNBUd1KFghrBaPB7lDZCCY55sv9D oNNyybQ/srUjUncnRi/0WZ0w8/XwbirBCbC+9DwEvhFb8zKCNNLrnZAmAZxGSUJVWV WQLsQ0zLRlycUT/HH1rx3r4W0SE6E7e4zex1aM5Q= From: Mukesh R To: hpa@zytor.com, robin.murphy@arm.com, robh@kernel.org, wei.liu@kernel.org, mrathor@linux.microsoft.com, mhklinux@outlook.com, muislam@microsoft.com, namjain@linux.microsoft.com, magnuskulke@linux.microsoft.com, anbelski@linux.microsoft.com, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, iommu@lists.linux.dev, linux-pci@vger.kernel.org, linux-arch@vger.kernel.org Cc: kys@microsoft.com, haiyangz@microsoft.com, decui@microsoft.com, longli@microsoft.com, tglx@kernel.org, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, joro@8bytes.org, will@kernel.org, lpieralisi@kernel.org, kwilczynski@kernel.org, bhelgaas@google.com, arnd@arndb.de Subject: [PATCH V1 03/13] x86/hyperv: add insufficient memory support in irqdomain.c Date: Tue, 21 Apr 2026 19:32:29 -0700 Message-ID: <20260422023239.1171963-4-mrathor@linux.microsoft.com> X-Mailer: git-send-email 2.51.2.vfs.0.1 In-Reply-To: <20260422023239.1171963-1-mrathor@linux.microsoft.com> References: <20260422023239.1171963-1-mrathor@linux.microsoft.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Intermittent insufficient memory hypercall failure have been observed in the current map device interrupt hypercall. 
In case of such a failure, we must deposit more memory and redo the hypercall. Add support for that. Deposit memory needs partition id, make that a parameter to the map interrupt function. Signed-off-by: Mukesh R --- arch/x86/hyperv/irqdomain.c | 38 +++++++++++++++++++++++++++++++------ 1 file changed, 32 insertions(+), 6 deletions(-) diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c index b3ad50a874dc..229f986e08ea 100644 --- a/arch/x86/hyperv/irqdomain.c +++ b/arch/x86/hyperv/irqdomain.c @@ -13,8 +13,9 @@ #include #include =20 -static int hv_map_interrupt(union hv_device_id hv_devid, bool level, - int cpu, int vector, struct hv_interrupt_entry *ret_entry) +static u64 hv_map_interrupt_hcall(u64 ptid, union hv_device_id hv_devid, + bool level, int cpu, int vector, + struct hv_interrupt_entry *ret_entry) { struct hv_input_map_device_interrupt *input; struct hv_output_map_device_interrupt *output; @@ -30,8 +31,10 @@ static int hv_map_interrupt(union hv_device_id hv_devid,= bool level, =20 intr_desc =3D &input->interrupt_descriptor; memset(input, 0, sizeof(*input)); - input->partition_id =3D hv_current_partition_id; + + input->partition_id =3D ptid; input->device_id =3D hv_devid.as_uint64; + intr_desc->interrupt_type =3D HV_X64_INTERRUPT_TYPE_FIXED; intr_desc->vector_count =3D 1; intr_desc->target.vector =3D vector; @@ -64,6 +67,28 @@ static int hv_map_interrupt(union hv_device_id hv_devid,= bool level, =20 local_irq_restore(flags); =20 + return status; +} + +static int hv_map_interrupt(u64 ptid, union hv_device_id device_id, bool l= evel, + int cpu, int vector, + struct hv_interrupt_entry *ret_entry) +{ + u64 status; + int rc, deposit_pgs =3D 16; /* don't loop forever */ + + while (deposit_pgs--) { + status =3D hv_map_interrupt_hcall(ptid, device_id, level, cpu, + vector, ret_entry); + + if (hv_result(status) !=3D HV_STATUS_INSUFFICIENT_MEMORY) + break; + + rc =3D hv_call_deposit_pages(NUMA_NO_NODE, ptid, 1); + if (rc) + break; + } + if 
(!hv_result_success(status)) hv_status_err(status, "\n"); =20 @@ -199,8 +224,8 @@ int hv_map_msi_interrupt(struct irq_data *data, hv_devid =3D hv_build_devid_type_pci(pdev); cpu =3D cpumask_first(irq_data_get_effective_affinity_mask(data)); =20 - return hv_map_interrupt(hv_devid, false, cpu, cfg->vector, - out_entry ? out_entry : &dummy); + return hv_map_interrupt(hv_current_partition_id, hv_devid, false, cpu, + cfg->vector, out_entry ? out_entry : &dummy); } EXPORT_SYMBOL_GPL(hv_map_msi_interrupt); =20 @@ -423,6 +448,7 @@ int hv_map_ioapic_interrupt(int ioapic_id, bool level, = int cpu, int vector, hv_devid.device_type =3D HV_DEVICE_TYPE_IOAPIC; hv_devid.ioapic.ioapic_id =3D (u8)ioapic_id; =20 - return hv_map_interrupt(hv_devid, level, cpu, vector, entry); + return hv_map_interrupt(hv_current_partition_id, hv_devid, level, cpu, + vector, entry); } EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt); --=20 2.51.2.vfs.0.1 From nobody Wed Jun 17 02:49:30 2026 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 594993815C5; Wed, 22 Apr 2026 02:33:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=13.77.154.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825236; cv=none; b=oO6SYhTO+9nm8Ht2ip/7znJV0sAZfBc1xiYNTXARa6N9rVsIKuw3oCCUVMXAFXr7aSjutVYFbLdpwnOhGCwx0vvcx+l2at/eFp/tgzVq2spz/zH3/L83LAqhO4RoCB/1JiS8hhj1+M7mBDat61QOsL/EoXiXVSsa3jO42H6/Fws= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825236; c=relaxed/simple; bh=yqHoN0/B7kB4uVVOu+NW/uh2AcmRAs1ftzDdzQp4n00=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=jlj2j5Gk8H00aEuQMINC/AwZzLTXLZ9mKtuh9DRexe7aCyr4dk0d3eSTDU7TVhMj4z8+JHkNZGUe+xR+nvZuWnBjXCS8+wRORDpjEgwDRmn+rxD6n74ANhM0r9sk8FiQfizueph4wRtaiyr2+VfUa41/VI3BX5xcR1OBuzSkdSA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) 
header.from=linux.microsoft.com; spf=pass smtp.mailfrom=linux.microsoft.com; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b=s+NK51tP; arc=none smtp.client-ip=13.77.154.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b="s+NK51tP" Received: from mrdev.corp.microsoft.com (192-184-212-33.fiber.dynamic.sonic.net [192.184.212.33]) by linux.microsoft.com (Postfix) with ESMTPSA id BD44420B6F15; Tue, 21 Apr 2026 19:33:41 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com BD44420B6F15 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1776825222; bh=F42zVdEs44T8GRjrW0mPW72by/e2QFwD5VC6Q2hyg6E=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=s+NK51tPdoLfcxZ6OffDm6U7IvvtWOvDL0BH4ln5pINhW0LUgQc8ibHZoufX7y6gY voDlcKrXSxemicI9qs5qJ3LSpm7jfovEK6myrTsLivNmMDyziKAsh6BhQj/wGyFFa1 2DrPOCAnCFSz15t0IL1xYXUKqU0PDWO3dcKCaw6Q= From: Mukesh R To: hpa@zytor.com, robin.murphy@arm.com, robh@kernel.org, wei.liu@kernel.org, mrathor@linux.microsoft.com, mhklinux@outlook.com, muislam@microsoft.com, namjain@linux.microsoft.com, magnuskulke@linux.microsoft.com, anbelski@linux.microsoft.com, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, iommu@lists.linux.dev, linux-pci@vger.kernel.org, linux-arch@vger.kernel.org Cc: kys@microsoft.com, haiyangz@microsoft.com, decui@microsoft.com, longli@microsoft.com, tglx@kernel.org, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, joro@8bytes.org, will@kernel.org, lpieralisi@kernel.org, kwilczynski@kernel.org, bhelgaas@google.com, arnd@arndb.de Subject: [PATCH V1 04/13] mshv: Provide a way to get partition id if 
running in a VMM process Date: Tue, 21 Apr 2026 19:32:30 -0700 Message-ID: <20260422023239.1171963-5-mrathor@linux.microsoft.com> X-Mailer: git-send-email 2.51.2.vfs.0.1 In-Reply-To: <20260422023239.1171963-1-mrathor@linux.microsoft.com> References: <20260422023239.1171963-1-mrathor@linux.microsoft.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Many PCI passthru related hypercalls require the partition id of the target guest. Guests are managed by the MSHV driver, and the partition id is maintained only there. Add a field to the partition struct in the MSHV driver to save the tgid of the VMM process that creates the partition, and add a function there to retrieve the partition id if the current process is a VMM process. Signed-off-by: Mukesh R Reviewed-by: Anirudh Rayabharam (Microsoft) --- drivers/hv/mshv_root.h | 1 + drivers/hv/mshv_root_main.c | 22 ++++++++++++++++++++++ include/asm-generic/mshyperv.h | 5 +++++ 3 files changed, 28 insertions(+) diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h index 1f086dcb7aa1..a85c24dcc701 100644 --- a/drivers/hv/mshv_root.h +++ b/drivers/hv/mshv_root.h @@ -138,6 +138,7 @@ struct mshv_partition { =20 struct mshv_girq_routing_table __rcu *pt_girq_tbl; u64 isolation_type; + pid_t pt_vmm_tgid; bool import_completed; bool pt_initialized; #if IS_ENABLED(CONFIG_DEBUG_FS) diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c index bd1359eb58dd..02c107458be9 100644 --- a/drivers/hv/mshv_root_main.c +++ b/drivers/hv/mshv_root_main.c @@ -1908,6 +1908,27 @@ mshv_partition_release(struct inode *inode, struct f= ile *filp) return 0; } =20 +/* Return the partition id if the current process is a VMM process */ +u64 mshv_current_partid(void) +{ + struct mshv_partition *pt; + int i; + u64 ret_ptid =3D HV_PARTITION_ID_INVALID; + + rcu_read_lock(); + 
hash_for_each_rcu(mshv_root.pt_htable, i, pt, pt_hnode) { + if (pt->pt_vmm_tgid =3D=3D current->tgid) { + ret_ptid =3D pt->pt_id; + break; + } + } + + rcu_read_unlock(); + return ret_ptid; +} +EXPORT_SYMBOL_GPL(mshv_current_partid); + static int add_partition(struct mshv_partition *partition) { @@ -2073,6 +2094,7 @@ mshv_ioctl_create_partition(void __user *user_arg, st= ruct device *module_dev) goto cleanup_irq_srcu; =20 partition->pt_id =3D pt_id; + partition->pt_vmm_tgid =3D current->tgid; =20 ret =3D add_partition(partition); if (ret) diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h index bf601d67cecb..e8cbc4e3f7ad 100644 --- a/include/asm-generic/mshyperv.h +++ b/include/asm-generic/mshyperv.h @@ -350,6 +350,7 @@ int hv_call_add_logical_proc(int node, u32 lp_index, u3= 2 acpi_id); int hv_call_notify_all_processors_started(void); bool hv_lp_exists(u32 lp_index); int hv_call_create_vp(int node, u64 partition_id, u32 vp_index, u32 flags); +u64 mshv_current_partid(void); =20 #else /* CONFIG_MSHV_ROOT */ static inline bool hv_root_partition(void) { return false; } @@ -380,6 +381,10 @@ static inline int hv_call_create_vp(int node, u64 part= ition_id, u32 vp_index, u3 { return -EOPNOTSUPP; } +static inline u64 mshv_current_partid(void) +{ + return HV_PARTITION_ID_INVALID; +} #endif /* CONFIG_MSHV_ROOT */ =20 static inline int hv_deposit_memory(u64 partition_id, u64 status) --=20 2.51.2.vfs.0.1 From nobody Wed Jun 17 02:49:30 2026 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 56A7B3806C2; Wed, 22 Apr 2026 02:33:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=13.77.154.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825236; cv=none; 
b=skL7Hy5AY2VY0UySb0c0vBchmPL8w8unqfZDMACHoxkGHoLpJIm7GreEbUQuKpqMwqpxHuntoG8goRHXuSeL255adQOPAFp0jg3O8pKMx0Se4zHHrydkW+CNo5BHIWni0L9bRY7qwn6K4yDzthvVXmIKYLI2I5c6lxlIEqWlfAM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825236; c=relaxed/simple; bh=i7TTD4rbc9FHgT/PqRax2hCxSF14mcjnAspWxjiAQFI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HmzpEgY5FxytW0hXk65MKzzCpC/ICCyao2ZfFLV8Mjh8thXoInE9Vf+Lx57+rybck0+7OoBI/zgJ4H5pK9FNDih+fMQxukclhi61W0t4yKhiknVqKnRJQNUa06wHWVgVTGZrz/dzv1zZMZtQATbH4+ZWao/IGUR2KRqN9iaUgN4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com; spf=pass smtp.mailfrom=linux.microsoft.com; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b=QGPwRGKv; arc=none smtp.client-ip=13.77.154.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b="QGPwRGKv" Received: from mrdev.corp.microsoft.com (192-184-212-33.fiber.dynamic.sonic.net [192.184.212.33]) by linux.microsoft.com (Postfix) with ESMTPSA id 1C17A20B6F1B; Tue, 21 Apr 2026 19:33:43 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com 1C17A20B6F1B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1776825223; bh=tidzTRCfJneaHjA+fOlX3DgM/7vG+INZY/lk1h6owmY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=QGPwRGKvHHZQ5ClWfTVBNhmOekp92Iiqn55G5NGuiCZYTUvYYvnsvCMP3fEQ0/UUA UfP+vFCUAIYN+x9a76g1Jq4uVNYbk2FK2SgvaioAPdeGy1G8DqrU8V2AAKUie1JIC5 G3MyhqLY7hCDtKHMuoSDWn/oe76iNjehF9pIk1kg= From: Mukesh R To: hpa@zytor.com, robin.murphy@arm.com, robh@kernel.org, wei.liu@kernel.org, 
mrathor@linux.microsoft.com, mhklinux@outlook.com, muislam@microsoft.com, namjain@linux.microsoft.com, magnuskulke@linux.microsoft.com, anbelski@linux.microsoft.com, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, iommu@lists.linux.dev, linux-pci@vger.kernel.org, linux-arch@vger.kernel.org Cc: kys@microsoft.com, haiyangz@microsoft.com, decui@microsoft.com, longli@microsoft.com, tglx@kernel.org, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, joro@8bytes.org, will@kernel.org, lpieralisi@kernel.org, kwilczynski@kernel.org, bhelgaas@google.com, arnd@arndb.de Subject: [PATCH V1 05/13] mshv: Declarations and definitions for VFIO-MSHV bridge device Date: Tue, 21 Apr 2026 19:32:31 -0700 Message-ID: <20260422023239.1171963-6-mrathor@linux.microsoft.com> X-Mailer: git-send-email 2.51.2.vfs.0.1 In-Reply-To: <20260422023239.1171963-1-mrathor@linux.microsoft.com> References: <20260422023239.1171963-1-mrathor@linux.microsoft.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add data structs needed by the subsequent patch that introduces a new module to implement VFIO-MSHV pseudo device. 
Signed-off-by: Mukesh R --- drivers/hv/mshv_root.h | 19 +++++++++++++++++++ include/uapi/linux/mshv.h | 30 ++++++++++++++++++++++++++++++ 2 files changed, 49 insertions(+) diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h index a85c24dcc701..b9880d0bdc4d 100644 --- a/drivers/hv/mshv_root.h +++ b/drivers/hv/mshv_root.h @@ -227,6 +227,25 @@ struct port_table_info { }; }; =20 +struct mshv_device { + const struct mshv_device_ops *device_ops; + struct mshv_partition *device_pt; + void *device_private; + struct hlist_node device_ptnode; +}; + +struct mshv_device_ops { + const char *device_name; + long (*device_create)(struct mshv_device *dev); + void (*device_release)(struct mshv_device *dev); + long (*device_set_attr)(struct mshv_device *dev, + struct mshv_device_attr *attr); + long (*device_has_attr)(struct mshv_device *dev, + struct mshv_device_attr *attr); +}; + +extern struct mshv_device_ops mshv_vfio_device_ops; + int mshv_update_routing_table(struct mshv_partition *partition, const struct mshv_user_irq_entry *entries, unsigned int numents); diff --git a/include/uapi/linux/mshv.h b/include/uapi/linux/mshv.h index 32ff92b6342b..4373a8243951 100644 --- a/include/uapi/linux/mshv.h +++ b/include/uapi/linux/mshv.h @@ -404,4 +404,34 @@ struct mshv_sint_mask { /* hv_hvcall device */ #define MSHV_HVCALL_SETUP _IOW(MSHV_IOCTL, 0x1E, struct mshv_vtl_hv= call_setup) #define MSHV_HVCALL _IOWR(MSHV_IOCTL, 0x1F, struct mshv_vtl_h= vcall) + +/* device passthru */ +#define MSHV_CREATE_DEVICE_TEST 1 + +enum { + MSHV_DEV_TYPE_VFIO, + MSHV_DEV_TYPE_MAX, +}; + +struct mshv_create_device { + __u32 type; /* in: MSHV_DEV_TYPE_xxx */ + __u32 fd; /* out: device handle */ + __u32 flags; /* in: MSHV_CREATE_DEVICE_xxx */ +}; + +#define MSHV_DEV_VFIO_FILE 1 +#define MSHV_DEV_VFIO_FILE_ADD 1 +#define MSHV_DEV_VFIO_FILE_DEL 2 + +struct mshv_device_attr { + __u32 flags; /* no flags currently defined */ + __u32 group; /* device-defined */ + __u64 attr; /* group-defined */ + __u64 addr; 
/* userspace address of attr data */ +}; + +/* Device fds created with MSHV_CREATE_DEVICE */ +#define MSHV_SET_DEVICE_ATTR _IOW(MSHV_IOCTL, 0x00, struct mshv_device_att= r) +#define MSHV_HAS_DEVICE_ATTR _IOW(MSHV_IOCTL, 0x01, struct mshv_device_att= r) + #endif --=20 2.51.2.vfs.0.1 From nobody Wed Jun 17 02:49:30 2026 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 59BA7382F28; Wed, 22 Apr 2026 02:33:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=13.77.154.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825238; cv=none; b=CH3KS8gmXPPRB9m6XYJksBcEuggtP1YZ2G6ffjjyuPZPcPKCw3OFMUaGKK8EGJ5j2cFkjvRY/emPqGB21NOu+NQwXJ9o6osJJd0zNALps4m0t5in4isyKl2MSakb6OaddNQQFae0a9i9aDn2X0L8sLNvQTJ4C1+7zX3Y5Y8pqcg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825238; c=relaxed/simple; bh=jYrebEGyLACqbZL/w3y9a7O0xWgw6FHchfvbYBMKEM0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=eRc/AUM1OY/zrRDaaTIkMHvLUV8piHXbpsJdlLiYBguWKJqxuG3TGFqsq1wxmq6GXDVY6FYCux3a7FXPS0fIrqo1YEfP/j/2Gj7QlIELF8mcooRiI3IzjVnhuPpUEyGzAtv5NEvfvoN7kJyJtUqFpGvaAoIkypaeJcVkVmQ90MU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com; spf=pass smtp.mailfrom=linux.microsoft.com; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b=GgLzpdJZ; arc=none smtp.client-ip=13.77.154.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b="GgLzpdJZ" Received: from mrdev.corp.microsoft.com 
(192-184-212-33.fiber.dynamic.sonic.net [192.184.212.33]) by linux.microsoft.com (Postfix) with ESMTPSA id 4F8A320B6F1F; Tue, 21 Apr 2026 19:33:44 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com 4F8A320B6F1F DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1776825225; bh=YnF+uI+QwqeYTK6oQpqBg7HAiFmqfGlkyQt6D84StzY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=GgLzpdJZ9nmTE/p4NZSlFoqA8wfROdvtb29VmC5gy4u/g+sAFKx15pHMNK0ZG8yfE ZZXvc8Tcv/AFHSLs7vfDcTEqxfvehzsr5Ba8763F99tj78RMpsTTOAbzlpVQ1i8sAI YdWhJVQ0mYd+YLMldvIq5xIVa5h4ubRqYaMgZAfk= From: Mukesh R To: hpa@zytor.com, robin.murphy@arm.com, robh@kernel.org, wei.liu@kernel.org, mrathor@linux.microsoft.com, mhklinux@outlook.com, muislam@microsoft.com, namjain@linux.microsoft.com, magnuskulke@linux.microsoft.com, anbelski@linux.microsoft.com, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, iommu@lists.linux.dev, linux-pci@vger.kernel.org, linux-arch@vger.kernel.org Cc: kys@microsoft.com, haiyangz@microsoft.com, decui@microsoft.com, longli@microsoft.com, tglx@kernel.org, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, joro@8bytes.org, will@kernel.org, lpieralisi@kernel.org, kwilczynski@kernel.org, bhelgaas@google.com, arnd@arndb.de Subject: [PATCH V1 06/13] mshv: Implement mshv bridge device for VFIO Date: Tue, 21 Apr 2026 19:32:32 -0700 Message-ID: <20260422023239.1171963-7-mrathor@linux.microsoft.com> X-Mailer: git-send-email 2.51.2.vfs.0.1 In-Reply-To: <20260422023239.1171963-1-mrathor@linux.microsoft.com> References: <20260422023239.1171963-1-mrathor@linux.microsoft.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add a new file to implement VFIO-MSHV bridge pseudo device. 
These functions are called in the VFIO framework. Credit goes to kvm/vfio.c, from which this file was adapted. Co-developed-by: Wei Liu Signed-off-by: Wei Liu Signed-off-by: Mukesh R --- drivers/hv/Makefile | 3 +- drivers/hv/mshv_vfio.c | 211 ++++++++++++++++++++++++++++++++++++++ include/uapi/linux/mshv.h | 1 + 3 files changed, 214 insertions(+), 1 deletion(-) create mode 100644 drivers/hv/mshv_vfio.c diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile index 888a748cc7cb..9ab6fc254c38 100644 --- a/drivers/hv/Makefile +++ b/drivers/hv/Makefile @@ -14,7 +14,8 @@ hv_vmbus-y :=3D vmbus_drv.o \ hv_vmbus-$(CONFIG_HYPERV_TESTING) +=3D hv_debugfs.o hv_utils-y :=3D hv_util.o hv_kvp.o hv_snapshot.o hv_utils_transport.o mshv_root-y :=3D mshv_root_main.o mshv_synic.o mshv_eventfd.o mshv_irq.o \ - mshv_root_hv_call.o mshv_portid_table.o mshv_regions.o + mshv_root_hv_call.o mshv_portid_table.o mshv_regions.o \ + mshv_vfio.o mshv_root-$(CONFIG_DEBUG_FS) +=3D mshv_debugfs.o mshv_root-$(CONFIG_TRACEPOINTS) +=3D mshv_trace.o mshv_vtl-y :=3D mshv_vtl_main.o diff --git a/drivers/hv/mshv_vfio.c b/drivers/hv/mshv_vfio.c new file mode 100644 index 000000000000..00a97920e25b --- /dev/null +++ b/drivers/hv/mshv_vfio.c @@ -0,0 +1,211 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * VFIO-MSHV bridge pseudo device + * + * Heavily inspired by the VFIO-KVM bridge pseudo device. 
+ */ +#include +#include +#include +#include +#include +#include +#include +#include + +#include "mshv.h" +#include "mshv_root.h" + +struct mshv_vfio_file { + struct list_head node; + struct file *file; /* list of struct mshv_vfio_file */ +}; + +struct mshv_vfio { + struct list_head file_list; + struct mutex lock; +}; + +static bool mshv_vfio_file_is_valid(struct file *file) +{ + bool (*fn)(struct file *file); + bool ret; + + fn =3D symbol_get(vfio_file_is_valid); + if (!fn) + return false; + + ret =3D fn(file); + + symbol_put(vfio_file_is_valid); + + return ret; +} + +static long mshv_vfio_file_add(struct mshv_device *mshvdev, unsigned int f= d) +{ + struct mshv_vfio *mshv_vfio =3D mshvdev->device_private; + struct mshv_vfio_file *mvf; + struct file *filp; + long ret =3D 0; + + filp =3D fget(fd); + if (!filp) + return -EBADF; + + /* Ensure the FD is a vfio FD. */ + if (!mshv_vfio_file_is_valid(filp)) { + ret =3D -EINVAL; + goto out_fput; + } + + mutex_lock(&mshv_vfio->lock); + + list_for_each_entry(mvf, &mshv_vfio->file_list, node) { + if (mvf->file =3D=3D filp) { + ret =3D -EEXIST; + goto out_unlock; + } + } + + mvf =3D kzalloc(sizeof(*mvf), GFP_KERNEL_ACCOUNT); + if (!mvf) { + ret =3D -ENOMEM; + goto out_unlock; + } + + mvf->file =3D get_file(filp); + list_add_tail(&mvf->node, &mshv_vfio->file_list); + +out_unlock: + mutex_unlock(&mshv_vfio->lock); +out_fput: + fput(filp); + return ret; +} + +static long mshv_vfio_file_del(struct mshv_device *mshvdev, unsigned int f= d) +{ + struct mshv_vfio *mshv_vfio =3D mshvdev->device_private; + struct mshv_vfio_file *mvf; + long ret; + + CLASS(fd, f)(fd); + + if (fd_empty(f)) + return -EBADF; + + ret =3D -ENOENT; + mutex_lock(&mshv_vfio->lock); + + list_for_each_entry(mvf, &mshv_vfio->file_list, node) { + if (mvf->file !=3D fd_file(f)) + continue; + + list_del(&mvf->node); + fput(mvf->file); + kfree(mvf); + ret =3D 0; + break; + } + + mutex_unlock(&mshv_vfio->lock); + return ret; +} + +static long mshv_vfio_set_file(struct 
mshv_device *mshvdev, long attr, + void __user *arg) +{ + int32_t __user *argp =3D arg; + int32_t fd; + + switch (attr) { + case MSHV_DEV_VFIO_FILE_ADD: + if (get_user(fd, argp)) + return -EFAULT; + return mshv_vfio_file_add(mshvdev, fd); + + case MSHV_DEV_VFIO_FILE_DEL: + if (get_user(fd, argp)) + return -EFAULT; + return mshv_vfio_file_del(mshvdev, fd); + } + + return -ENXIO; +} + +static long mshv_vfio_set_attr(struct mshv_device *mshvdev, + struct mshv_device_attr *attr) +{ + switch (attr->group) { + case MSHV_DEV_VFIO_FILE: + return mshv_vfio_set_file(mshvdev, attr->attr, + u64_to_user_ptr(attr->addr)); + } + + return -ENXIO; +} + +static long mshv_vfio_has_attr(struct mshv_device *mshvdev, + struct mshv_device_attr *attr) +{ + switch (attr->group) { + case MSHV_DEV_VFIO_FILE: + switch (attr->attr) { + case MSHV_DEV_VFIO_FILE_ADD: + case MSHV_DEV_VFIO_FILE_DEL: + return 0; + } + + break; + } + + return -ENXIO; +} + +static long mshv_vfio_create_device(struct mshv_device *mshvdev) +{ + struct mshv_device *tmp; + struct mshv_vfio *mshv_vfio; + + /* Only one VFIO "device" per VM */ + hlist_for_each_entry(tmp, &mshvdev->device_pt->pt_devices, + device_ptnode) + if (tmp->device_ops =3D=3D &mshv_vfio_device_ops) + return -EBUSY; + + mshv_vfio =3D kzalloc(sizeof(*mshv_vfio), GFP_KERNEL_ACCOUNT); + if (mshv_vfio =3D=3D NULL) + return -ENOMEM; + + INIT_LIST_HEAD(&mshv_vfio->file_list); + mutex_init(&mshv_vfio->lock); + + mshvdev->device_private =3D mshv_vfio; + + return 0; +} + +/* This is called from mshv_device_fop_release() */ +static void mshv_vfio_release_device(struct mshv_device *mshvdev) +{ + struct mshv_vfio *mv =3D mshvdev->device_private; + struct mshv_vfio_file *mvf, *tmp; + + list_for_each_entry_safe(mvf, tmp, &mv->file_list, node) { + fput(mvf->file); + list_del(&mvf->node); + kfree(mvf); + } + + kfree(mv); + kfree(mshvdev); +} + +struct mshv_device_ops mshv_vfio_device_ops =3D { + .device_name =3D "mshv-vfio", + .device_create =3D 
mshv_vfio_create_device, + .device_release =3D mshv_vfio_release_device, + .device_set_attr =3D mshv_vfio_set_attr, + .device_has_attr =3D mshv_vfio_has_attr, +}; diff --git a/include/uapi/linux/mshv.h b/include/uapi/linux/mshv.h index 4373a8243951..6404e8a98237 100644 --- a/include/uapi/linux/mshv.h +++ b/include/uapi/linux/mshv.h @@ -254,6 +254,7 @@ struct mshv_root_hvcall { #define MSHV_GET_GPAP_ACCESS_BITMAP _IOWR(MSHV_IOCTL, 0x06, struct mshv_gp= ap_access_bitmap) /* Generic hypercall */ #define MSHV_ROOT_HVCALL _IOWR(MSHV_IOCTL, 0x07, struct mshv_root_hvcall) +#define MSHV_CREATE_DEVICE _IOWR(MSHV_IOCTL, 0x08, struct mshv= _create_device) =20 /* ******************************** --=20 2.51.2.vfs.0.1 From nobody Wed Jun 17 02:49:30 2026 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by smtp.subspace.kernel.org (Postfix) with ESMTP id BD1053822AA; Wed, 22 Apr 2026 02:33:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=13.77.154.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825241; cv=none; b=l1JwkynXCuM+du2XKxDcIhz1QOFvgh4DJfBasm+GcYpryATm419TpnfshZJgcwdgErvYQyS6iLUPO23t5hznyPrwA60RqwvA6iy3MhlaR1gIjgjlY0hXNvewtWp4QNj9UCqg2NW9kePBmGWuFUdxh30vg5wUWLkspY86JrG59O4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825241; c=relaxed/simple; bh=C8TsFVJy+pAFOBhAef081HD3HUSmaQJnLst8p/bA8vk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=gO9M8NdemE4bh2vq280HWycHd5xjyErF0mgzi8B2ZLvmpKgkWRsqZBfedCtAoWD1MzoiKsTjq/iNRbDX1RwSmcrP3LfNDZgjJrI7wQfuXHRgwjubLTOGk7KpTPGcfOYMQByAafKjjXap7FUTX5O2XJeA7u3fA73Ihz6EPp/oGD0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com; spf=pass smtp.mailfrom=linux.microsoft.com; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b=Vr2BHVut; arc=none smtp.client-ip=13.77.154.182 
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b="Vr2BHVut" Received: from mrdev.corp.microsoft.com (192-184-212-33.fiber.dynamic.sonic.net [192.184.212.33]) by linux.microsoft.com (Postfix) with ESMTPSA id B043720B6F20; Tue, 21 Apr 2026 19:33:45 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com B043720B6F20 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1776825226; bh=torw1rFNIs6S4jLC1p2Rij05fcbQgV06MR8D7ovym88=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Vr2BHVutj1SgonSfVzqIBGit/CYCrmt3i9EL9E7s+WEPffaoxJUZmaOLtA8+8mV/B cnEX+hEifNEQiJZ7pjhLzeAUZ0zf05/SqosiNGnUJfE+/b5B0XthWnEBi9S35jsAHm VD3COLKj2+sN5CiLmOzyPdgQQCROIoyojZE+N5+w= From: Mukesh R To: hpa@zytor.com, robin.murphy@arm.com, robh@kernel.org, wei.liu@kernel.org, mrathor@linux.microsoft.com, mhklinux@outlook.com, muislam@microsoft.com, namjain@linux.microsoft.com, magnuskulke@linux.microsoft.com, anbelski@linux.microsoft.com, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, iommu@lists.linux.dev, linux-pci@vger.kernel.org, linux-arch@vger.kernel.org Cc: kys@microsoft.com, haiyangz@microsoft.com, decui@microsoft.com, longli@microsoft.com, tglx@kernel.org, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, joro@8bytes.org, will@kernel.org, lpieralisi@kernel.org, kwilczynski@kernel.org, bhelgaas@google.com, arnd@arndb.de Subject: [PATCH V1 07/13] mshv: Add ioctl support for MSHV-VFIO bridge device Date: Tue, 21 Apr 2026 19:32:33 -0700 Message-ID: <20260422023239.1171963-8-mrathor@linux.microsoft.com> X-Mailer: git-send-email 2.51.2.vfs.0.1 In-Reply-To: <20260422023239.1171963-1-mrathor@linux.microsoft.com> 
References: <20260422023239.1171963-1-mrathor@linux.microsoft.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add ioctl support for creating MSHV devices for a partition. At present only VFIO device types are supported, but more could be added. At a high level, a partition ioctl to create device verifies it is of type VFIO and does some setup for bridge code in mshv_vfio.c. Adapted from KVM device ioctls. Co-developed-by: Wei Liu Signed-off-by: Wei Liu Signed-off-by: Mukesh R --- drivers/hv/mshv_root_main.c | 116 ++++++++++++++++++++++++++++++++++++ 1 file changed, 116 insertions(+) diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c index 02c107458be9..6ceb5f608589 100644 --- a/drivers/hv/mshv_root_main.c +++ b/drivers/hv/mshv_root_main.c @@ -1625,6 +1625,119 @@ mshv_partition_ioctl_initialize(struct mshv_partiti= on *partition) return ret; } =20 +static long mshv_device_attr_ioctl(struct mshv_device *mshv_dev, int cmd, + ulong uarg) +{ + struct mshv_device_attr attr; + const struct mshv_device_ops *devops =3D mshv_dev->device_ops; + + if (copy_from_user(&attr, (void __user *)uarg, sizeof(attr))) + return -EFAULT; + + switch (cmd) { + case MSHV_SET_DEVICE_ATTR: + if (devops->device_set_attr) + return devops->device_set_attr(mshv_dev, &attr); + break; + case MSHV_HAS_DEVICE_ATTR: + if (devops->device_has_attr) + return devops->device_has_attr(mshv_dev, &attr); + break; + } + + return -EPERM; +} + +static long mshv_device_fop_ioctl(struct file *filp, unsigned int cmd, + ulong uarg) +{ + struct mshv_device *mshv_dev =3D filp->private_data; + + switch (cmd) { + case MSHV_SET_DEVICE_ATTR: + case MSHV_HAS_DEVICE_ATTR: + return mshv_device_attr_ioctl(mshv_dev, cmd, uarg); + } + + return -ENOTTY; +} + +static int mshv_device_fop_release(struct inode *inode, struct file *filp) +{ + struct 
mshv_device *mshv_dev =3D filp->private_data; + struct mshv_partition *partition =3D mshv_dev->device_pt; + + if (mshv_dev->device_ops->device_release) { + mutex_lock(&partition->pt_mutex); + hlist_del(&mshv_dev->device_ptnode); + mshv_dev->device_ops->device_release(mshv_dev); + mutex_unlock(&partition->pt_mutex); + } + + mshv_partition_put(partition); + return 0; +} + +static const struct file_operations mshv_device_fops =3D { + .owner =3D THIS_MODULE, + .unlocked_ioctl =3D mshv_device_fop_ioctl, + .release =3D mshv_device_fop_release, +}; + +static long mshv_partition_ioctl_create_device(struct mshv_partition *part= ition, + void __user *uarg) +{ + long rc; + struct mshv_create_device devargk; + struct mshv_device *mshv_dev; + const struct mshv_device_ops *vfio_ops; + + if (copy_from_user(&devargk, uarg, sizeof(devargk))) + return -EFAULT; + + /* At present, only VFIO is supported */ + if (devargk.type !=3D MSHV_DEV_TYPE_VFIO) + return -ENODEV; + + if (devargk.flags & MSHV_CREATE_DEVICE_TEST) + return 0; + + /* This is freed later by mshv_vfio_release_device() */ + mshv_dev =3D kzalloc(sizeof(*mshv_dev), GFP_KERNEL_ACCOUNT); + if (mshv_dev =3D=3D NULL) + return -ENOMEM; + + vfio_ops =3D &mshv_vfio_device_ops; + mshv_dev->device_ops =3D vfio_ops; + mshv_dev->device_pt =3D partition; + + rc =3D vfio_ops->device_create(mshv_dev); + if (rc < 0) { + kfree(mshv_dev); + return rc; + } + + hlist_add_head(&mshv_dev->device_ptnode, &partition->pt_devices); + + mshv_partition_get(partition); + rc =3D anon_inode_getfd(vfio_ops->device_name, &mshv_device_fops, + mshv_dev, O_RDWR | O_CLOEXEC); + if (rc < 0) + goto undo_out; + + devargk.fd =3D rc; + if (copy_to_user(uarg, &devargk, sizeof(devargk))) + return -EFAULT; /* cleanup in mshv_device_fop_release() */ + + return 0; + +undo_out: + hlist_del(&mshv_dev->device_ptnode); + vfio_ops->device_release(mshv_dev); /* will kfree(mshv_dev) */ + mshv_partition_put(partition); + return rc; +} + static long mshv_partition_ioctl(struct 
file *filp, unsigned int ioctl, unsigned long = arg) { @@ -1661,6 +1774,9 @@ mshv_partition_ioctl(struct file *filp, unsigned int = ioctl, unsigned long arg) case MSHV_ROOT_HVCALL: ret =3D mshv_ioctl_passthru_hvcall(partition, true, uarg); break; + case MSHV_CREATE_DEVICE: + ret =3D mshv_partition_ioctl_create_device(partition, uarg); + break; default: ret =3D -ENOTTY; } --=20 2.51.2.vfs.0.1 From nobody Wed Jun 17 02:49:30 2026 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 7F6F438737F; Wed, 22 Apr 2026 02:33:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=13.77.154.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825244; cv=none; b=bdlkaatiBgs1X0SuKsjs6nJO4S3d5Ke7ky8iWs/PCR7y/im9AktI7ATXgKEPdgniGvgC07v0uXupIbzmdBSiR59TuPS0tFphwmcoM7x+xUpriw9QlW2z+MYbYl7dTcOaHHLdFvN3+p6EuZhBCXSuOFbIGw0toucFi3pJUDHsfEM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825244; c=relaxed/simple; bh=3p+UYzYAVK6Oc1T3Ui1NuwCE4yM8MDdRlrXl61kkDDE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=OrNBcvrSWcmI5ejHh0qN+42OOBIdPjWI1BVOT5HvQR+5VS8I7dofSSW+5x9sdKB+UuLefOqeoGOOgIbxJxWZQGAu5GpAGIkElfBoMXaUME3AIRqAoMJ71eEBQjeRH2AvFDffjaMcacbdSR/RbcWsXEisVfxl3DMXANGR7SPqnT0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com; spf=pass smtp.mailfrom=linux.microsoft.com; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b=Pc0TBr6d; arc=none smtp.client-ip=13.77.154.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.microsoft.com 
header.i=@linux.microsoft.com header.b="Pc0TBr6d" Received: from mrdev.corp.microsoft.com (192-184-212-33.fiber.dynamic.sonic.net [192.184.212.33]) by linux.microsoft.com (Postfix) with ESMTPSA id 7D62C20B6F01; Tue, 21 Apr 2026 19:33:47 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com 7D62C20B6F01 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1776825228; bh=o6ztznKN0i+oDxkVKgM3YRDnCNaOjwDWRVoXxAfAsww=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Pc0TBr6dRCkXuecEpzOQW9DZK1BlT8DoPhItNDPwLPiilJpMdJ46A+h4qz0zVtPMs m7eaU3/0fYsOx5Fo+aJzdGgMUH4F5VTXRLhFPExrpYpbyDUWJi3Wxr8+M1gkoHtj6O /TaNasCwXiWRoz8aGA8N8ODVTgf6l41putV/KTAA= From: Mukesh R To: hpa@zytor.com, robin.murphy@arm.com, robh@kernel.org, wei.liu@kernel.org, mrathor@linux.microsoft.com, mhklinux@outlook.com, muislam@microsoft.com, namjain@linux.microsoft.com, magnuskulke@linux.microsoft.com, anbelski@linux.microsoft.com, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, iommu@lists.linux.dev, linux-pci@vger.kernel.org, linux-arch@vger.kernel.org Cc: kys@microsoft.com, haiyangz@microsoft.com, decui@microsoft.com, longli@microsoft.com, tglx@kernel.org, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, joro@8bytes.org, will@kernel.org, lpieralisi@kernel.org, kwilczynski@kernel.org, bhelgaas@google.com, arnd@arndb.de Subject: [PATCH V1 08/13] PCI: hv: rename hv_compose_msi_msg to hv_vmbus_compose_msi_msg Date: Tue, 21 Apr 2026 19:32:34 -0700 Message-ID: <20260422023239.1171963-9-mrathor@linux.microsoft.com> X-Mailer: git-send-email 2.51.2.vfs.0.1 In-Reply-To: <20260422023239.1171963-1-mrathor@linux.microsoft.com> References: <20260422023239.1171963-1-mrathor@linux.microsoft.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Main change here is to 
rename hv_compose_msi_msg to hv_vmbus_compose_msi_msg as we introduce hv_compose_msi_msg in upcoming patches that builds MSI messages for both VMBus and non-VMBus cases. VMBus is not used on baremetal root partition for example. While at it, replace spaces with tabs and fix some formatting involving excessive line wraps. There is no functional change. Signed-off-by: Mukesh R --- drivers/pci/controller/pci-hyperv.c | 95 +++++++++++++++-------------- 1 file changed, 48 insertions(+), 47 deletions(-) diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/p= ci-hyperv.c index cfc8fa403dad..ed6b399afc80 100644 --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -30,7 +30,7 @@ * function's configuration space is zero. * * The rest of this driver mostly maps PCI concepts onto underlying Hyper-V - * facilities. For instance, the configuration space of a function exposed + * facilities. For instance, the configuration space of a function exposed * by Hyper-V is mapped into a single page of memory space, and the * read and write handlers for config space must be aware of this mechanis= m. 
* Similarly, device setup and teardown involves messages sent to and from @@ -109,33 +109,33 @@ enum pci_message_type { /* * Version 1.1 */ - PCI_MESSAGE_BASE =3D 0x42490000, - PCI_BUS_RELATIONS =3D PCI_MESSAGE_BASE + 0, - PCI_QUERY_BUS_RELATIONS =3D PCI_MESSAGE_BASE + 1, - PCI_POWER_STATE_CHANGE =3D PCI_MESSAGE_BASE + 4, + PCI_MESSAGE_BASE =3D 0x42490000, + PCI_BUS_RELATIONS =3D PCI_MESSAGE_BASE + 0, + PCI_QUERY_BUS_RELATIONS =3D PCI_MESSAGE_BASE + 1, + PCI_POWER_STATE_CHANGE =3D PCI_MESSAGE_BASE + 4, PCI_QUERY_RESOURCE_REQUIREMENTS =3D PCI_MESSAGE_BASE + 5, - PCI_QUERY_RESOURCE_RESOURCES =3D PCI_MESSAGE_BASE + 6, - PCI_BUS_D0ENTRY =3D PCI_MESSAGE_BASE + 7, - PCI_BUS_D0EXIT =3D PCI_MESSAGE_BASE + 8, - PCI_READ_BLOCK =3D PCI_MESSAGE_BASE + 9, - PCI_WRITE_BLOCK =3D PCI_MESSAGE_BASE + 0xA, - PCI_EJECT =3D PCI_MESSAGE_BASE + 0xB, - PCI_QUERY_STOP =3D PCI_MESSAGE_BASE + 0xC, - PCI_REENABLE =3D PCI_MESSAGE_BASE + 0xD, - PCI_QUERY_STOP_FAILED =3D PCI_MESSAGE_BASE + 0xE, - PCI_EJECTION_COMPLETE =3D PCI_MESSAGE_BASE + 0xF, - PCI_RESOURCES_ASSIGNED =3D PCI_MESSAGE_BASE + 0x10, - PCI_RESOURCES_RELEASED =3D PCI_MESSAGE_BASE + 0x11, - PCI_INVALIDATE_BLOCK =3D PCI_MESSAGE_BASE + 0x12, - PCI_QUERY_PROTOCOL_VERSION =3D PCI_MESSAGE_BASE + 0x13, - PCI_CREATE_INTERRUPT_MESSAGE =3D PCI_MESSAGE_BASE + 0x14, - PCI_DELETE_INTERRUPT_MESSAGE =3D PCI_MESSAGE_BASE + 0x15, + PCI_QUERY_RESOURCE_RESOURCES =3D PCI_MESSAGE_BASE + 6, + PCI_BUS_D0ENTRY =3D PCI_MESSAGE_BASE + 7, + PCI_BUS_D0EXIT =3D PCI_MESSAGE_BASE + 8, + PCI_READ_BLOCK =3D PCI_MESSAGE_BASE + 9, + PCI_WRITE_BLOCK =3D PCI_MESSAGE_BASE + 0xA, + PCI_EJECT =3D PCI_MESSAGE_BASE + 0xB, + PCI_QUERY_STOP =3D PCI_MESSAGE_BASE + 0xC, + PCI_REENABLE =3D PCI_MESSAGE_BASE + 0xD, + PCI_QUERY_STOP_FAILED =3D PCI_MESSAGE_BASE + 0xE, + PCI_EJECTION_COMPLETE =3D PCI_MESSAGE_BASE + 0xF, + PCI_RESOURCES_ASSIGNED =3D PCI_MESSAGE_BASE + 0x10, + PCI_RESOURCES_RELEASED =3D PCI_MESSAGE_BASE + 0x11, + PCI_INVALIDATE_BLOCK =3D PCI_MESSAGE_BASE + 0x12, + 
PCI_QUERY_PROTOCOL_VERSION =3D PCI_MESSAGE_BASE + 0x13, + PCI_CREATE_INTERRUPT_MESSAGE =3D PCI_MESSAGE_BASE + 0x14, + PCI_DELETE_INTERRUPT_MESSAGE =3D PCI_MESSAGE_BASE + 0x15, PCI_RESOURCES_ASSIGNED2 =3D PCI_MESSAGE_BASE + 0x16, PCI_CREATE_INTERRUPT_MESSAGE2 =3D PCI_MESSAGE_BASE + 0x17, PCI_DELETE_INTERRUPT_MESSAGE2 =3D PCI_MESSAGE_BASE + 0x18, /* unused */ PCI_BUS_RELATIONS2 =3D PCI_MESSAGE_BASE + 0x19, - PCI_RESOURCES_ASSIGNED3 =3D PCI_MESSAGE_BASE + 0x1A, - PCI_CREATE_INTERRUPT_MESSAGE3 =3D PCI_MESSAGE_BASE + 0x1B, + PCI_RESOURCES_ASSIGNED3 =3D PCI_MESSAGE_BASE + 0x1A, + PCI_CREATE_INTERRUPT_MESSAGE3 =3D PCI_MESSAGE_BASE + 0x1B, PCI_MESSAGE_MAXIMUM }; =20 @@ -1774,20 +1774,21 @@ static u32 hv_compose_msi_req_v1( * via the HVCALL_RETARGET_INTERRUPT hypercall. But the choice of dummy vC= PU is * not irrelevant because Hyper-V chooses the physical CPU to handle the * interrupts based on the vCPU specified in message sent to the vPCI VSP = in - * hv_compose_msi_msg(). Hyper-V's choice of pCPU is not visible to the gu= est, - * but assigning too many vPCI device interrupts to the same pCPU can caus= e a - * performance bottleneck. So we spread out the dummy vCPUs to influence H= yper-V - * to spread out the pCPUs that it selects. + * hv_vmbus_compose_msi_msg(). Hyper-V's choice of pCPU is not visible to = the + * guest, but assigning too many vPCI device interrupts to the same pCPU c= an + * cause a performance bottleneck. So we spread out the dummy vCPUs to inf= luence + * Hyper-V to spread out the pCPUs that it selects. * * For the single-MSI and MSI-X cases, it's OK for hv_compose_msi_req_get_= cpu() * to always return the same dummy vCPU, because a second call to - * hv_compose_msi_msg() contains the "real" vCPU, causing Hyper-V to choos= e a - * new pCPU for the interrupt. But for the multi-MSI case, the second call= to - * hv_compose_msi_msg() exits without sending a message to the vPCI VSP, s= o the - * original dummy vCPU is used. 
This dummy vCPU must be round-robin'ed so = that - * the pCPUs are spread out. All interrupts for a multi-MSI device end up = using - * the same pCPU, even though the vCPUs will be spread out by later calls - * to hv_irq_unmask(), but that is the best we can do now. + * hv_vmbus_compose_msi_msg() contains the "real" vCPU, causing Hyper-V to + * choose a new pCPU for the interrupt. But for the multi-MSI case, the se= cond + * call to hv_vmbus_compose_msi_msg() exits without sending a message to t= he + * vPCI VSP, so the original dummy vCPU is used. This dummy vCPU must be + * round-robin'ed so that the pCPUs are spread out. All interrupts for a + * multi-MSI device end up using the same pCPU, even though the vCPUs will= be + * spread out by later calls to hv_irq_unmask(), but that is the best we c= an do + * now. * * With Hyper-V in Nov 2022, the HVCALL_RETARGET_INTERRUPT hypercall does = *not* * cause Hyper-V to reselect the pCPU based on the specified vCPU. Such an @@ -1862,7 +1863,7 @@ static u32 hv_compose_msi_req_v3( } =20 /** - * hv_compose_msi_msg() - Supplies a valid MSI address/data + * hv_vmbus_compose_msi_msg() - Supplies a valid MSI address/data * @data: Everything about this MSI * @msg: Buffer that is filled in by this function * @@ -1872,7 +1873,7 @@ static u32 hv_compose_msi_req_v3( * response supplies a data value and address to which that data * should be written to trigger that interrupt. */ -static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) +static void hv_vmbus_compose_msi_msg(struct irq_data *data, struct msi_msg= *msg) { struct hv_pcibus_device *hbus; struct vmbus_channel *channel; @@ -1954,7 +1955,7 @@ static void hv_compose_msi_msg(struct irq_data *data,= struct msi_msg *msg) return; } /* - * The vector we select here is a dummy value. The correct + * The vector we select here is a dummy value. The correct * value gets sent to the hypervisor in unmask(). This needs * to be aligned with the count, and also not zero. 
Multi-msi * is powers of 2 up to 32, so 32 will always work here. @@ -2046,7 +2047,7 @@ static void hv_compose_msi_msg(struct irq_data *data,= struct msi_msg *msg) =20 /* * Make sure that the ring buffer data structure doesn't get - * freed while we dereference the ring buffer pointer. Test + * freed while we dereference the ring buffer pointer. Test * for the channel's onchannel_callback being NULL within a * sched_lock critical section. See also the inline comments * in vmbus_reset_channel_cb(). @@ -2146,7 +2147,7 @@ static const struct msi_parent_ops hv_pcie_msi_parent= _ops =3D { /* HW Interrupt Chip Descriptor */ static struct irq_chip hv_msi_irq_chip =3D { .name =3D "Hyper-V PCIe MSI", - .irq_compose_msi_msg =3D hv_compose_msi_msg, + .irq_compose_msi_msg =3D hv_vmbus_compose_msi_msg, .irq_set_affinity =3D irq_chip_set_affinity_parent, .irq_ack =3D irq_chip_ack_parent, .irq_eoi =3D irq_chip_eoi_parent, @@ -2158,8 +2159,8 @@ static int hv_pcie_domain_alloc(struct irq_domain *d,= unsigned int virq, unsigne void *arg) { /* - * TODO: Allocating and populating struct tran_int_desc in hv_compose_msi= _msg() - * should be moved here. + * TODO: Allocating and populating struct tran_int_desc in + * hv_vmbus_compose_msi_msg() should be moved here. */ int ret; =20 @@ -2226,7 +2227,7 @@ static int hv_pcie_init_irq_domain(struct hv_pcibus_d= evice *hbus) /** * get_bar_size() - Get the address space consumed by a BAR * @bar_val: Value that a BAR returned after -1 was written - * to it. + * to it. * * This function returns the size of the BAR, rounded up to 1 * page. It has to be rounded up because the hypervisor's page @@ -2580,7 +2581,7 @@ static void q_resource_requirements(void *context, st= ruct pci_response *resp, * new_pcichild_device() - Create a new child device * @hbus: The internal struct tracking this root PCI bus. * @desc: The information supplied so far from the host - * about the device. + * about the device. 
* * This function creates the tracking structure for a new child * device and kicks off the process of figuring out what it is. @@ -3105,7 +3106,7 @@ static void hv_pci_onchannelcallback(void *context) * sure that the packet pointer is still valid during the call: * here 'valid' means that there's a task still waiting for the * completion, and that the packet data is still on the waiting - * task's stack. Cf. hv_compose_msi_msg(). + * task's stack. Cf. hv_vmbus_compose_msi_msg(). */ comp_packet->completion_func(comp_packet->compl_ctxt, response, @@ -3422,7 +3423,7 @@ static int hv_allocate_config_window(struct hv_pcibus= _device *hbus) * vmbus_allocate_mmio() gets used for allocating both device endpoint * resource claims (those which cannot be overlapped) and the ranges * which are valid for the children of this bus, which are intended - * to be overlapped by those children. Set the flag on this claim + * to be overlapped by those children. Set the flag on this claim * meaning that this region can't be overlapped. */ =20 @@ -4069,7 +4070,7 @@ static int hv_pci_restore_msi_msg(struct pci_dev *pde= v, void *arg) irq_data =3D irq_get_irq_data(entry->irq); if (WARN_ON_ONCE(!irq_data)) return -EINVAL; - hv_compose_msi_msg(irq_data, &entry->msg); + hv_vmbus_compose_msi_msg(irq_data, &entry->msg); } return 0; } @@ -4077,7 +4078,7 @@ static int hv_pci_restore_msi_msg(struct pci_dev *pde= v, void *arg) /* * Upon resume, pci_restore_msi_state() -> ... -> __pci_write_msi_msg() * directly writes the MSI/MSI-X registers via MMIO, but since Hyper-V - * doesn't trap and emulate the MMIO accesses, here hv_compose_msi_msg() + * doesn't trap and emulate the MMIO accesses, here hv_vmbus_compose_msi_m= sg() * must be used to ask Hyper-V to re-create the IOMMU Interrupt Remapping * Table entries. 
*/ --=20 2.51.2.vfs.0.1 From nobody Wed Jun 17 02:49:30 2026 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 17E2437F75B; Wed, 22 Apr 2026 02:33:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=13.77.154.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825243; cv=none; b=Zeu/kspUik3w7j9TidQSQPvibPFQ2ieXAAHPIRm46KiCuRpUTGgprwrZBH8vZ5sDHTFhLdaGRpvW+gvvsYnR5zrHZR2OjNPbb+ywcsp0Y1ebxOslTCLktX6KKC2Y+D87h3LnZwWNcudn7YSkd7spY2U+kFQcTVnzrXwqMqfhnNY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825243; c=relaxed/simple; bh=i0VTBJ+JnPrBDXBmboabc0DGz729xEaksDvoWqta6X8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=IeEONEXlbnW4DUH5Ik1BKqxENGx5BcEeIXf9HpQmZmKExkb7I5uWMn5QcB3+LDP9JUZE0QYcCCWm5izcNKULlZ4lUQ579DUNZcRfwb/KkhES7VSeeFAi0OhHtDDKNyyTWp7lUF1qugb4tpo5i7i09SuMEIXIzt496+X2udcUbSA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com; spf=pass smtp.mailfrom=linux.microsoft.com; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b=KSXqriht; arc=none smtp.client-ip=13.77.154.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b="KSXqriht" Received: from mrdev.corp.microsoft.com (192-184-212-33.fiber.dynamic.sonic.net [192.184.212.33]) by linux.microsoft.com (Postfix) with ESMTPSA id ED9A020B6F21; Tue, 21 Apr 2026 19:33:48 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com ED9A020B6F21 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=linux.microsoft.com; s=default; t=1776825229; bh=0bjgMj0k5nPoMwets23GFJzSwvZMJJftuqSY4+V25iQ=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=KSXqrihtAgVG9XWdFdwCwru9n3hQ4behZV7iMeZ+4xukeURssOqspE+157qhVzqtE FcWf0Kqa9z1IMmlXd0RXAk5UDgU6m+Krr21XJYtyiHUD3hPw94cm3d4ANawOaG/guA k8Sc0Ahq2V7Pyqtx95aCFFwz+95MqfwUe9gjoEwc= From: Mukesh R To: hpa@zytor.com, robin.murphy@arm.com, robh@kernel.org, wei.liu@kernel.org, mrathor@linux.microsoft.com, mhklinux@outlook.com, muislam@microsoft.com, namjain@linux.microsoft.com, magnuskulke@linux.microsoft.com, anbelski@linux.microsoft.com, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, iommu@lists.linux.dev, linux-pci@vger.kernel.org, linux-arch@vger.kernel.org Cc: kys@microsoft.com, haiyangz@microsoft.com, decui@microsoft.com, longli@microsoft.com, tglx@kernel.org, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, joro@8bytes.org, will@kernel.org, lpieralisi@kernel.org, kwilczynski@kernel.org, bhelgaas@google.com, arnd@arndb.de Subject: [PATCH V1 09/13] mshv: Import data structs around device passthru from hyperv headers Date: Tue, 21 Apr 2026 19:32:35 -0700 Message-ID: <20260422023239.1171963-10-mrathor@linux.microsoft.com> X-Mailer: git-send-email 2.51.2.vfs.0.1 In-Reply-To: <20260422023239.1171963-1-mrathor@linux.microsoft.com> References: <20260422023239.1171963-1-mrathor@linux.microsoft.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Copy/import from Hyper-V public headers, definitions and declarations that are related to attaching and detaching of device domains, and building device ids for those purposes. 
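As a rough illustration of how one of the imported definitions is meant to be read, the sketch below packs a domain type and a domain number into the 64-bit value carried by union hv_device_domain_id. It assumes the usual little-endian x86 bitfield layout (type in bits 0-3, id in bits 32-63); the helper name is hypothetical and not part of this patch.

```c
#include <stdint.h>

/* Domain type values, as defined by this patch. */
#define HV_DEVICE_DOMAIN_TYPE_S2 0
#define HV_DEVICE_DOMAIN_TYPE_S1 1

/*
 * Hypothetical helper, not part of the patch: build the as_uint64 view of
 * union hv_device_domain_id by hand. Assumes bitfields are allocated from
 * the least significant bit, so that:
 *   bits  0..3   type
 *   bits  4..31  reserved (left as zero)
 *   bits 32..63  id
 */
static inline uint64_t hv_pack_device_domain_id(uint32_t type, uint32_t id)
{
	return ((uint64_t)id << 32) | (uint64_t)(type & 0xfu);
}
```

Under that layout assumption, a stage 1 domain with id 5 would be encoded as 0x0000000500000001.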
Signed-off-by: Mukesh R --- include/hyperv/hvgdk_mini.h | 11 ++++ include/hyperv/hvhdk_mini.h | 112 ++++++++++++++++++++++++++++++++++++ 2 files changed, 123 insertions(+) diff --git a/include/hyperv/hvgdk_mini.h b/include/hyperv/hvgdk_mini.h index 6a4e8b9d570f..da622fb06440 100644 --- a/include/hyperv/hvgdk_mini.h +++ b/include/hyperv/hvgdk_mini.h @@ -326,6 +326,9 @@ union hv_hypervisor_version_info { /* stimer Direct Mode is available */ #define HV_STIMER_DIRECT_MODE_AVAILABLE BIT(19) =20 +#define HV_DEVICE_DOMAIN_AVAILABLE BIT(24) +#define HV_S1_DEVICE_DOMAIN_AVAILABLE BIT(25) + /* * Implementation recommendations. Indicates which behaviors the hypervisor * recommends the OS implement for optimal performance. @@ -475,6 +478,8 @@ union hv_vp_assist_msr_contents { /* HV_REGISTER_VP_AS= SIST_PAGE */ #define HVCALL_MAP_DEVICE_INTERRUPT 0x007c #define HVCALL_UNMAP_DEVICE_INTERRUPT 0x007d #define HVCALL_RETARGET_INTERRUPT 0x007e +#define HVCALL_ATTACH_DEVICE 0x0082 +#define HVCALL_DETACH_DEVICE 0x0083 #define HVCALL_NOTIFY_PARTITION_EVENT 0x0087 #define HVCALL_ENTER_SLEEP_STATE 0x0084 #define HVCALL_NOTIFY_PORT_RING_EMPTY 0x008b @@ -486,9 +491,15 @@ union hv_vp_assist_msr_contents { /* HV_REGISTER_VP_A= SSIST_PAGE */ #define HVCALL_GET_VP_INDEX_FROM_APIC_ID 0x009a #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0 +#define HVCALL_CREATE_DEVICE_DOMAIN 0x00b1 +#define HVCALL_ATTACH_DEVICE_DOMAIN 0x00b2 +#define HVCALL_MAP_DEVICE_GPA_PAGES 0x00b3 +#define HVCALL_UNMAP_DEVICE_GPA_PAGES 0x00b4 #define HVCALL_SIGNAL_EVENT_DIRECT 0x00c0 #define HVCALL_POST_MESSAGE_DIRECT 0x00c1 #define HVCALL_DISPATCH_VP 0x00c2 +#define HVCALL_DETACH_DEVICE_DOMAIN 0x00c4 +#define HVCALL_DELETE_DEVICE_DOMAIN 0x00c5 #define HVCALL_GET_GPA_PAGES_ACCESS_STATES 0x00c9 #define HVCALL_ACQUIRE_SPARSE_SPA_PAGE_HOST_ACCESS 0x00d7 #define HVCALL_RELEASE_SPARSE_SPA_PAGE_HOST_ACCESS 0x00d8 diff --git a/include/hyperv/hvhdk_mini.h 
b/include/hyperv/hvhdk_mini.h index b4cb2fa26e9b..60425052a799 100644 --- a/include/hyperv/hvhdk_mini.h +++ b/include/hyperv/hvhdk_mini.h @@ -468,6 +468,32 @@ struct hv_send_ipi_ex { /* HV_INPUT_SEND_SYNTHETIC_CLU= STER_IPI_EX */ struct hv_vpset vp_set; } __packed; =20 +union hv_attdev_flags { /* HV_ATTACH_DEVICE_FLAGS */ + struct { + u32 logical_id : 1; + u32 resvd0 : 1; + u32 ats_enabled : 1; + u32 virt_func : 1; + u32 shared_irq_child : 1; + u32 virt_dev : 1; + u32 ats_supported : 1; + u32 small_irt : 1; + u32 resvd : 24; + } __packed; + u32 as_uint32; +}; + +union hv_dev_pci_caps { /* HV_DEVICE_PCI_CAPABILITIES */ + struct { + u32 max_pasid_width : 5; + u32 invalidate_qdepth : 5; + u32 global_inval : 1; + u32 prg_response_req : 1; + u32 resvd : 20; + } __packed; + u32 as_uint32; +}; + typedef u16 hv_pci_rid; /* HV_PCI_RID */ typedef u16 hv_pci_segment; /* HV_PCI_SEGMENT */ typedef u64 hv_logical_device_id; @@ -547,4 +573,90 @@ union hv_device_id { /* HV_DEVICE_ID */ } acpi; } __packed; =20 +struct hv_input_attach_device { /* HV_INPUT_ATTACH_DEVICE */ + u64 partition_id; + union hv_device_id device_id; + union hv_attdev_flags attdev_flags; + u8 attdev_vtl; + u8 rsvd0; + u16 rsvd1; + u64 logical_devid; + union hv_dev_pci_caps dev_pcicaps; + u16 pf_pci_rid; + u16 resvd2; +} __packed; + +struct hv_input_detach_device { /* HV_INPUT_DETACH_DEVICE */ + u64 partition_id; + u64 logical_devid; +} __packed; + + +/* 3 domain types: stage 1, stage 2, and SOC */ +#define HV_DEVICE_DOMAIN_TYPE_S2 0 /* HV_DEVICE_DOMAIN_ID_TYPE_S2 */ +#define HV_DEVICE_DOMAIN_TYPE_S1 1 /* HV_DEVICE_DOMAIN_ID_TYPE_S1 */ +#define HV_DEVICE_DOMAIN_TYPE_SOC 2 /* HV_DEVICE_DOMAIN_ID_TYPE_SOC */ + +/* ID for stage 2 default domain and NULL domain */ +#define HV_DEVICE_DOMAIN_ID_S2_DEFAULT 0 +#define HV_DEVICE_DOMAIN_ID_S2_NULL 0xFFFFFFFFULL + +union hv_device_domain_id { + u64 as_uint64; + struct { + u32 type : 4; + u32 reserved : 28; + u32 id; + }; +} __packed; + +struct hv_input_device_domain { /* 
HV_INPUT_DEVICE_DOMAIN */ + u64 partition_id; + union hv_input_vtl owner_vtl; + u8 padding[7]; + union hv_device_domain_id domain_id; +} __packed; + +union hv_create_device_domain_flags { /* HV_CREATE_DEVICE_DOMAIN_FLAGS */ + u32 as_uint32; + struct { + u32 forward_progress_required : 1; + u32 inherit_owning_vtl : 1; + u32 reserved : 30; + } __packed; +} __packed; + +struct hv_input_create_device_domain { /* HV_INPUT_CREATE_DEVICE_DOMAIN */ + struct hv_input_device_domain device_domain; + union hv_create_device_domain_flags create_device_domain_flags; +} __packed; + +struct hv_input_delete_device_domain { /* HV_INPUT_DELETE_DEVICE_DOMAIN */ + struct hv_input_device_domain device_domain; +} __packed; + +struct hv_input_attach_device_domain { /* HV_INPUT_ATTACH_DEVICE_DOMAIN */ + struct hv_input_device_domain device_domain; + union hv_device_id device_id; +} __packed; + +struct hv_input_detach_device_domain { /* HV_INPUT_DETACH_DEVICE_DOMAIN */ + u64 partition_id; + union hv_device_id device_id; +} __packed; + +struct hv_input_map_device_gpa_pages { /* HV_INPUT_MAP_DEVICE_GPA_PAGES */ + struct hv_input_device_domain device_domain; + union hv_input_vtl target_vtl; + u8 padding[3]; + u32 map_flags; + u64 target_device_va_base; + u64 gpa_page_list[]; +} __packed; + +struct hv_input_unmap_device_gpa_pages { /* HV_INPUT_UNMAP_DEVICE_GPA_PAG= ES */ + struct hv_input_device_domain device_domain; + u64 target_device_va_base; +} __packed; + #endif /* _HV_HVHDK_MINI_H */ --=20 2.51.2.vfs.0.1 From nobody Wed Jun 17 02:49:30 2026 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by smtp.subspace.kernel.org (Postfix) with ESMTP id EFBC7383C78; Wed, 22 Apr 2026 02:33:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=13.77.154.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825253; cv=none; 
b=oIlZFYsFIShaO5roZr5r6mhp16S+/hWb/fCRYXmZgYk8iuxRrtPFfQ5ZxvyExBmCBJoSo24YNcy4d3i3j4rWGnQAoYxrhstSoZVTZUS2TrZsMA/+qt6j9GVnkffUolVA5zI2JXkgaqtqPYDJSibSxlesuRtLg+UfxbvXwT2pA6c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825253; c=relaxed/simple; bh=/Lw+KcSgHoHzffRnhyMXmnV0XRzyqvf5vEVQCn2n6FQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WVC760ZKSE0i22YCsCDqrjzT8ZJpmQyPe0XFxWmVlsstHs9mLq2iZBvSIM3I2xkB0wi4ECROGJ5MvoLTtgWxXLpm9Nt/B9T7Kv9dCxGjyJi9niDEIyzj3WJonxIU8ohTft/D3fkMrtNi4G6MfEV3CN8wwWT94YlhAOV7E+KbEYg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com; spf=pass smtp.mailfrom=linux.microsoft.com; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b=Ic1yo5q3; arc=none smtp.client-ip=13.77.154.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b="Ic1yo5q3" Received: from mrdev.corp.microsoft.com (192-184-212-33.fiber.dynamic.sonic.net [192.184.212.33]) by linux.microsoft.com (Postfix) with ESMTPSA id 4179320B6F22; Tue, 21 Apr 2026 19:33:50 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com 4179320B6F22 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1776825231; bh=gW30GO/QQW7nzaJ7Yfwf+7LrHlL5wIbU4fBiOZJvGkU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Ic1yo5q3LYqOj6CnC5HptX8+MMhLYozrXi4UCdgjcq6buP3vSBaLinIDDMLVshfP6 CvEOskA5omxJw3fPoBPqwFFgnaHHNWpdi/V0GU4w8DMm9dy88Dzw+z1L5QFx4h7vfS AKa7sd938DsJ/ipTBUDb9aIMj1Sr1zH2tafiVPNk= From: Mukesh R To: hpa@zytor.com, robin.murphy@arm.com, robh@kernel.org, wei.liu@kernel.org, 
mrathor@linux.microsoft.com, mhklinux@outlook.com, muislam@microsoft.com, namjain@linux.microsoft.com, magnuskulke@linux.microsoft.com, anbelski@linux.microsoft.com, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, iommu@lists.linux.dev, linux-pci@vger.kernel.org, linux-arch@vger.kernel.org Cc: kys@microsoft.com, haiyangz@microsoft.com, decui@microsoft.com, longli@microsoft.com, tglx@kernel.org, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, joro@8bytes.org, will@kernel.org, lpieralisi@kernel.org, kwilczynski@kernel.org, bhelgaas@google.com, arnd@arndb.de Subject: [PATCH V1 10/13] PCI: hv: Build device id for a VMBus device, export PCI devid function Date: Tue, 21 Apr 2026 19:32:36 -0700 Message-ID: <20260422023239.1171963-11-mrathor@linux.microsoft.com> X-Mailer: git-send-email 2.51.2.vfs.0.1 In-Reply-To: <20260422023239.1171963-1-mrathor@linux.microsoft.com> References: <20260422023239.1171963-1-mrathor@linux.microsoft.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" On Hyper-V, most hypercalls related to PCI passthru (mapping and unmapping regions, interrupts, etc.) need a device id as a parameter. This device id refers to that specific device for the lifetime of the passthru. An L1VH VM contains only VMBus based devices. The device id for a VMBus device is slightly different in that it is built from the hv_pcibus_device info, so that it matches exactly what the hypervisor expects. This VMBus based device id is needed when attaching devices in an L1VH based guest VM. Before building it, a check is done to make sure the device is a valid VMBus device. In the remaining cases the PCI device id is used, so also make the PCI device id build function public.
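For readers who want to see the VMBus device id layout in isolation, here is a minimal user-space sketch of the composition done by hv_pci_vmbus_device_id() in this patch: bytes 4 to 7 of the VMBus device instance GUID plus the PCI function number. The names sketch_vmbus_device_id and SKETCH_PCI_FUNC are hypothetical stand-ins, and the GUID is passed as a plain byte array rather than taken from the hv_pcibus_device.

```c
#include <stdint.h>

/* Function number is the low three bits of devfn, as PCI_FUNC() computes. */
#define SKETCH_PCI_FUNC(devfn) ((devfn) & 0x07)

/*
 * Hypothetical stand-in for the id composition in the patch:
 *   bits 24..31  GUID byte 5
 *   bits 16..23  GUID byte 4
 *   bits  8..15  GUID byte 7
 *   bits  3..7   GUID byte 6, low three bits masked off
 *   bits  0..2   PCI function number
 * Each byte is widened to 64 bits before shifting so that a high bit in
 * byte 5 cannot sign-extend into the upper half of the result.
 */
static inline uint64_t sketch_vmbus_device_id(const uint8_t guid[16],
					      uint8_t devfn)
{
	return ((uint64_t)guid[5] << 24) |
	       ((uint64_t)guid[4] << 16) |
	       ((uint64_t)guid[7] << 8)  |
	       ((uint64_t)guid[6] & 0xf8) |
	       (uint64_t)SKETCH_PCI_FUNC(devfn);
}
```

With GUID bytes 4 to 7 equal to 12 34 ab cd and devfn 0x0a, the sketch yields 0x3412cdaa. The patch performs the shifts in int, so it may be worth confirming that a byte 5 value of 0x80 or above cannot sign-extend into the upper 32 bits of the returned u64.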
Signed-off-by: Mukesh R --- arch/x86/hyperv/irqdomain.c | 9 +++++---- arch/x86/include/asm/mshyperv.h | 4 ++++ drivers/pci/controller/pci-hyperv.c | 25 +++++++++++++++++++++++++ include/asm-generic/mshyperv.h | 7 +++++++ 4 files changed, 41 insertions(+), 4 deletions(-) diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c index 229f986e08ea..527835b99a70 100644 --- a/arch/x86/hyperv/irqdomain.c +++ b/arch/x86/hyperv/irqdomain.c @@ -137,7 +137,7 @@ static int get_rid_cb(struct pci_dev *pdev, u16 alias, = void *data) return 0; } =20 -static union hv_device_id hv_build_devid_type_pci(struct pci_dev *pdev) +u64 hv_build_devid_type_pci(struct pci_dev *pdev) { int pos; union hv_device_id hv_devid; @@ -197,8 +197,9 @@ static union hv_device_id hv_build_devid_type_pci(struc= t pci_dev *pdev) } =20 out: - return hv_devid; + return hv_devid.as_uint64; } +EXPORT_SYMBOL_GPL(hv_build_devid_type_pci); =20 /* * hv_map_msi_interrupt() - Map the MSI IRQ in the hypervisor. @@ -221,7 +222,7 @@ int hv_map_msi_interrupt(struct irq_data *data, =20 msidesc =3D irq_data_get_msi_desc(data); pdev =3D msi_desc_to_pci_dev(msidesc); - hv_devid =3D hv_build_devid_type_pci(pdev); + hv_devid.as_uint64 =3D hv_build_devid_type_pci(pdev); cpu =3D cpumask_first(irq_data_get_effective_affinity_mask(data)); =20 return hv_map_interrupt(hv_current_partition_id, hv_devid, false, cpu, @@ -296,7 +297,7 @@ static int hv_unmap_msi_interrupt(struct pci_dev *pdev, { union hv_device_id hv_devid; =20 - hv_devid =3D hv_build_devid_type_pci(pdev); + hv_devid.as_uint64 =3D hv_build_devid_type_pci(pdev); return hv_unmap_interrupt(hv_devid.as_uint64, irq_entry); } =20 diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyper= v.h index f64393e853ee..039e8a986be3 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -271,6 +271,10 @@ static inline u64 hv_get_non_nested_msr(unsigned int r= eg) { return 0; } static inline int hv_apicid_to_vp_index(u32 
apic_id) { return -EINVAL; } #endif /* CONFIG_HYPERV */ =20 +#if IS_ENABLED(CONFIG_HYPERV_IOMMU) +u64 hv_build_devid_type_pci(struct pci_dev *pdev); +#endif /* IS_ENABLED(CONFIG_HYPERV_IOMMU) */ + struct mshv_vtl_cpu_context { union { struct { diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/p= ci-hyperv.c index ed6b399afc80..8f6b818ee09b 100644 --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -578,6 +578,8 @@ static void hv_pci_onchannelcallback(void *context); #define DELIVERY_MODE APIC_DELIVERY_MODE_FIXED #define HV_MSI_CHIP_FLAGS MSI_CHIP_FLAG_SET_ACK =20 +static bool hv_vmbus_pci_device(struct pci_bus *pbus); + static int hv_pci_irqchip_init(void) { return 0; @@ -1005,6 +1007,24 @@ static struct irq_domain *hv_pci_get_root_domain(voi= d) static void hv_arch_irq_unmask(struct irq_data *data) { } #endif /* CONFIG_ARM64 */ =20 +u64 hv_pci_vmbus_device_id(struct pci_dev *pdev) +{ + struct hv_pcibus_device *hbus; + struct pci_bus *pbus =3D pdev->bus; + + if (!hv_vmbus_pci_device(pbus)) + return 0; + + hbus =3D container_of(pbus->sysdata, struct hv_pcibus_device, sysdata); + + return (hbus->hdev->dev_instance.b[5] << 24) | + (hbus->hdev->dev_instance.b[4] << 16) | + (hbus->hdev->dev_instance.b[7] << 8) | + (hbus->hdev->dev_instance.b[6] & 0xf8) | + PCI_FUNC(pdev->devfn); +} +EXPORT_SYMBOL_GPL(hv_pci_vmbus_device_id); + /** * hv_pci_generic_compl() - Invoked for a completion packet * @context: Set up by the sender of the packet. 
@@ -1403,6 +1423,11 @@ static struct pci_ops hv_pcifront_ops =3D { .write =3D hv_pcifront_write_config, }; =20 +static bool hv_vmbus_pci_device(struct pci_bus *pbus) +{ + return pbus->ops =3D=3D &hv_pcifront_ops; +} + /* * Paravirtual backchannel * diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h index e8cbc4e3f7ad..fe5ddd1c43ff 100644 --- a/include/asm-generic/mshyperv.h +++ b/include/asm-generic/mshyperv.h @@ -329,6 +329,13 @@ static inline enum hv_isolation_type hv_get_isolation_= type(void) } #endif /* CONFIG_HYPERV */ =20 +#if IS_ENABLED(CONFIG_PCI_HYPERV) +u64 hv_pci_vmbus_device_id(struct pci_dev *pdev); +#else /* IS_ENABLED(CONFIG_PCI_HYPERV) */ +static inline u64 hv_pci_vmbus_device_id(struct pci_dev *pdev) +{ return 0; } +#endif /* IS_ENABLED(CONFIG_PCI_HYPERV) */ + #if IS_ENABLED(CONFIG_MSHV_ROOT) static inline bool hv_root_partition(void) { --=20 2.51.2.vfs.0.1 From nobody Wed Jun 17 02:49:30 2026 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 7639338AC7B; Wed, 22 Apr 2026 02:33:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=13.77.154.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825255; cv=none; b=ezjP8soaspAyuTtMgHKGCM9zet589qdOl9VDSYj+nVfRYojYR/eKl+xTkTTEuMiBiIRkvkMKalvXyDembig8QMG3UfzMl44NGmdMnXAouWHpEVegLbkScLAY2Ai4Xz6s+dN9ZZHsdUQ4bH8Z/tPVGsA4fK08hLOxYPVej/bW0gw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825255; c=relaxed/simple; bh=h+Ny31frSUM48Jpi/cd7zYpVI/BIBt1tEJTL0lAuKbY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=sPykYZI4fQJ4uHfFZIof8tnY/1k6TG9Dyg4lq7awN85C8yLNL/TgvwNdJl3ds/aD3Bh4rKnHTbVYHJY9sDi0Q5TWlreiAdc1QF0Ma4byRdle4gGTi0c5pYKa7bMXhWxrZnqPkLMb7D8f/Fo+QTON57kp1lsTER2kKgh4PolpYe4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) 
header.from=linux.microsoft.com; spf=pass smtp.mailfrom=linux.microsoft.com; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b=TcXjEJ/P; arc=none smtp.client-ip=13.77.154.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b="TcXjEJ/P" Received: from mrdev.corp.microsoft.com (192-184-212-33.fiber.dynamic.sonic.net [192.184.212.33]) by linux.microsoft.com (Postfix) with ESMTPSA id CBADC20B6F24; Tue, 21 Apr 2026 19:33:51 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com CBADC20B6F24 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1776825232; bh=IyVWoBAu8eoWBZGeXBFmCaTyCy4N0jOKyOef3d/z/ho=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=TcXjEJ/P+AcV6722ar/CoYdYshkqWa8TsFFOqI0KlYMvaw0B1XIJ+5k7tc/2g7sDG yOdVaHW4JE2EKlasdErus6zgt52jFLeAbaf6TlrWafu12jUEyzv2p78qujjsUh/fvp ii0XSueUcDF2zdxFeghnRJmOwSTRR3OzC+PpdNN0= From: Mukesh R To: hpa@zytor.com, robin.murphy@arm.com, robh@kernel.org, wei.liu@kernel.org, mrathor@linux.microsoft.com, mhklinux@outlook.com, muislam@microsoft.com, namjain@linux.microsoft.com, magnuskulke@linux.microsoft.com, anbelski@linux.microsoft.com, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, iommu@lists.linux.dev, linux-pci@vger.kernel.org, linux-arch@vger.kernel.org Cc: kys@microsoft.com, haiyangz@microsoft.com, decui@microsoft.com, longli@microsoft.com, tglx@kernel.org, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, joro@8bytes.org, will@kernel.org, lpieralisi@kernel.org, kwilczynski@kernel.org, bhelgaas@google.com, arnd@arndb.de Subject: [PATCH V1 11/13] x86/hyperv: Implement hyperv virtual iommu Date: 
Tue, 21 Apr 2026 19:32:37 -0700 Message-ID: <20260422023239.1171963-12-mrathor@linux.microsoft.com> X-Mailer: git-send-email 2.51.2.vfs.0.1 In-Reply-To: <20260422023239.1171963-1-mrathor@linux.microsoft.com> References: <20260422023239.1171963-1-mrathor@linux.microsoft.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add a new file that implements management of device domains, mapping and unmapping of iommu memory, and the other iommu_ops needed to fit within the VFIO framework for PCI passthru on Hyper-V running Linux as root or L1VH parent. It also implements the direct attach mechanism for PCI passthru, which likewise works within the VFIO framework. At a high level, during boot the hypervisor creates a default identity domain and attaches all devices to it. This maps nicely to the Linux iommu subsystem's IOMMU_DOMAIN_IDENTITY domain. As a result, Linux does not need to explicitly ask Hyper-V to attach devices or do maps/unmaps during boot. As mentioned previously, Hyper-V supports two ways to do PCI passthru: 1. Device Domain: root must create a device domain in the hypervisor, and issue map/unmap hypercalls for mapping and unmapping guest RAM. All hypervisor communications use a device id of type PCI for identifying and referencing the device. 2. Direct Attach: the hypervisor simply uses the guest's HW page table for mappings, so the host need not issue map/unmap hypercalls for device memory. As such, direct attach passthru setup during guest boot is extremely fast. A direct attached device must be referenced via its logical device id and not via the PCI device id. At present, L1VH root/parent only supports direct attaches.
Also direct attach is default in non-L1VH cases because there are some significant performance issues with device domain implementation currently for guests with higher RAM (say more than 8GB), and that unfortunately cannot be addressed in the short term. Co-developed-by: Wei Liu Signed-off-by: Wei Liu Signed-off-by: Mukesh R --- MAINTAINERS | 1 + arch/x86/kernel/pci-dma.c | 2 + drivers/iommu/Kconfig | 5 +- drivers/iommu/Makefile | 1 + drivers/iommu/hyperv-iommu-root.c | 899 ++++++++++++++++++++++++++++++ include/asm-generic/mshyperv.h | 24 +- include/linux/hyperv.h | 6 + 7 files changed, 934 insertions(+), 4 deletions(-) create mode 100644 drivers/iommu/hyperv-iommu-root.c diff --git a/MAINTAINERS b/MAINTAINERS index f803a6a38fee..8ae040b89a56 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -11914,6 +11914,7 @@ F: drivers/clocksource/hyperv_timer.c F: drivers/hid/hid-hyperv.c F: drivers/hv/ F: drivers/input/serio/hyperv-keyboard.c +F: drivers/iommu/hyperv-iommu-root.c F: drivers/iommu/hyperv-irq.c F: drivers/net/ethernet/microsoft/ F: drivers/net/hyperv/ diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c index 6267363e0189..cfeee6505e17 100644 --- a/arch/x86/kernel/pci-dma.c +++ b/arch/x86/kernel/pci-dma.c @@ -8,6 +8,7 @@ #include #include #include +#include =20 #include #include @@ -105,6 +106,7 @@ void __init pci_iommu_alloc(void) gart_iommu_hole_init(); amd_iommu_detect(); detect_intel_iommu(); + hv_iommu_detect(); swiotlb_init(x86_swiotlb_enable, x86_swiotlb_flags); } =20 diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index f86262b11416..7909cf4373a6 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -352,13 +352,12 @@ config MTK_IOMMU_V1 if unsure, say N here. =20 config HYPERV_IOMMU - bool "Hyper-V IRQ Handling" + bool "Hyper-V IOMMU Unit" depends on HYPERV && X86 select IOMMU_API default HYPERV help - Stub IOMMU driver to handle IRQs to support Hyper-V Linux - guest and root partitions. + Hyper-V pseudo IOMMU unit. 
=20 config VIRTIO_IOMMU tristate "Virtio IOMMU driver" diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 335ea77cced6..296fbc6ca829 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -31,6 +31,7 @@ obj-$(CONFIG_EXYNOS_IOMMU) +=3D exynos-iommu.o obj-$(CONFIG_FSL_PAMU) +=3D fsl_pamu.o fsl_pamu_domain.o obj-$(CONFIG_S390_IOMMU) +=3D s390-iommu.o obj-$(CONFIG_HYPERV) +=3D hyperv-irq.o +obj-$(CONFIG_HYPERV_IOMMU) +=3D hyperv-iommu-root.o obj-$(CONFIG_VIRTIO_IOMMU) +=3D virtio-iommu.o obj-$(CONFIG_IOMMU_SVA) +=3D iommu-sva.o obj-$(CONFIG_IOMMU_IOPF) +=3D io-pgfault.o diff --git a/drivers/iommu/hyperv-iommu-root.c b/drivers/iommu/hyperv-iommu= -root.c new file mode 100644 index 000000000000..492de5a1cf23 --- /dev/null +++ b/drivers/iommu/hyperv-iommu-root.c @@ -0,0 +1,899 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Hyper-V root vIOMMU driver. + * Copyright (C) 2026, Microsoft, Inc. + */ + +#include +#include +#include +#include +#include "dma-iommu.h" +#include +#include + +/* We will not claim these PCI devices, eg hypervisor needs it for debugge= r */ +static char *pci_devs_to_skip; +static int __init hv_iommu_setup_skip(char *str) +{ + pci_devs_to_skip =3D str; + + return 0; +} +/* hv_iommu_skip=3D(SSSS:BB:DD.F)(SSSS:BB:DD.F) */ +__setup("hv_iommu_skip=3D", hv_iommu_setup_skip); + +bool hv_no_attdev; /* disable direct device attach for passthru */ +EXPORT_SYMBOL_GPL(hv_no_attdev); +static int __init setup_hv_no_attdev(char *str) +{ + hv_no_attdev =3D true; + return 0; +} +__setup("hv_no_attdev", setup_hv_no_attdev); + +/* Iommu device that we export to the world. HyperV supports max of one */ +static struct iommu_device hv_virt_iommu; + +struct hv_domain { + struct iommu_domain iommu_dom; + u32 domid_num; /* as opposed to domain_id.type */ + bool attached_dom; /* is this direct attached dom? 
*/ + u64 partid; /* partition id */ + spinlock_t mappings_lock; /* protects mappings_tree */ + struct rb_root_cached mappings_tree; /* iova to pa lookup tree */ +}; + +#define to_hv_domain(d) container_of(d, struct hv_domain, iommu_dom) + +struct hv_iommu_mapping { + phys_addr_t paddr; + struct interval_tree_node iova; + u32 flags; +}; + +/* + * By default, during boot the hypervisor creates one Stage 2 (S2) default + * domain. Stage 2 means that the page table is controlled by the hypervis= or. + * S2 default: access to entire root partition memory. This for us easily + * maps to IOMMU_DOMAIN_IDENTITY in the iommu subsystem, and + * is called HV_DEVICE_DOMAIN_ID_S2_DEFAULT in the hypervisor. + * + * Device Management: + * There are two ways to manage device attaches to domains: + * 1. Domain Attach: A device domain is created in the hypervisor, the + * device is attached to this domain, and then memory + * ranges are mapped in the map callbacks. + * 2. Direct Attach: No need to create a domain in the hypervisor for = direct + * attached devices. A hypercall is made to tell the + * hypervisor to attach the device to a guest. There is + * no need for explicit memory mappings because the + * hypervisor will just use the guest HW page table. + * + * Since a direct attach is much faster, it is the default. This can be + * changed via hv_no_attdev. + * + * L1VH: hypervisor only supports direct attach. + */ + +/* + * Create dummy domain to correspond to hypervisor prebuilt default identi= ty + * domain (dummy because we do not make hypercall to create them). 
+ */ +static struct hv_domain hv_def_identity_dom; + +static bool hv_special_domain(struct hv_domain *hvdom) +{ + return hvdom =3D=3D &hv_def_identity_dom; +} + +struct iommu_domain_geometry default_geometry =3D (struct iommu_domain_geo= metry) { + .aperture_start =3D 0, + .aperture_end =3D -1UL, + .force_aperture =3D true, +}; + +/* + * Since the relevant hypercalls can only fit less than 512 PFNs in the pfn + * array, report 1M max. + */ +#define HV_IOMMU_PGSIZES (SZ_4K | SZ_1M) + +static u32 unique_id; /* unique numeric id of a new domain */ + +static void hv_iommu_detach_dev(struct iommu_domain *immdom, + struct device *dev); +static size_t hv_iommu_unmap_pages(struct iommu_domain *immdom, ulong iova, + size_t pgsize, size_t pgcount, + struct iommu_iotlb_gather *gather); + +/* + * If the current thread is a VMM thread, return the partition id of the V= M it + * is managing, else return HV_PARTITION_ID_INVALID. + */ +u64 hv_get_current_partid(void) +{ + u64 (*fn)(void); + u64 ptid; + + fn =3D symbol_get(mshv_current_partid); + if (!fn) + return HV_PARTITION_ID_INVALID; + + ptid =3D fn(); + symbol_put(mshv_current_partid); + + return ptid; +} +EXPORT_SYMBOL_GPL(hv_get_current_partid); + +/* If this is a VMM thread, then this domain is for a guest vm */ +static bool hv_curr_thread_is_vmm(void) +{ + return hv_get_current_partid() !=3D HV_PARTITION_ID_INVALID; +} + +/* As opposed to some host app like SPDK etc... */ +static bool hv_dom_owner_is_vmm(struct hv_domain *hvdom) +{ + return hvdom && hvdom->partid !=3D HV_PARTITION_ID_INVALID; +} + +static bool hv_iommu_capable(struct device *dev, enum iommu_cap cap) +{ + switch (cap) { + case IOMMU_CAP_CACHE_COHERENCY: + return true; + default: + return false; + } +} + +/* + * Check if given pci device is a direct attached device. Caller must have + * verified pdev is a valid pci device. 
+ */ +bool hv_pcidev_is_attached_dev(struct pci_dev *pdev) +{ + struct iommu_domain *iommu_domain; + struct hv_domain *hvdom; + struct device *dev =3D &pdev->dev; + + iommu_domain =3D iommu_get_domain_for_dev(dev); + if (iommu_domain) { + hvdom =3D to_hv_domain(iommu_domain); + return hvdom->attached_dom; + } + + return false; +} +EXPORT_SYMBOL_GPL(hv_pcidev_is_attached_dev); + +bool hv_pcidev_is_pthru_dev(struct pci_dev *pdev) +{ + struct device *dev =3D &pdev->dev; + struct hv_domain *hvdom =3D dev_iommu_priv_get(dev); + + if (hvdom && !hv_special_domain(hvdom)) + return true; + + return false; +} +EXPORT_SYMBOL_GPL(hv_pcidev_is_pthru_dev); + +/* Build device id for direct attached devices */ +static u64 hv_build_devid_type_logical(struct pci_dev *pdev) +{ + hv_pci_segment segment; + union hv_device_id hv_devid; + union hv_pci_bdf bdf =3D {.as_uint16 =3D 0}; + u32 rid =3D PCI_DEVID(pdev->bus->number, pdev->devfn); + + segment =3D pci_domain_nr(pdev->bus); + bdf.bus =3D PCI_BUS_NUM(rid); + bdf.device =3D PCI_SLOT(rid); + bdf.function =3D PCI_FUNC(rid); + + hv_devid.as_uint64 =3D 0; + hv_devid.device_type =3D HV_DEVICE_TYPE_LOGICAL; + hv_devid.logical.id =3D (u64)segment << 16 | bdf.as_uint16; + + return hv_devid.as_uint64; +} + +u64 hv_build_devid_oftype(struct pci_dev *pdev, enum hv_device_type type) +{ + if (type =3D=3D HV_DEVICE_TYPE_LOGICAL) { + if (hv_l1vh_partition()) + return hv_pci_vmbus_device_id(pdev); + else + return hv_build_devid_type_logical(pdev); + } else if (type =3D=3D HV_DEVICE_TYPE_PCI) +#ifdef CONFIG_X86 + return hv_build_devid_type_pci(pdev); +#else + return 0; +#endif + return 0; +} +EXPORT_SYMBOL_GPL(hv_build_devid_oftype); + +/* Create a new device domain in the hypervisor */ +static int hv_iommu_create_hyp_devdom(struct hv_domain *hvdom) +{ + u64 status; + struct hv_input_device_domain *ddp; + struct hv_input_create_device_domain *input; + unsigned long flags; + + local_irq_save(flags); + input =3D *this_cpu_ptr(hyperv_pcpu_input_arg); + 
memset(input, 0, sizeof(*input)); + + ddp =3D &input->device_domain; + ddp->partition_id =3D HV_PARTITION_ID_SELF; + ddp->domain_id.type =3D HV_DEVICE_DOMAIN_TYPE_S2; + ddp->domain_id.id =3D hvdom->domid_num; + + input->create_device_domain_flags.forward_progress_required =3D 1; + input->create_device_domain_flags.inherit_owning_vtl =3D 0; + + status =3D hv_do_hypercall(HVCALL_CREATE_DEVICE_DOMAIN, input, NULL); + + local_irq_restore(flags); + + if (!hv_result_success(status)) + hv_status_err(status, "\n"); + + return hv_result_to_errno(status); +} + +/* During boot, all devices are attached to this */ +static struct iommu_domain *hv_iommu_domain_alloc_identity(struct device *= dev) +{ + return &hv_def_identity_dom.iommu_dom; +} + +static struct iommu_domain *hv_iommu_domain_alloc_paging(struct device *de= v) +{ + struct hv_domain *hvdom; + int rc; + + if (hv_l1vh_partition() && !hv_curr_thread_is_vmm()) { + pr_err("Hyper-V: l1vh iommu does not support host devices\n"); + return NULL; + } + + hvdom =3D kzalloc(sizeof(struct hv_domain), GFP_KERNEL); + if (hvdom =3D=3D NULL) + return NULL; + + spin_lock_init(&hvdom->mappings_lock); + hvdom->mappings_tree =3D RB_ROOT_CACHED; + + /* Called under iommu group mutex, so single threaded */ + if (++unique_id =3D=3D HV_DEVICE_DOMAIN_ID_S2_DEFAULT) /* ie, 0 */ + goto out_err; + + hvdom->domid_num =3D unique_id; + hvdom->partid =3D hv_get_current_partid(); + hvdom->iommu_dom.geometry =3D default_geometry; + hvdom->iommu_dom.pgsize_bitmap =3D HV_IOMMU_PGSIZES; + + /* For guests, by default we do direct attaches, so no domain in hyp */ + if (hv_dom_owner_is_vmm(hvdom) && !hv_no_attdev) + hvdom->attached_dom =3D true; + else { + rc =3D hv_iommu_create_hyp_devdom(hvdom); + if (rc) + goto out_err; + } + + return &hvdom->iommu_dom; + +out_err: + unique_id--; + kfree(hvdom); + return NULL; +} + +static void hv_iommu_domain_free(struct iommu_domain *immdom) +{ + struct hv_domain *hvdom =3D to_hv_domain(immdom); + unsigned long flags; 
+ u64 status; + struct hv_input_delete_device_domain *input; + + if (hv_special_domain(hvdom)) + return; + + if (!hv_dom_owner_is_vmm(hvdom) || hv_no_attdev) { + struct hv_input_device_domain *ddp; + + local_irq_save(flags); + input =3D *this_cpu_ptr(hyperv_pcpu_input_arg); + ddp =3D &input->device_domain; + memset(input, 0, sizeof(*input)); + + ddp->partition_id =3D HV_PARTITION_ID_SELF; + ddp->domain_id.type =3D HV_DEVICE_DOMAIN_TYPE_S2; + ddp->domain_id.id =3D hvdom->domid_num; + + status =3D hv_do_hypercall(HVCALL_DELETE_DEVICE_DOMAIN, input, + NULL); + local_irq_restore(flags); + + if (!hv_result_success(status)) + hv_status_err(status, "\n"); + } + + kfree(hvdom); +} + +/* Attach a device to a domain previously created in the hypervisor */ +static int hv_iommu_att_dev2dom(struct hv_domain *hvdom, struct pci_dev *p= dev) +{ + unsigned long flags; + u64 status; + enum hv_device_type dev_type; + struct hv_input_attach_device_domain *input; + + local_irq_save(flags); + input =3D *this_cpu_ptr(hyperv_pcpu_input_arg); + memset(input, 0, sizeof(*input)); + + input->device_domain.partition_id =3D HV_PARTITION_ID_SELF; + input->device_domain.domain_id.type =3D HV_DEVICE_DOMAIN_TYPE_S2; + input->device_domain.domain_id.id =3D hvdom->domid_num; + + /* NB: Upon guest shutdown, device is re-attached to the default domain + * without explicit detach. 
+ */ + if (hv_l1vh_partition()) + dev_type =3D HV_DEVICE_TYPE_LOGICAL; + else + dev_type =3D HV_DEVICE_TYPE_PCI; + + input->device_id.as_uint64 =3D hv_build_devid_oftype(pdev, dev_type); + + status =3D hv_do_hypercall(HVCALL_ATTACH_DEVICE_DOMAIN, input, NULL); + local_irq_restore(flags); + + if (!hv_result_success(status)) + hv_status_err(status, "\n"); + + return hv_result_to_errno(status); +} + +/* Caller must have validated that dev is a valid pci dev */ +static int hv_iommu_direct_attach_device(struct pci_dev *pdev, u64 ptid) +{ + struct hv_input_attach_device *input; + u64 status; + int rc; + unsigned long flags; + union hv_device_id host_devid; + enum hv_device_type dev_type; + + if (ptid =3D=3D HV_PARTITION_ID_INVALID) { + pr_err("Hyper-V: Invalid partition id in direct attach\n"); + return -EINVAL; + } + + if (hv_l1vh_partition()) + dev_type =3D HV_DEVICE_TYPE_LOGICAL; + else + dev_type =3D HV_DEVICE_TYPE_PCI; + + host_devid.as_uint64 =3D hv_build_devid_oftype(pdev, dev_type); + + do { + local_irq_save(flags); + input =3D *this_cpu_ptr(hyperv_pcpu_input_arg); + memset(input, 0, sizeof(*input)); + input->partition_id =3D ptid; + input->device_id =3D host_devid; + + /* Hypervisor associates logical_id with this device, and in + * some hypercalls like retarget interrupts, logical_id must be + * used instead of the BDF. It is a required parameter. 
+ */ + input->attdev_flags.logical_id =3D 1; + input->logical_devid =3D + hv_build_devid_oftype(pdev, HV_DEVICE_TYPE_LOGICAL); + + status =3D hv_do_hypercall(HVCALL_ATTACH_DEVICE, input, NULL); + local_irq_restore(flags); + + if (hv_result(status) =3D=3D HV_STATUS_INSUFFICIENT_MEMORY) { + rc =3D hv_call_deposit_pages(NUMA_NO_NODE, ptid, 1); + if (rc) + break; + } + } while (hv_result(status) =3D=3D HV_STATUS_INSUFFICIENT_MEMORY); + + if (!hv_result_success(status)) + hv_status_err(status, "\n"); + + return hv_result_to_errno(status); +} + +/* Attach a device for passthru to guest VMs, host apps like SPDK, etc */ +static int hv_iommu_attach_dev(struct iommu_domain *immdom, struct device = *dev, + struct iommu_domain *old) +{ + struct pci_dev *pdev; + int rc; + struct hv_domain *hvdom_new =3D to_hv_domain(immdom); + struct hv_domain *hvdom_prev =3D dev_iommu_priv_get(dev); + + /* Only allow PCI devices for now */ + if (!dev_is_pci(dev)) + return -EINVAL; + + pdev =3D to_pci_dev(dev); + + if (hv_l1vh_partition() && !hv_special_domain(hvdom_new) && + !hvdom_new->attached_dom) + return -EINVAL; + + /* VFIO does not do explicit detach calls, hence check first if we need + * to detach first. Also, in case of guest shutdown, it's the VMM + * thread that attaches it back to the hv_def_identity_dom, and + * hvdom_prev will not be null then. It is null during boot. 
+ */ + if (hvdom_prev) + if (!hv_l1vh_partition() || !hv_special_domain(hvdom_prev)) + hv_iommu_detach_dev(&hvdom_prev->iommu_dom, dev); + + if (hv_l1vh_partition() && hv_special_domain(hvdom_new)) { + dev_iommu_priv_set(dev, hvdom_new); /* sets "private" field */ + return 0; + } + + if (hvdom_new->attached_dom) + rc =3D hv_iommu_direct_attach_device(pdev, hvdom_new->partid); + else + rc =3D hv_iommu_att_dev2dom(hvdom_new, pdev); + + if (rc =3D=3D 0) + dev_iommu_priv_set(dev, hvdom_new); /* sets "private" field */ + + return rc; +} + +static void hv_iommu_det_dev_from_guest(struct pci_dev *pdev, u64 ptid) +{ + struct hv_input_detach_device *input; + u64 status, log_devid; + unsigned long flags; + + log_devid =3D hv_build_devid_oftype(pdev, HV_DEVICE_TYPE_LOGICAL); + + local_irq_save(flags); + input =3D *this_cpu_ptr(hyperv_pcpu_input_arg); + memset(input, 0, sizeof(*input)); + + input->partition_id =3D ptid; + input->logical_devid =3D log_devid; + status =3D hv_do_hypercall(HVCALL_DETACH_DEVICE, input, NULL); + local_irq_restore(flags); + + if (!hv_result_success(status)) + hv_status_err(status, "\n"); +} + +static void hv_iommu_det_dev_from_dom(struct pci_dev *pdev) +{ + u64 status, devid; + unsigned long flags; + struct hv_input_detach_device_domain *input; + + devid =3D hv_build_devid_oftype(pdev, HV_DEVICE_TYPE_PCI); + + local_irq_save(flags); + input =3D *this_cpu_ptr(hyperv_pcpu_input_arg); + memset(input, 0, sizeof(*input)); + + input->partition_id =3D HV_PARTITION_ID_SELF; + input->device_id.as_uint64 =3D devid; + status =3D hv_do_hypercall(HVCALL_DETACH_DEVICE_DOMAIN, input, NULL); + local_irq_restore(flags); + + if (!hv_result_success(status)) + hv_status_err(status, "\n"); +} + +static void hv_iommu_detach_dev(struct iommu_domain *immdom, struct device= *dev) +{ + struct pci_dev *pdev; + struct hv_domain *hvdom =3D to_hv_domain(immdom); + + /* See the attach function, only PCI devices for now */ + if (!dev_is_pci(dev)) + return; + + pdev =3D 
to_pci_dev(dev); + + if (hvdom->attached_dom) + hv_iommu_det_dev_from_guest(pdev, hvdom->partid); + + /* Do not reset attached_dom, hv_iommu_unmap_pages happens + * next. + */ + else + hv_iommu_det_dev_from_dom(pdev); +} + +static int hv_iommu_add_tree_mapping(struct hv_domain *hvdom, + unsigned long iova, phys_addr_t paddr, + size_t size, u32 flags) +{ + unsigned long irqflags; + struct hv_iommu_mapping *mapping; + + mapping =3D kzalloc(sizeof(*mapping), GFP_ATOMIC); + if (!mapping) + return -ENOMEM; + + mapping->paddr =3D paddr; + mapping->iova.start =3D iova; + mapping->iova.last =3D iova + size - 1; + mapping->flags =3D flags; + + spin_lock_irqsave(&hvdom->mappings_lock, irqflags); + interval_tree_insert(&mapping->iova, &hvdom->mappings_tree); + spin_unlock_irqrestore(&hvdom->mappings_lock, irqflags); + + return 0; +} + +static size_t hv_iommu_del_tree_mappings(struct hv_domain *hvdom, + unsigned long iova, size_t size) +{ + unsigned long flags; + size_t unmapped =3D 0; + unsigned long last =3D iova + size - 1; + struct hv_iommu_mapping *mapping =3D NULL; + struct interval_tree_node *node, *next; + + spin_lock_irqsave(&hvdom->mappings_lock, flags); + next =3D interval_tree_iter_first(&hvdom->mappings_tree, iova, last); + while (next) { + node =3D next; + mapping =3D container_of(node, struct hv_iommu_mapping, iova); + next =3D interval_tree_iter_next(node, iova, last); + + /* Trying to split a mapping? Not supported for now. 
*/ + if (mapping->iova.start < iova) + break; + + unmapped +=3D mapping->iova.last - mapping->iova.start + 1; + + interval_tree_remove(node, &hvdom->mappings_tree); + kfree(mapping); + } + spin_unlock_irqrestore(&hvdom->mappings_lock, flags); + + return unmapped; +} + +/* Return: must return exact status from the hypercall without changes */ +static u64 hv_iommu_map_pgs(struct hv_domain *hvdom, + unsigned long iova, phys_addr_t paddr, + unsigned long npages, u32 map_flags) +{ + u64 status; + int i; + struct hv_input_map_device_gpa_pages *input; + unsigned long flags, pfn; + + local_irq_save(flags); + input =3D *this_cpu_ptr(hyperv_pcpu_input_arg); + memset(input, 0, sizeof(*input)); + + input->device_domain.partition_id =3D HV_PARTITION_ID_SELF; + input->device_domain.domain_id.type =3D HV_DEVICE_DOMAIN_TYPE_S2; + input->device_domain.domain_id.id =3D hvdom->domid_num; + input->map_flags =3D map_flags; + input->target_device_va_base =3D iova; + + pfn =3D paddr >> HV_HYP_PAGE_SHIFT; + for (i =3D 0; i < npages; i++, pfn++) + input->gpa_page_list[i] =3D pfn; + + status =3D hv_do_rep_hypercall(HVCALL_MAP_DEVICE_GPA_PAGES, npages, 0, + input, NULL); + + local_irq_restore(flags); + return status; +} + +/* + * The core VFIO code loops over memory ranges calling this function with + * the largest size from HV_IOMMU_PGSIZES. cond_resched() is in vfio_iommu= _map. + */ +static int hv_iommu_map_pages(struct iommu_domain *immdom, ulong iova, + phys_addr_t paddr, size_t pgsize, size_t pgcount, + int prot, gfp_t gfp, size_t *mapped) +{ + u32 map_flags; + int ret; + u64 status; + unsigned long npages, done =3D 0; + struct hv_domain *hvdom =3D to_hv_domain(immdom); + size_t size =3D pgsize * pgcount; + + map_flags =3D HV_MAP_GPA_READABLE; /* required */ + map_flags |=3D prot & IOMMU_WRITE ? 
HV_MAP_GPA_WRITABLE : 0; + + ret =3D hv_iommu_add_tree_mapping(hvdom, iova, paddr, size, map_flags); + if (ret) + return ret; + + if (hvdom->attached_dom) { + *mapped =3D size; + return 0; + } + + npages =3D size >> HV_HYP_PAGE_SHIFT; + while (done < npages) { + ulong completed, remain =3D npages - done; + + status =3D hv_iommu_map_pgs(hvdom, iova, paddr, remain, + map_flags); + + completed =3D hv_repcomp(status); + done =3D done + completed; + iova =3D iova + (completed << HV_HYP_PAGE_SHIFT); + paddr =3D paddr + (completed << HV_HYP_PAGE_SHIFT); + + if (hv_result(status) =3D=3D HV_STATUS_INSUFFICIENT_MEMORY) { + ret =3D hv_call_deposit_pages(NUMA_NO_NODE, + hv_current_partition_id, + 256); + if (ret) + break; + continue; + } + if (!hv_result_success(status)) + break; + } + + if (!hv_result_success(status)) { + size_t done_size =3D done << HV_HYP_PAGE_SHIFT; + + hv_status_err(status, "pgs:%lx/%lx iova:%lx\n", + done, npages, iova); + /* + * lookup tree has all mappings [0 - size-1]. Below unmap will + * only remove from [0 - done], we need to remove second chunk + * [done+1 - size-1]. 
+ */ + hv_iommu_del_tree_mappings(hvdom, iova, size - done_size); + hv_iommu_unmap_pages(immdom, iova - done_size, HV_HYP_PAGE_SIZE, + done, NULL); + if (mapped) + *mapped =3D 0; + } else + if (mapped) + *mapped =3D size; + + return hv_result_to_errno(status); +} + +static size_t hv_iommu_unmap_pages(struct iommu_domain *immdom, ulong iova, + size_t pgsize, size_t pgcount, + struct iommu_iotlb_gather *gather) +{ + unsigned long flags, npages; + struct hv_input_unmap_device_gpa_pages *input; + u64 status; + struct hv_domain *hvdom =3D to_hv_domain(immdom); + size_t unmapped, size =3D pgsize * pgcount; + + unmapped =3D hv_iommu_del_tree_mappings(hvdom, iova, size); + if (unmapped < size) + pr_err("%s: could not delete all mappings (%lx:%lx/%lx)\n", + __func__, iova, unmapped, size); + + if (hvdom->attached_dom) + return size; + + npages =3D size >> HV_HYP_PAGE_SHIFT; + + local_irq_save(flags); + input =3D *this_cpu_ptr(hyperv_pcpu_input_arg); + memset(input, 0, sizeof(*input)); + + input->device_domain.partition_id =3D HV_PARTITION_ID_SELF; + input->device_domain.domain_id.type =3D HV_DEVICE_DOMAIN_TYPE_S2; + input->device_domain.domain_id.id =3D hvdom->domid_num; + input->target_device_va_base =3D iova; + + status =3D hv_do_rep_hypercall(HVCALL_UNMAP_DEVICE_GPA_PAGES, npages, + 0, input, NULL); + local_irq_restore(flags); + + if (!hv_result_success(status)) + hv_status_err(status, "\n"); + + return unmapped; +} + +static phys_addr_t hv_iommu_iova_to_phys(struct iommu_domain *immdom, + dma_addr_t iova) +{ + unsigned long flags; + struct hv_iommu_mapping *mapping; + struct interval_tree_node *node; + u64 paddr =3D 0; + struct hv_domain *hvdom =3D to_hv_domain(immdom); + + spin_lock_irqsave(&hvdom->mappings_lock, flags); + node =3D interval_tree_iter_first(&hvdom->mappings_tree, iova, iova); + if (node) { + mapping =3D container_of(node, struct hv_iommu_mapping, iova); + paddr =3D mapping->paddr + (iova - mapping->iova.start); + } + 
spin_unlock_irqrestore(&hvdom->mappings_lock, flags); + + return paddr; +} + +/* + * Currently, hypervisor does not provide list of devices it is using + * dynamically. So use this to allow users to manually specify devices that + * should be skipped. (eg. hypervisor debugger using some network device). + */ +static struct iommu_device *hv_iommu_probe_device(struct device *dev) +{ + if (!dev_is_pci(dev)) + return ERR_PTR(-ENODEV); + + if (pci_devs_to_skip && *pci_devs_to_skip) { + int rc, pos =3D 0; + int parsed; + int segment, bus, slot, func; + struct pci_dev *pdev =3D to_pci_dev(dev); + + do { + parsed =3D 0; + + rc =3D sscanf(pci_devs_to_skip + pos, " (%x:%x:%x.%x) %n", + &segment, &bus, &slot, &func, &parsed); + if (rc !=3D 4) + break; + if (parsed <=3D 0) + break; + + if (pci_domain_nr(pdev->bus) =3D=3D segment && + pdev->bus->number =3D=3D bus && + PCI_SLOT(pdev->devfn) =3D=3D slot && + PCI_FUNC(pdev->devfn) =3D=3D func) { + + dev_info(dev, "skipped by Hyper-V IOMMU\n"); + return ERR_PTR(-ENODEV); + } + pos +=3D parsed; + + } while (pci_devs_to_skip[pos]); + } + + /* Device will be explicitly attached to the default domain, so no need + * to do dev_iommu_priv_set() here. + */ + + return &hv_virt_iommu; +} + +static void hv_iommu_probe_finalize(struct device *dev) +{ + struct iommu_domain *immdom =3D iommu_get_domain_for_dev(dev); + + if (immdom && immdom->type =3D=3D IOMMU_DOMAIN_DMA) + iommu_setup_dma_ops(dev, immdom); + else + set_dma_ops(dev, NULL); +} + +static void hv_iommu_release_device(struct device *dev) +{ + struct hv_domain *hvdom =3D dev_iommu_priv_get(dev); + + /* Need to detach device from device domain if necessary. 
*/ + if (hvdom) + hv_iommu_detach_dev(&hvdom->iommu_dom, dev); + + dev_iommu_priv_set(dev, NULL); + set_dma_ops(dev, NULL); +} + +static struct iommu_group *hv_iommu_device_group(struct device *dev) +{ + if (dev_is_pci(dev)) + return pci_device_group(dev); + else + return generic_device_group(dev); +} + +static int hv_iommu_def_domain_type(struct device *dev) +{ + /* The hypervisor always creates this by default during boot */ + return IOMMU_DOMAIN_IDENTITY; +} + +static struct iommu_ops hv_iommu_ops =3D { + .capable =3D hv_iommu_capable, + .domain_alloc_identity =3D hv_iommu_domain_alloc_identity, + .domain_alloc_paging =3D hv_iommu_domain_alloc_paging, + .probe_device =3D hv_iommu_probe_device, + .probe_finalize =3D hv_iommu_probe_finalize, + .release_device =3D hv_iommu_release_device, + .def_domain_type =3D hv_iommu_def_domain_type, + .device_group =3D hv_iommu_device_group, + .default_domain_ops =3D &(const struct iommu_domain_ops) { + .attach_dev =3D hv_iommu_attach_dev, + .map_pages =3D hv_iommu_map_pages, + .unmap_pages =3D hv_iommu_unmap_pages, + .iova_to_phys =3D hv_iommu_iova_to_phys, + .free =3D hv_iommu_domain_free, + }, + .owner =3D THIS_MODULE, +}; + +static void __init hv_initialize_special_domains(void) +{ + hv_def_identity_dom.iommu_dom.geometry =3D default_geometry; + hv_def_identity_dom.domid_num =3D HV_DEVICE_DOMAIN_ID_S2_DEFAULT; /* 0 */ +} + +static int __init hv_iommu_init(void) +{ + int ret; + struct iommu_device *iommup =3D &hv_virt_iommu; + + if (!hv_is_hyperv_initialized()) + return -ENODEV; + + ret =3D iommu_device_sysfs_add(iommup, NULL, NULL, "%s", "hyperv-iommu"); + if (ret) { + pr_err("Hyper-V: iommu_device_sysfs_add failed: %d\n", ret); + return ret; + } + + /* This must come before iommu_device_register because the latter calls + * into the hooks. 
+ */ + hv_initialize_special_domains(); + + ret =3D iommu_device_register(iommup, &hv_iommu_ops, NULL); + if (ret) { + pr_err("Hyper-V: iommu_device_register failed: %d\n", ret); + goto err_sysfs_remove; + } + + pr_info("Hyper-V IOMMU initialized\n"); + + return 0; + +err_sysfs_remove: + iommu_device_sysfs_remove(iommup); + return ret; +} + +void __init hv_iommu_detect(void) +{ + if (no_iommu || iommu_detected) + return; + + /* For l1vh, always expose an iommu unit */ + if (!hv_l1vh_partition()) + if (!(ms_hyperv.misc_features & HV_DEVICE_DOMAIN_AVAILABLE)) + return; + + iommu_detected =3D 1; + x86_init.iommu.iommu_init =3D hv_iommu_init; + + pci_request_acs(); +} diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h index fe5ddd1c43ff..edbcfc2a9b60 100644 --- a/include/asm-generic/mshyperv.h +++ b/include/asm-generic/mshyperv.h @@ -331,11 +331,33 @@ static inline enum hv_isolation_type hv_get_isolation= _type(void) =20 #if IS_ENABLED(CONFIG_PCI_HYPERV) u64 hv_pci_vmbus_device_id(struct pci_dev *pdev); -#else /* IS_ENABLED(CONFIG_PCI_HYPERV) */ +#else /* IS_ENABLED(CONFIG_PCI_HYPERV) */ static inline u64 hv_pci_vmbus_device_id(struct pci_dev *pdev) { return 0; } #endif /* IS_ENABLED(CONFIG_PCI_HYPERV) */ =20 +#if IS_ENABLED(CONFIG_HYPERV_IOMMU) +u64 hv_get_current_partid(void); +bool hv_pcidev_is_attached_dev(struct pci_dev *pdev); +bool hv_pcidev_is_pthru_dev(struct pci_dev *pdev); +u64 hv_build_devid_oftype(struct pci_dev *pdev, enum hv_device_type type); + +#else /* Remove following after arm64 implementation is done */ + +static inline bool hv_pcidev_is_attached_dev(struct pci_dev *pdev) +{ return false; } + +static inline bool hv_pcidev_is_pthru_dev(struct pci_dev *pdev) +{ return false; } + +static inline u64 hv_build_devid_oftype(struct pci_dev *pdev, + enum hv_device_type type) +{ return 0; } + +static inline u64 hv_get_current_partid(void) +{ return HV_PARTITION_ID_INVALID; } +#endif /* IS_ENABLED(CONFIG_HYPERV_IOMMU) */ + #if 
IS_ENABLED(CONFIG_MSHV_ROOT) static inline bool hv_root_partition(void) { diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index 5459e776ec17..6eee1cbf6f23 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1769,4 +1769,10 @@ static inline unsigned long virt_to_hvpfn(void *addr) #define HVPFN_DOWN(x) ((x) >> HV_HYP_PAGE_SHIFT) #define page_to_hvpfn(page) (page_to_pfn(page) * NR_HV_HYP_PAGES_IN_PAGE) =20 +#ifdef CONFIG_HYPERV_IOMMU +void __init hv_iommu_detect(void); +#else +static inline void hv_iommu_detect(void) { } +#endif /* CONFIG_HYPERV_IOMMU */ + #endif /* _HYPERV_H */ --=20 2.51.2.vfs.0.1 From nobody Wed Jun 17 02:49:30 2026 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 769D338AC7E; Wed, 22 Apr 2026 02:34:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=13.77.154.182 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825253; cv=none; b=N5SacK5SaWWHaAs+E4J2UK0Mwd8ZWiiYiNAUpCbUwZJJH17ske5Dcx/2Hwh18Xc1HQ6jRRwFiyWYBzWhY3skyvkXmxjDVGWNgaC/hiXiluu1imbJc34eQPTsoaGUKB3kpqxTwwZRtNiEXT6ZXaCFayhuk8beAme2N0Rwsm6p1Fg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1776825253; c=relaxed/simple; bh=3TNU4EMnv17LluIqrnkhvD8xOvL+YdsPt3gnwRHpmUc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kTUqVjg/NXv0brKByuOWEFH/ohnfMAiBuTbIZFUCxUgxde3JelU8C4wNXvH9geMOR0mxGTQWEswvHy2BfGI+VW+4jercaq5050ZFq+M1rNe/X/smlfImoxsKADf1LIz6QJp81hkMh3QFkfZEuVGHY1ONk2zl3tIL6kEqpn94Mgw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com; spf=pass smtp.mailfrom=linux.microsoft.com; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b=YViEiMXm; arc=none smtp.client-ip=13.77.154.182 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) 
header.from=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.microsoft.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b="YViEiMXm" Received: from mrdev.corp.microsoft.com (192-184-212-33.fiber.dynamic.sonic.net [192.184.212.33]) by linux.microsoft.com (Postfix) with ESMTPSA id CB81720B6F08; Tue, 21 Apr 2026 19:33:57 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com CB81720B6F08 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1776825238; bh=84awl3Z2Kbawc6FZaMJAbg5OX/GvsOXiTuFqfIzdbWs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=YViEiMXm980dK9+OXkB0Jxegh6wk9Bxkhn8VP0IxBr0DaR7ZjYCu90PMX7OVW6hbF ixwXK4YlP+caCujsCD3EdpL2Lqb+QNfYsnV1w1SZtBZIbILsTmhuU1Mrh2APSmKAcL J0vXWvQm+zFvJ+7eTv+aJ0j+7Zb4sAp9lzfAKkKA= From: Mukesh R To: hpa@zytor.com, robin.murphy@arm.com, robh@kernel.org, wei.liu@kernel.org, mrathor@linux.microsoft.com, mhklinux@outlook.com, muislam@microsoft.com, namjain@linux.microsoft.com, magnuskulke@linux.microsoft.com, anbelski@linux.microsoft.com, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, iommu@lists.linux.dev, linux-pci@vger.kernel.org, linux-arch@vger.kernel.org Cc: kys@microsoft.com, haiyangz@microsoft.com, decui@microsoft.com, longli@microsoft.com, tglx@kernel.org, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, joro@8bytes.org, will@kernel.org, lpieralisi@kernel.org, kwilczynski@kernel.org, bhelgaas@google.com, arnd@arndb.de Subject: [PATCH V1 12/13] mshv: Populate mmio mappings for PCI passthru Date: Tue, 21 Apr 2026 19:32:38 -0700 Message-ID: <20260422023239.1171963-13-mrathor@linux.microsoft.com> X-Mailer: git-send-email 2.51.2.vfs.0.1 In-Reply-To: <20260422023239.1171963-1-mrathor@linux.microsoft.com> References: <20260422023239.1171963-1-mrathor@linux.microsoft.com> Precedence: bulk 
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

On guest access to an mmio range that has no mapping, the hypervisor
generates an unmapped gpa intercept. In this path, look up the PCI
resource pfn for the guest gpa and ask the hypervisor to map it via
hypercall. The PCI resource pfn is maintained by the VFIO driver and is
obtained via a fixup_user_fault() call (similar to KVM).

Also, VFIO no longer puts the mmio pfn in vma->vm_pgoff, so remove the
code that uses it to map mmio space. That code is broken and will cause
a panic.

Signed-off-by: Mukesh R
---
 drivers/hv/mshv_root_main.c | 113 ++++++++++++++++++++++++++++++------
 1 file changed, 96 insertions(+), 17 deletions(-)

diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index 6ceb5f608589..a7864463961b 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -46,6 +46,9 @@ MODULE_DESCRIPTION("Microsoft Hyper-V root partition VMM interface /dev/mshv");
 #define HV_VP_COUNTER_ROOT_DISPATCH_THREAD_BLOCKED 95
 #endif
 
+static bool hv_nofull_mmio;	/* don't map entire mmio region upon fault */
+module_param(hv_nofull_mmio, bool, 0644);
+
 struct mshv_root mshv_root;
 
 enum hv_scheduler_type hv_scheduler_type;
@@ -641,6 +644,94 @@ mshv_partition_region_by_gfn_get(struct mshv_partition *p, u64 gfn)
 	return region;
 }
 
+/*
+ * Check if uaddr is for mmio range. If yes, return 0 with mmio_pfn filled in
+ * else just return -errno.
+ */
+static int mshv_chk_get_mmio_start_pfn(u64 uaddr, u64 *mmio_pfnp)
+{
+	struct vm_area_struct *vma;
+	bool is_mmio;
+	struct follow_pfnmap_args pfnmap_args;
+	int rc = -EINVAL;
+
+	mmap_read_lock(current->mm);
+	vma = vma_lookup(current->mm, uaddr);
+	is_mmio = vma ?
!!(vma->vm_flags & (VM_IO | VM_PFNMAP)) : 0;
+	if (!is_mmio)
+		goto unlock_mmap_out;
+
+	pfnmap_args.vma = vma;
+	pfnmap_args.address = uaddr;
+
+	rc = follow_pfnmap_start(&pfnmap_args);
+	if (rc) {
+		rc = fixup_user_fault(current->mm, uaddr, FAULT_FLAG_WRITE,
+				      NULL);
+		if (rc)
+			goto unlock_mmap_out;
+
+		rc = follow_pfnmap_start(&pfnmap_args);
+		if (rc)
+			goto unlock_mmap_out;
+	}
+
+	*mmio_pfnp = pfnmap_args.pfn;
+	follow_pfnmap_end(&pfnmap_args);
+
+unlock_mmap_out:
+	mmap_read_unlock(current->mm);
+	return rc;
+}
+
+/*
+ * Check if the unmapped gpa belongs to mmio space. If yes, resolve it.
+ *
+ * Returns: True if valid mmio intercept and handled, else false.
+ */
+static bool mshv_handle_unmapped_gpa(struct mshv_vp *vp)
+{
+	struct hv_message *hvmsg = vp->vp_intercept_msg_page;
+	u64 gfn, uaddr, mmio_spa, numpgs;
+	struct mshv_mem_region *rg;
+	int rc = -EINVAL;
+	struct mshv_partition *pt = vp->vp_partition;
+#if defined(CONFIG_X86_64)
+	struct hv_x64_memory_intercept_message *msg =
+		(struct hv_x64_memory_intercept_message *)hvmsg->u.payload;
+#elif defined(CONFIG_ARM64)
+	struct hv_arm64_memory_intercept_message *msg =
+		(struct hv_arm64_memory_intercept_message *)hvmsg->u.payload;
+#endif
+
+	gfn = msg->guest_physical_address >> HV_HYP_PAGE_SHIFT;
+
+	rg = mshv_partition_region_by_gfn_get(pt, gfn);
+	if (rg == NULL)
+		return false;
+	if (rg->mreg_type != MSHV_REGION_TYPE_MMIO)
+		goto put_rg_out;
+
+	uaddr = rg->start_uaddr + ((gfn - rg->start_gfn) << HV_HYP_PAGE_SHIFT);
+
+	rc = mshv_chk_get_mmio_start_pfn(uaddr, &mmio_spa);
+	if (rc)
+		goto put_rg_out;
+
+	if (!hv_nofull_mmio) {		/* default case */
+		mmio_spa = mmio_spa - (gfn - rg->start_gfn);
+		gfn = rg->start_gfn;
+		numpgs = rg->nr_pages;
+	} else
+		numpgs = 1;
+
+	rc = hv_call_map_mmio_pages(pt->pt_id, gfn, mmio_spa, numpgs);
+
+put_rg_out:
+	mshv_region_put(rg);
+	return rc == 0;
+}
+
 /**
  * mshv_handle_gpa_intercept - Handle GPA (Guest Physical
Address) intercepts.
  * @vp: Pointer to the virtual processor structure.
@@ -699,6 +790,8 @@ static bool mshv_handle_gpa_intercept(struct mshv_vp *vp)
 static bool mshv_vp_handle_intercept(struct mshv_vp *vp)
 {
 	switch (vp->vp_intercept_msg_page->header.message_type) {
+	case HVMSG_UNMAPPED_GPA:
+		return mshv_handle_unmapped_gpa(vp);
 	case HVMSG_GPA_INTERCEPT:
 		return mshv_handle_gpa_intercept(vp);
 	}
@@ -1322,16 +1415,8 @@ static int mshv_prepare_pinned_region(struct mshv_mem_region *region)
 }
 
 /*
- * This maps two things: guest RAM and for pci passthru mmio space.
- *
- * mmio:
- *  - vfio overloads vm_pgoff to store the mmio start pfn/spa.
- *  - Two things need to happen for mapping mmio range:
- *	1. mapped in the uaddr so VMM can access it.
- *	2. mapped in the hwpt (gfn <-> mmio phys addr) so guest can access it.
- *
- *   This function takes care of the second. The first one is managed by vfio,
- *   and hence is taken care of via vfio_pci_mmap_fault().
+ * This is called for both user ram and mmio space. The mmio space is not
+ * mapped here, but later during intercept on demand.
  */
 static long
 mshv_map_user_memory(struct mshv_partition *partition,
@@ -1340,7 +1425,6 @@ mshv_map_user_memory(struct mshv_partition *partition,
 	struct mshv_mem_region *region;
 	struct vm_area_struct *vma;
 	bool is_mmio;
-	ulong mmio_pfn;
 	long ret;
 
 	if (mem->flags & BIT(MSHV_SET_MEM_BIT_UNMAP) ||
@@ -1350,7 +1434,6 @@ mshv_map_user_memory(struct mshv_partition *partition,
 	mmap_read_lock(current->mm);
 	vma = vma_lookup(current->mm, mem->userspace_addr);
 	is_mmio = vma ? !!(vma->vm_flags & (VM_IO | VM_PFNMAP)) : 0;
-	mmio_pfn = is_mmio ?
vma->vm_pgoff : 0;
 	mmap_read_unlock(current->mm);
 
 	if (!vma)
@@ -1376,11 +1459,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
 					    region->nr_pages,
 					    HV_MAP_GPA_NO_ACCESS, NULL);
 		break;
-	case MSHV_REGION_TYPE_MMIO:
-		ret = hv_call_map_mmio_pages(partition->pt_id,
-					     region->start_gfn,
-					     mmio_pfn,
-					     region->nr_pages);
+	default:
 		break;
 	}
 
-- 
2.51.2.vfs.0.1

From nobody Wed Jun 17 02:49:30 2026
Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182])
	by smtp.subspace.kernel.org (Postfix) with ESMTP id 0019F383C85;
	Wed, 22 Apr 2026 02:33:59 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=13.77.154.182
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1776825253; cv=none; b=YrkOHnj9FOwwj7nw1CZp1Ld1uOFV79TLwKrKvxw2yMS4snepUxemD0bTTZQCVm3s/Ek+NAITKlqKCi2+T6bokfCqTMN9NBuUwh1bzdvT1ib5Z9HAcVmS1Uif1aqHSF6TAtL5APR0+DZkBgCtau6qaRFdi1Y6j7ZmJVcdwPWwv/U=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1776825253; c=relaxed/simple;
	bh=DeXjz0SRhQLAqrfm4RH5YKzrqZmSYmsxnp314WPB2yE=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version; b=W+DIVA2CQe5ZfBkp8UuPf8fgbsAgd4l4lbi7+JpF88xYeogFXluiaDBPRbSjqjRG4/yMZM0plvQ5Ec6/tGGhmKDJklqYOAD9elCNsqByhNe+YrY4mIseMQlTgbPOhl5i8N8riH2m5ohzOLP9LQvDDAMjmWG0aIEno7aZfWtG6B0=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dmarc=pass (p=none dis=none) header.from=linux.microsoft.com;
	spf=pass smtp.mailfrom=linux.microsoft.com;
	dkim=pass (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b=IZXylSwM;
	arc=none smtp.client-ip=13.77.154.182
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.microsoft.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.microsoft.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=linux.microsoft.com
	header.i=@linux.microsoft.com header.b="IZXylSwM"
Received: from mrdev.corp.microsoft.com (192-184-212-33.fiber.dynamic.sonic.net [192.184.212.33])
	by linux.microsoft.com (Postfix) with ESMTPSA id 3A9B320B6F0C;
	Tue, 21 Apr 2026 19:33:59 -0700 (PDT)
DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com 3A9B320B6F0C
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com;
	s=default; t=1776825240;
	bh=y1jx/hTTfAyQ3CCJQjV0+0ow1B+C2JnMqOep7R/bNGA=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=IZXylSwMUWnVpnK/zPOW1AocwdVaoMN1VvQgjP8RPfybnucqhnKEM2kR91gSL6erO
	 GLdPDi6Ja3KA6eP6YyMcrj0hGsvermCipK5rA4kVfM37cx1oLOZby0eON+S6iBFfNM
	 oML8I9tJJp7BjXpoDnF2iMxzXoEqvXYFsiVMOzGs=
From: Mukesh R
To: hpa@zytor.com, robin.murphy@arm.com, robh@kernel.org, wei.liu@kernel.org,
	mrathor@linux.microsoft.com, mhklinux@outlook.com, muislam@microsoft.com,
	namjain@linux.microsoft.com, magnuskulke@linux.microsoft.com,
	anbelski@linux.microsoft.com, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org, iommu@lists.linux.dev,
	linux-pci@vger.kernel.org, linux-arch@vger.kernel.org
Cc: kys@microsoft.com, haiyangz@microsoft.com, decui@microsoft.com,
	longli@microsoft.com, tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, joro@8bytes.org,
	will@kernel.org, lpieralisi@kernel.org, kwilczynski@kernel.org,
	bhelgaas@google.com, arnd@arndb.de
Subject: [PATCH V1 13/13] mshv: pin all ram mem regions if partition has device passthru
Date: Tue, 21 Apr 2026 19:32:39 -0700
Message-ID: <20260422023239.1171963-14-mrathor@linux.microsoft.com>
X-Mailer: git-send-email 2.51.2.vfs.0.1
In-Reply-To: <20260422023239.1171963-1-mrathor@linux.microsoft.com>
References: <20260422023239.1171963-1-mrathor@linux.microsoft.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Given the sporadic
errors, mostly from high-end devices, when regions are not pinned and a
PCI device is passed through to a VM, force all regions for the VM to be
pinned for now. Even though VFIO may pin them, the regions would still be
marked movable, so do it upfront in mshv.

Signed-off-by: Mukesh R
---
 drivers/hv/mshv_root.h      | 6 ++++++
 drivers/hv/mshv_root_main.c | 5 ++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
index b9880d0bdc4d..32260df84f86 100644
--- a/drivers/hv/mshv_root.h
+++ b/drivers/hv/mshv_root.h
@@ -141,6 +141,7 @@ struct mshv_partition {
 	pid_t pt_vmm_tgid;
 	bool import_completed;
 	bool pt_initialized;
+	bool pt_regions_pinned;
 #if IS_ENABLED(CONFIG_DEBUG_FS)
 	struct dentry *pt_stats_dentry;
 	struct dentry *pt_vp_dentry;
@@ -277,6 +278,11 @@ static inline bool mshv_partition_encrypted(struct mshv_partition *partition)
 	return partition->isolation_type == HV_PARTITION_ISOLATION_TYPE_SNP;
 }
 
+static inline bool mshv_pt_regions_pinned(struct mshv_partition *pt)
+{
+	return pt->pt_regions_pinned || mshv_partition_encrypted(pt);
+}
+
 struct mshv_partition *mshv_partition_get(struct mshv_partition *partition);
 void mshv_partition_put(struct mshv_partition *partition);
 struct mshv_partition *mshv_partition_find(u64 partition_id) __must_hold(RCU);
diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
index a7864463961b..251cf88a2b0b 100644
--- a/drivers/hv/mshv_root_main.c
+++ b/drivers/hv/mshv_root_main.c
@@ -1333,7 +1333,7 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
 
 	if (is_mmio)
 		rg->mreg_type = MSHV_REGION_TYPE_MMIO;
-	else if (mshv_partition_encrypted(partition) ||
+	else if (mshv_pt_regions_pinned(partition) ||
 		 !mshv_region_movable_init(rg))
 		rg->mreg_type = MSHV_REGION_TYPE_MEM_PINNED;
 	else
@@ -1406,6 +1406,9 @@ static int mshv_prepare_pinned_region(struct mshv_mem_region *region)
 		goto err_out;
 	}
 
+	/* For now, all regions must be
pinned if there is device passthru. */
+	partition->pt_regions_pinned = true;
+
 	return 0;
 
 invalidate_region:
-- 
2.51.2.vfs.0.1
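
Note on patch 12/13: the frame arithmetic that mshv_handle_unmapped_gpa()
performs before calling hv_call_map_mmio_pages() can be checked in
userspace with the small model below. This is only a sketch, not kernel
code. The struct and function names are hypothetical, and it assumes that
the pfn returned for the faulting address and the gfn are counted in the
same page size, and that the device BAR is physically contiguous across
the whole region.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three values passed to the map hypercall. */
struct mmio_map_req {
	uint64_t gfn;		/* first guest frame to map */
	uint64_t spa_pfn;	/* host frame backing that guest frame */
	uint64_t numpgs;	/* number of pages to map */
};

/*
 * Model of the mapping decision (assumed names, userspace only).
 * fault_gfn/fault_pfn describe the faulting page; rg_start_gfn and
 * rg_nr_pages describe the enclosing mmio region. nofull mirrors the
 * hv_nofull_mmio module parameter.
 */
static struct mmio_map_req
model_mmio_map_req(uint64_t fault_gfn, uint64_t fault_pfn,
		   uint64_t rg_start_gfn, uint64_t rg_nr_pages, int nofull)
{
	struct mmio_map_req req;
	uint64_t off = fault_gfn - rg_start_gfn;

	/* The faulting frame must lie inside the region. */
	assert(fault_gfn >= rg_start_gfn && off < rg_nr_pages);

	if (nofull) {
		/* Map only the page that faulted. */
		req.gfn = fault_gfn;
		req.spa_pfn = fault_pfn;
		req.numpgs = 1;
	} else {
		/* Default: rewind both sides to the region start, map it all. */
		req.gfn = rg_start_gfn;
		req.spa_pfn = fault_pfn - off;
		req.numpgs = rg_nr_pages;
	}
	return req;
}
```

For a 16-page region starting at gfn 0x1000, a fault at gfn 0x1005 that
resolves to pfn 0xfe005 yields a request for gfn 0x1000, pfn 0xfe000 and
16 pages in the default case, and a single-page request for gfn 0x1005,
pfn 0xfe005 when hv_nofull_mmio is set.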