From: Teddy Astie <teddy.astie@vates.tech>
Subject: [RFC PATCH v2] iommu/xen: Add Xen PV-IOMMU driver
To: xen-devel@lists.xenproject.org, iommu@lists.linux.dev
Cc: Teddy Astie, Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko, Joerg Roedel, Will Deacon, Robin Murphy, Marek Marczykowski-Górecki
Message-Id: <24d7ec005e77e4e0127995ba6f4ad16f33737fa5.1718981216.git.teddy.astie@vates.tech>
Date: Fri, 21 Jun 2024 16:08:48 +0000
X-Mailer: git-send-email 2.45.2

In the context of Xen, Linux runs as Dom0 and doesn't have access to the
machine IOMMU. However, an IOMMU is mandatory for using some kernel
features such as VFIO or DMA protection.

In Xen, we added a paravirtualized IOMMU exposed through the iommu_op
hypercall in order to allow Dom0 to implement such features. This commit
introduces a new IOMMU driver that uses this new hypercall interface.
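As an illustration of how the interface is meant to be driven, here is a
minimal sketch (not part of the patch; pv_iommu_probe_example is a made-up
name for illustration) of querying PV-IOMMU capabilities from Dom0, mirroring
what xen_iommu_init() below does with IOMMUOP_query_capabilities:

    #include <xen/interface/pv-iommu.h>
    #include <asm/xen/hypercall.h>

    static int pv_iommu_probe_example(void)
    {
            struct pv_iommu_op op = {
                    .subop_id = IOMMUOP_query_capabilities,
            };
            int ret = HYPERVISOR_iommu_op(&op);

            if (ret == -ENOSYS)
                    return -ENODEV; /* hypervisor has no PV-IOMMU support */

            /* On success, op.cap holds max_ctx_no, max_nr_pages and max_iova_addr. */
            return ret;
    }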
Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
Changes since v1:
* formatting changes
* applied Jan Beulich's proposed changes: removed vim notes at end of pv-iommu.h
* applied Jason Gunthorpe's proposed changes: use new ops and remove redundant checks
---
 arch/x86/include/asm/xen/hypercall.h |   6 +
 drivers/iommu/Kconfig                |   9 +
 drivers/iommu/Makefile               |   1 +
 drivers/iommu/xen-iommu.c            | 489 +++++++++++++++++++++++++++
 include/xen/interface/memory.h       |  33 ++
 include/xen/interface/pv-iommu.h     | 104 ++++++
 include/xen/interface/xen.h          |   1 +
 7 files changed, 643 insertions(+)
 create mode 100644 drivers/iommu/xen-iommu.c
 create mode 100644 include/xen/interface/pv-iommu.h

diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index a2dd24947eb8..6b1857f27c14 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -490,6 +490,12 @@ HYPERVISOR_xenpmu_op(unsigned int op, void *arg)
 	return _hypercall2(int, xenpmu_op, op, arg);
 }
 
+static inline int
+HYPERVISOR_iommu_op(void *arg)
+{
+	return _hypercall1(int, iommu_op, arg);
+}
+
 static inline int
 HYPERVISOR_dm_op(
 	domid_t dom, unsigned int nr_bufs, struct xen_dm_op_buf *bufs)
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 0af39bbbe3a3..242cefac77c9 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -480,6 +480,15 @@ config VIRTIO_IOMMU
 
 	  Say Y here if you intend to run this kernel as a guest.
 
+config XEN_IOMMU
+	bool "Xen IOMMU driver"
+	depends on XEN_DOM0
+	select IOMMU_API
+	help
+	  Xen PV-IOMMU driver for Dom0.
+
+	  Say Y here if you intend to run this kernel as Xen Dom0.
+
 config SPRD_IOMMU
 	tristate "Unisoc IOMMU Support"
 	depends on ARCH_SPRD || COMPILE_TEST
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 542760d963ec..393afe22c901 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -30,3 +30,4 @@ obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
 obj-$(CONFIG_IOMMU_IOPF) += io-pgfault.o
 obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
 obj-$(CONFIG_APPLE_DART) += apple-dart.o
+obj-$(CONFIG_XEN_IOMMU) += xen-iommu.o
\ No newline at end of file
diff --git a/drivers/iommu/xen-iommu.c b/drivers/iommu/xen-iommu.c
new file mode 100644
index 000000000000..b765445d27cd
--- /dev/null
+++ b/drivers/iommu/xen-iommu.c
@@ -0,0 +1,489 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Xen PV-IOMMU driver.
+ *
+ * Copyright (C) 2024 Vates SAS
+ *
+ * Author: Teddy Astie <teddy.astie@vates.tech>
+ *
+ */
+
+#define pr_fmt(fmt) "xen-iommu: " fmt
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/minmax.h>
+#include <linux/string.h>
+#include <linux/device.h>
+#include <linux/pci.h>
+#include <linux/iommu.h>
+#include <linux/dma-map-ops.h>
+#include <linux/printk.h>
+#include <linux/stddef.h>
+
+#include <asm/iommu.h>
+#include <asm/xen/hypercall.h>
+#include <asm/xen/page.h>
+#include <xen/xen.h>
+#include <xen/page.h>
+#include <xen/interface/memory.h>
+#include <xen/interface/pv-iommu.h>
+
+MODULE_DESCRIPTION("Xen IOMMU driver");
+MODULE_AUTHOR("Teddy Astie <teddy.astie@vates.tech>");
+MODULE_LICENSE("GPL");
+
+#define MSI_RANGE_START		(0xfee00000)
+#define MSI_RANGE_END		(0xfeefffff)
+
+#define XEN_IOMMU_PGSIZES	(0x1000)
+
+struct xen_iommu_domain {
+	struct iommu_domain domain;
+
+	u16 ctx_no; /* Xen PV-IOMMU context number */
+};
+
+static struct iommu_device xen_iommu_device;
+
+static uint32_t max_nr_pages;
+static uint64_t max_iova_addr;
+
+static spinlock_t lock;
+
+static inline struct xen_iommu_domain *to_xen_iommu_domain(struct iommu_domain *dom)
+{
+	return container_of(dom, struct xen_iommu_domain, domain);
+}
+
+static inline u64 addr_to_pfn(u64 addr)
+{
+	return addr >> 12;
+}
+
+static inline u64 pfn_to_addr(u64 pfn)
+{
+	return pfn << 12;
+}
+
+static bool xen_iommu_capable(struct device *dev, enum iommu_cap cap)
+{
+	switch (cap) {
+	case IOMMU_CAP_CACHE_COHERENCY:
+		return true;
+
+	default:
+		return false;
+	}
+}
+
+static struct iommu_domain *xen_iommu_domain_alloc_paging(struct device *dev)
+{
+	struct xen_iommu_domain *domain;
+	int ret;
+
+	struct pv_iommu_op op = {
+		.ctx_no = 0,
+		.flags = 0,
+		.subop_id = IOMMUOP_alloc_context
+	};
+
+	ret = HYPERVISOR_iommu_op(&op);
+
+	if (ret) {
+		pr_err("Unable to create Xen IOMMU context (%d)\n", ret);
+		return ERR_PTR(ret);
+	}
+
+	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+	if (!domain) {
+		/* Don't leak the freshly allocated context on failure. */
+		op.subop_id = IOMMUOP_free_context;
+		HYPERVISOR_iommu_op(&op);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	domain->ctx_no = op.ctx_no;
+
+	domain->domain.geometry.aperture_start = 0;
+	domain->domain.geometry.aperture_end = max_iova_addr;
+	domain->domain.geometry.force_aperture = true;
+
+	return &domain->domain;
+}
+
+static struct iommu_device *xen_iommu_probe_device(struct device *dev)
+{
+	if (!dev_is_pci(dev))
+		return ERR_PTR(-ENODEV);
+
+	return &xen_iommu_device;
+}
+
+static void xen_iommu_probe_finalize(struct device *dev)
+{
+	set_dma_ops(dev, NULL);
+	iommu_setup_dma_ops(dev, 0, max_iova_addr);
+}
+
+static int xen_iommu_map_pages(struct iommu_domain *domain, unsigned long iova,
+			       phys_addr_t paddr, size_t pgsize, size_t pgcount,
+			       int prot, gfp_t gfp, size_t *mapped)
+{
+	size_t xen_pg_count = (pgsize / XEN_PAGE_SIZE) * pgcount;
+	struct xen_iommu_domain *dom = to_xen_iommu_domain(domain);
+	struct pv_iommu_op op = {
+		.subop_id = IOMMUOP_map_pages,
+		.flags = 0,
+		.ctx_no = dom->ctx_no
+	};
+	/* NOTE: paddr is actually bound to pfn, not gfn */
+	uint64_t pfn = addr_to_pfn(paddr);
+	uint64_t dfn = addr_to_pfn(iova);
+	int ret = 0;
+
+	if (prot & IOMMU_READ)
+		op.flags |= IOMMU_OP_readable;
+
+	if (prot & IOMMU_WRITE)
+		op.flags |= IOMMU_OP_writeable;
+
+	while (xen_pg_count) {
+		size_t to_map = min_t(size_t, xen_pg_count, max_nr_pages);
+		uint64_t gfn = pfn_to_gfn(pfn);
+
+		op.map_pages.gfn = gfn;
+		op.map_pages.dfn = dfn;
+
+		op.map_pages.nr_pages = to_map;
+
+		ret = HYPERVISOR_iommu_op(&op);
+
+		if (mapped)
+			*mapped += XEN_PAGE_SIZE * op.map_pages.mapped;
+
+		if (ret)
+			break;
+
+		xen_pg_count -= to_map;
+
+		pfn += to_map;
+		dfn += to_map;
+	}
+
+	return ret;
+}
+
+static size_t xen_iommu_unmap_pages(struct iommu_domain *domain, unsigned long iova,
+				    size_t pgsize, size_t pgcount,
+				    struct iommu_iotlb_gather *iotlb_gather)
+{
+	size_t xen_pg_count = (pgsize / XEN_PAGE_SIZE) * pgcount;
+	struct xen_iommu_domain *dom = to_xen_iommu_domain(domain);
+	struct pv_iommu_op op = {
+		.subop_id = IOMMUOP_unmap_pages,
+		.ctx_no = dom->ctx_no,
+		.flags = 0,
+	};
+	uint64_t dfn = addr_to_pfn(iova);
+	int ret = 0;
+
+	if (WARN(!dom->ctx_no, "Tried to unmap page from default context"))
+		return 0;
+
+	while (xen_pg_count) {
+		size_t to_unmap = min_t(size_t, xen_pg_count, max_nr_pages);
+
+		op.unmap_pages.dfn = dfn;
+		op.unmap_pages.nr_pages = to_unmap;
+
+		ret = HYPERVISOR_iommu_op(&op);
+
+		if (ret)
+			pr_warn("Unmap failure (%llx-%llx)\n", dfn, dfn + to_unmap - 1);
+
+		xen_pg_count -= to_unmap;
+
+		dfn += to_unmap;
+	}
+
+	return pgcount * pgsize;
+}
+
+static int xen_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
+{
+	struct pci_dev *pdev;
+	struct xen_iommu_domain *dom = to_xen_iommu_domain(domain);
+	struct pv_iommu_op op = {
+		.subop_id = IOMMUOP_reattach_device,
+		.flags = 0,
+		.ctx_no = dom->ctx_no,
+	};
+
+	pdev = to_pci_dev(dev);
+
+	op.reattach_device.dev.seg = pci_domain_nr(pdev->bus);
+	op.reattach_device.dev.bus = pdev->bus->number;
+	op.reattach_device.dev.devfn = pdev->devfn;
+
+	return HYPERVISOR_iommu_op(&op);
+}
+
+static void xen_iommu_free(struct iommu_domain *domain)
+{
+	int ret;
+	struct xen_iommu_domain *dom = to_xen_iommu_domain(domain);
+
+	if (dom->ctx_no != 0) {
+		struct pv_iommu_op op = {
+			.ctx_no = dom->ctx_no,
+			.flags = 0,
+			.subop_id = IOMMUOP_free_context
+		};
+
+		ret = HYPERVISOR_iommu_op(&op);
+
+		if (ret)
+			pr_err("Context %hu destruction failure\n", dom->ctx_no);
+	}
+
+	kfree(dom);
+}
+
+static phys_addr_t xen_iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
+{
+	int ret;
+	struct xen_iommu_domain *dom = to_xen_iommu_domain(domain);
+	phys_addr_t page_addr;
+
+	struct pv_iommu_op op = {
+		.ctx_no = dom->ctx_no,
+		.flags = 0,
+		.subop_id = IOMMUOP_lookup_page,
+	};
+
+	op.lookup_page.dfn = addr_to_pfn(iova);
+
+	ret = HYPERVISOR_iommu_op(&op);
+
+	if (ret)
+		return 0;
+
+	page_addr = pfn_to_addr(gfn_to_pfn(op.lookup_page.gfn));
+
+	/* Consider non-aligned iova */
+	return page_addr + (iova & 0xFFF);
+}
+
+static void xen_iommu_get_resv_regions(struct device *dev, struct list_head *head)
+{
+	struct iommu_resv_region *reg;
+	struct xen_reserved_device_memory *entries;
+	struct xen_reserved_device_memory_map map;
+	struct pci_dev *pdev;
+	int ret, i;
+
+	pdev = to_pci_dev(dev);
+
+	reg = iommu_alloc_resv_region(MSI_RANGE_START,
+				      MSI_RANGE_END - MSI_RANGE_START + 1,
+				      0, IOMMU_RESV_MSI, GFP_KERNEL);
+
+	if (!reg)
+		return;
+
+	list_add_tail(&reg->list, head);
+
+	/* Map xen-specific entries */
+
+	/* First, get number of entries to map */
+	map.buffer = NULL;
+	map.nr_entries = 0;
+	map.flags = 0;
+
+	map.dev.pci.seg = pci_domain_nr(pdev->bus);
+	map.dev.pci.bus = pdev->bus->number;
+	map.dev.pci.devfn = pdev->devfn;
+
+	ret = HYPERVISOR_memory_op(XENMEM_reserved_device_memory_map, &map);
+
+	if (ret == 0)
+		/* No reserved region, nothing to do */
+		return;
+
+	if (ret != -ENOBUFS) {
+		pr_err("Unable to get reserved region count (%d)\n", ret);
+		return;
+	}
+
+	/* Assume a reasonable number of entries; otherwise, something is probably wrong */
+	if (WARN_ON(map.nr_entries > 256))
+		pr_warn("Xen reporting many reserved regions (%u)\n", map.nr_entries);
+
+	/* And finally get actual mappings */
+	entries = kcalloc(map.nr_entries, sizeof(struct xen_reserved_device_memory),
+			  GFP_KERNEL);
+
+	if (!entries) {
+		pr_err("No memory for map entries\n");
+		return;
+	}
+
+	map.buffer = entries;
+
+	ret = HYPERVISOR_memory_op(XENMEM_reserved_device_memory_map, &map);
+
+	if (ret != 0) {
+		pr_err("Unable to get reserved regions (%d)\n", ret);
+		kfree(entries);
+		return;
+	}
+
+	for (i = 0; i < map.nr_entries; i++) {
+		struct xen_reserved_device_memory entry = entries[i];
+
+		reg = iommu_alloc_resv_region(pfn_to_addr(entry.start_pfn),
+					      pfn_to_addr(entry.nr_pages),
+					      0, IOMMU_RESV_RESERVED, GFP_KERNEL);
+
+		if (!reg)
+			break;
+
+		list_add_tail(&reg->list, head);
+	}
+
+	kfree(entries);
+}
+
+static int default_domain_attach_dev(struct iommu_domain *domain,
+				     struct device *dev)
+{
+	int ret;
+	struct pci_dev *pdev;
+	struct pv_iommu_op op = {
+		.subop_id = IOMMUOP_reattach_device,
+		.flags = 0,
+		.ctx_no = 0 /* reattach device back to default context */
+	};
+
+	pdev = to_pci_dev(dev);
+
+	op.reattach_device.dev.seg = pci_domain_nr(pdev->bus);
+	op.reattach_device.dev.bus = pdev->bus->number;
+	op.reattach_device.dev.devfn = pdev->devfn;
+
+	ret = HYPERVISOR_iommu_op(&op);
+
+	if (ret)
+		pr_warn("Unable to release device %04x:%02x:%02x.%d\n",
+			op.reattach_device.dev.seg, op.reattach_device.dev.bus,
+			PCI_SLOT(op.reattach_device.dev.devfn),
+			PCI_FUNC(op.reattach_device.dev.devfn));
+
+	return ret;
+}
+
+static struct iommu_domain default_domain = {
+	.ops = &(const struct iommu_domain_ops){
+		.attach_dev = default_domain_attach_dev
+	}
+};
+
+static struct iommu_ops xen_iommu_ops = {
+	.identity_domain = &default_domain,
+	.release_domain = &default_domain,
+	.capable = xen_iommu_capable,
+	.domain_alloc_paging = xen_iommu_domain_alloc_paging,
+	.probe_device = xen_iommu_probe_device,
+	.probe_finalize = xen_iommu_probe_finalize,
+	.device_group = pci_device_group,
+	.get_resv_regions = xen_iommu_get_resv_regions,
+	.pgsize_bitmap = XEN_IOMMU_PGSIZES,
+	.default_domain_ops = &(const struct iommu_domain_ops) {
+		.map_pages = xen_iommu_map_pages,
+		.unmap_pages = xen_iommu_unmap_pages,
+		.attach_dev = xen_iommu_attach_dev,
+		.iova_to_phys = xen_iommu_iova_to_phys,
+		.free = xen_iommu_free,
+	},
+};
+
+static int __init xen_iommu_init(void)
+{
+	int ret;
+	struct pv_iommu_op op = {
+		.subop_id = IOMMUOP_query_capabilities
+	};
+
+	if (!xen_domain())
+		return -ENODEV;
+
+	/* Check if iommu_op is supported */
+	if (HYPERVISOR_iommu_op(&op) == -ENOSYS)
+		return -ENODEV; /* No Xen IOMMU hardware */
+
+	pr_info("Initialising Xen IOMMU driver\n");
+	pr_info("max_nr_pages=%u\n", op.cap.max_nr_pages);
+	pr_info("max_ctx_no=%u\n", op.cap.max_ctx_no);
+	pr_info("max_iova_addr=%llx\n", op.cap.max_iova_addr);
+
+	if (op.cap.max_ctx_no == 0) {
+		pr_err("Unable to use IOMMU PV driver (no context available)\n");
+		return -ENOTSUPP;
+	}
+
+	if (xen_domain_type == XEN_PV_DOMAIN)
+		/* TODO: In a PV domain, due to the existing pfn-gfn mapping, we need
+		 * to consider that under certain circumstances we have:
+		 *   pfn_to_gfn(x + 1) != pfn_to_gfn(x) + 1
+		 *
+		 * In these cases, we would want to split the subop into several calls
+		 * (only doing the grouped operation when the mapping is actually
+		 * contiguous). Only the map operation would be affected, as unmap
+		 * actually uses dfn, which doesn't have this kind of mapping.
+		 *
+		 * Force single-page operations to work around this issue for now.
+		 */
+		max_nr_pages = 1;
+	else
+		/* With HVM domains, pfn_to_gfn is identity; there is no issue regarding this. */
+		max_nr_pages = op.cap.max_nr_pages;
+
+	max_iova_addr = op.cap.max_iova_addr;
+
+	spin_lock_init(&lock);
+
+	ret = iommu_device_sysfs_add(&xen_iommu_device, NULL, NULL, "xen-iommu");
+	if (ret) {
+		pr_err("Unable to add Xen IOMMU sysfs\n");
+		return ret;
+	}
+
+	ret = iommu_device_register(&xen_iommu_device, &xen_iommu_ops, NULL);
+	if (ret) {
+		pr_err("Unable to register Xen IOMMU device (%d)\n", ret);
+		iommu_device_sysfs_remove(&xen_iommu_device);
+		return ret;
+	}
+
+	/* swiotlb is redundant when the IOMMU is active. */
+	x86_swiotlb_enable = false;
+
+	return 0;
+}
+
+static void __exit xen_iommu_fini(void)
+{
+	pr_info("Unregistering Xen IOMMU driver\n");
+
+	iommu_device_unregister(&xen_iommu_device);
+	iommu_device_sysfs_remove(&xen_iommu_device);
+}
+
+module_init(xen_iommu_init);
+module_exit(xen_iommu_fini);
diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
index 1a371a825c55..c860acaf4b0e 100644
--- a/include/xen/interface/memory.h
+++ b/include/xen/interface/memory.h
@@ -10,6 +10,7 @@
 #ifndef __XEN_PUBLIC_MEMORY_H__
 #define __XEN_PUBLIC_MEMORY_H__
 
+#include "physdev.h"
 #include <linux/spinlock.h>
 
 /*
@@ -214,6 +215,38 @@ struct xen_add_to_physmap_range {
 };
 DEFINE_GUEST_HANDLE_STRUCT(xen_add_to_physmap_range);
 
+/*
+ * With some legacy devices, certain guest-physical addresses cannot safely
+ * be used for other purposes, e.g. to map guest RAM. This hypercall
+ * enumerates those regions so the toolstack can avoid using them.
+ */
+#define XENMEM_reserved_device_memory_map 27
+struct xen_reserved_device_memory {
+	xen_pfn_t start_pfn;
+	xen_ulong_t nr_pages;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_reserved_device_memory);
+
+struct xen_reserved_device_memory_map {
+#define XENMEM_RDM_ALL 1 /* Request all regions (ignore dev union). */
+	/* IN */
+	uint32_t flags;
+	/*
+	 * IN/OUT
+	 *
+	 * Gets set to the required number of entries when too low,
+	 * signaled by error code -ERANGE.
+	 */
+	unsigned int nr_entries;
+	/* OUT */
+	GUEST_HANDLE(xen_reserved_device_memory) buffer;
+	/* IN */
+	union {
+		struct physdev_pci_device pci;
+	} dev;
+};
+DEFINE_GUEST_HANDLE_STRUCT(xen_reserved_device_memory_map);
+
 /*
  * Returns the pseudo-physical memory map as it was when the domain
  * was started (specified by XENMEM_set_memory_map).
diff --git a/include/xen/interface/pv-iommu.h b/include/xen/interface/pv-iommu.h
new file mode 100644
index 000000000000..8a8d366e5f4c
--- /dev/null
+++ b/include/xen/interface/pv-iommu.h
@@ -0,0 +1,104 @@
+/* SPDX-License-Identifier: MIT */
+/******************************************************************************
+ * pv-iommu.h
+ *
+ * Paravirtualized IOMMU driver interface.
+ *
+ * Copyright (c) 2024 Teddy Astie <teddy.astie@vates.tech>
+ */
+
+#ifndef __XEN_PUBLIC_PV_IOMMU_H__
+#define __XEN_PUBLIC_PV_IOMMU_H__
+
+#include "xen.h"
+#include "physdev.h"
+
+#define IOMMU_DEFAULT_CONTEXT (0)
+
+/**
+ * Query PV-IOMMU capabilities for this domain.
+ */
+#define IOMMUOP_query_capabilities 1
+
+/**
+ * Allocate an IOMMU context; the new context handle will be written to ctx_no.
+ */
+#define IOMMUOP_alloc_context 2
+
+/**
+ * Destroy an IOMMU context.
+ * All devices attached to this context are reattached to the default context.
+ *
+ * The default context (0) can't be destroyed.
+ */
+#define IOMMUOP_free_context 3
+
+/**
+ * Reattach a device to an IOMMU context.
+ */
+#define IOMMUOP_reattach_device 4
+
+#define IOMMUOP_map_pages 5
+#define IOMMUOP_unmap_pages 6
+
+/**
+ * Get the GFN associated to a specific DFN.
+ */
+#define IOMMUOP_lookup_page 7
+
+struct pv_iommu_op {
+	uint16_t subop_id;
+	uint16_t ctx_no;
+
+/**
+ * Create a context that is cloned from default.
+ * The new context will be populated with 1:1 mappings covering the entire
+ * guest memory.
+ */
+#define IOMMU_CREATE_clone (1 << 0)
+
+#define IOMMU_OP_readable (1 << 0)
+#define IOMMU_OP_writeable (1 << 1)
+	uint32_t flags;
+
+	union {
+		struct {
+			uint64_t gfn;
+			uint64_t dfn;
+			/* Number of pages to map */
+			uint32_t nr_pages;
+			/* Number of pages actually mapped after sub-op */
+			uint32_t mapped;
+		} map_pages;
+
+		struct {
+			uint64_t dfn;
+			/* Number of pages to unmap */
+			uint32_t nr_pages;
+			/* Number of pages actually unmapped after sub-op */
+			uint32_t unmapped;
+		} unmap_pages;
+
+		struct {
+			struct physdev_pci_device dev;
+		} reattach_device;
+
+		struct {
+			uint64_t gfn;
+			uint64_t dfn;
+		} lookup_page;
+
+		struct {
+			/* Maximum number of IOMMU contexts this domain can use. */
+			uint16_t max_ctx_no;
+			/* Maximum number of pages that can be modified in a single map/unmap operation. */
+			uint32_t max_nr_pages;
+			/* Maximum device address (IOVA) that the guest can use for mappings. */
+			uint64_t max_iova_addr;
+		} cap;
+	};
+};
+
+typedef struct pv_iommu_op pv_iommu_op_t;
+DEFINE_GUEST_HANDLE_STRUCT(pv_iommu_op_t);
+
+#endif
diff --git a/include/xen/interface/xen.h b/include/xen/interface/xen.h
index 0ca23eca2a9c..8b1daf3fecc6 100644
--- a/include/xen/interface/xen.h
+++ b/include/xen/interface/xen.h
@@ -65,6 +65,7 @@
 #define __HYPERVISOR_xc_reserved_op       39 /* reserved for XenClient */
 #define __HYPERVISOR_xenpmu_op            40
 #define __HYPERVISOR_dm_op                41
+#define __HYPERVISOR_iommu_op             43
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0               48
-- 
2.45.2


Teddy Astie | Vates XCP-ng Intern

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech