From nobody Sun Feb 8 01:17:05 2026
From: "Teddy Astie" <teddy.astie@vates.tech>
Subject: [RFC PATCH v7 01/14] docs/designs: Add a design document for IOMMU subsystem redesign
To: xen-devel@lists.xenproject.org
Cc: "Andrew Cooper", "Anthony PERARD", "Michal Orzel", "Jan Beulich", "Julien Grall", "Roger Pau Monné", "Stefano Stabellini"
Message-Id: <71a207c3a55036a426381bf9ae3020cf56557ba0.1763569135.git.teddy.astie@vates.tech>
Date: Thu, 20 Nov 2025 11:09:50 +0000

The current IOMMU subsystem has some limitations that make PV-IOMMU
practically impossible. One of them is the assumption that each domain is
bound to a single "IOMMU domain", which also causes complications for the
quarantine implementation.

Moreover, the current IOMMU subsystem is not entirely well-defined; for
instance, the behavior of map_page differs greatly between ARM SMMUv3 and
x86 VT-d/AMD-Vi. On ARM it can modify the domain page table, while on x86
it may be forbidden (e.g. when using HAP with PVH), or may only modify the
devices' point of view (e.g. when using PV).

The goal of this redesign is to define the behavior and interface of the
IOMMU subsystem more explicitly, while allowing PV-IOMMU to be effectively
implemented.
Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
 docs/designs/iommu-contexts.md | 403 +++++++++++++++++++++++++++++++++
 1 file changed, 403 insertions(+)
 create mode 100644 docs/designs/iommu-contexts.md

diff --git a/docs/designs/iommu-contexts.md b/docs/designs/iommu-contexts.md
new file mode 100644
index 0000000000..d61c5fcde2
--- /dev/null
+++ b/docs/designs/iommu-contexts.md
@@ -0,0 +1,403 @@
+# IOMMU context management in Xen
+
+Status: Experimental
+Revision: 0
+
+# Background
+
+The design for *IOMMU paravirtualization for Dom0* [1] explains that some guests may
+want access to IOMMU features. In order to implement this in Xen, several adjustments
+need to be made to the IOMMU subsystem.
+
+The "hardware IOMMU domain" is currently implemented on a per-domain basis, such that
+each domain has exactly one *hardware IOMMU domain*. This design aims to allow a
+single Xen domain to manage several "IOMMU contexts", and to allow some domains
+(e.g. Dom0 [1]) to modify their IOMMU contexts.
+
+In addition to this, the quarantine feature can be refactored to use IOMMU contexts,
+reducing the complexity of platform-specific implementations and ensuring more
+consistency across platforms.
+
+# IOMMU context
+
+We define an "IOMMU context" as being a *hardware IOMMU domain*, named "context"
+to avoid confusion with Xen domains.
+It represents some hardware-specific data structure that contains mappings from a
+device frame number to a machine frame number (e.g. using a pagetable) that can be
+applied to a device using the IOMMU hardware.
+
+This structure is bound to a Xen domain, but a Xen domain may have several IOMMU
+contexts. These contexts may be modifiable using the interface defined in [1],
+aside from some specific cases (e.g. modifying the default context).
+
+This is implemented in Xen as a new structure that will hold context-specific
+data.
+
+```c
+struct iommu_context {
+    u16 id; /* Context id (0 means default context) */
+    struct list_head devices;
+
+    struct arch_iommu_context arch;
+
+    bool opaque; /* context can't be modified nor accessed (e.g HAP) */
+};
+```
+
+A context is identified by a number that is domain-specific and may be used by
+IOMMU users such as PV-IOMMU by the guest.
+
+struct arch_iommu_context is split from struct arch_iommu:
+
+```c
+struct arch_iommu_context
+{
+    spinlock_t pgtables_lock;
+    struct page_list_head pgtables;
+
+    union {
+        /* Intel VT-d */
+        struct {
+            uint64_t pgd_maddr; /* io page directory machine address */
+            domid_t *didmap; /* per-iommu DID */
+            unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the context uses */
+        } vtd;
+        /* AMD IOMMU */
+        struct {
+            struct page_info *root_table;
+        } amd;
+    };
+};
+
+struct arch_iommu
+{
+    spinlock_t mapping_lock; /* io page table lock */
+    struct {
+        struct page_list_head list;
+        spinlock_t lock;
+    } pgtables;
+
+    struct list_head identity_maps;
+
+    union {
+        /* Intel VT-d */
+        struct {
+            /* no more context-specific values */
+            unsigned int agaw; /* adjusted guest address width, 0 is level 2 30-bit */
+        } vtd;
+        /* AMD IOMMU */
+        struct {
+            unsigned int paging_mode;
+            struct guest_iommu *g_iommu;
+        } amd;
+    };
+};
+```
+
+IOMMU context information is now carried by iommu_context rather than being
+integrated into struct arch_iommu.
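As a minimal illustration of the `opaque` flag above, any entry point of the
context-management interface would be expected to bail out on opaque contexts;
a hypothetical helper (not part of this series) could look like:

```c
/* Sketch: reject PV-IOMMU-style modification or inspection of opaque
 * contexts, e.g. a default context backed by HAP page tables. */
static int check_context_modifiable(const struct iommu_context *ctx)
{
    if ( ctx->opaque )
        return -EOPNOTSUPP;

    return 0;
}
```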
+
+# Xen domain IOMMU structure
+
+`struct domain_iommu` is modified to allow multiple contexts to exist within a
+single Xen domain:
+
+```c
+struct iommu_context_list {
+    uint16_t count; /* Context count excluding default context */
+
+    /* if count > 0 */
+
+    uint64_t *bitmap; /* bitmap of context allocation */
+    struct iommu_context *map; /* Map of contexts */
+};
+
+struct domain_iommu {
+    /* ... */
+
+    struct iommu_context default_ctx;
+    struct iommu_context_list other_contexts;
+
+    /* ... */
+}
+```
+
+default_ctx is a special context with id=0 that holds the page table mapping the
+entire domain, which basically preserves the previous behavior. All devices are
+expected to be bound to this context during initialization.
+
+Along with this default context that always exists, we use a pool of contexts
+with a fixed size at domain initialization, where contexts can be allocated (if
+possible) and have an id matching their position in the map (considering that
+id != 0).
+These contexts may be used by IOMMU context users such as PV-IOMMU or the
+quarantine domain (DomIO).
+
+# Platform independent context management interface
+
+A new platform-independent interface is introduced in the Xen hypervisor to
+allow IOMMU context users to create and manage contexts within domains.
+
+```c
+/* Direct context access functions (not supposed to be used directly) */
+struct iommu_context *iommu_get_context(struct domain *d, u16 ctx_id);
+void iommu_put_context(struct iommu_context *ctx);
+
+/* Flag for default context initialization */
+#define IOMMU_CONTEXT_INIT_default (1 << 0)
+
+/* Flag for quarantine contexts (scratch page, DMA Abort mode, ...) */
+#define IOMMU_CONTEXT_INIT_quarantine (1 << 1)
+
+int iommu_context_init(struct domain *d, struct iommu_context *ctx, u16 ctx_id, u32 flags);
+
+/* Flag to specify that devices will need to be reattached to the default context */
+#define IOMMU_TEARDOWN_REATTACH_DEFAULT (1 << 0)
+
+/*
+ * Flag to specify that the context needs to be destroyed preemptively
+ * (multiple calls to iommu_context_teardown will be required)
+ */
+#define IOMMU_TEARDOWN_PREEMPT (1 << 1)
+
+int iommu_context_teardown(struct domain *d, struct iommu_context *ctx, u32 flags);
+
+/* Allocate a new context, uses CONTEXT_INIT flags */
+int iommu_context_alloc(struct domain *d, u16 *ctx_id, u32 flags);
+
+/* Free a context, uses CONTEXT_TEARDOWN flags */
+int iommu_context_free(struct domain *d, u16 ctx_id, u32 flags);
+
+/* Move a device from one context to another, including between different domains. */
+int iommu_reattach_context(struct domain *prev_dom, struct domain *next_dom,
+                           device_t *dev, u16 ctx_id);
+
+/* Add a device to a context for first initialization */
+int iommu_attach_context(struct domain *d, device_t *dev, u16 ctx_id);
+
+/* Remove a device from a context, effectively removing it from the IOMMU. */
+int iommu_detach_context(struct domain *d, device_t *dev);
+```
+
+This interface relies on a new driver-facing interface (described in the next
+section) to implement these features.
+
+Some existing functions will take a new parameter to specify on which context
+to do the operation:
+- iommu_map (iommu_legacy_map untouched)
+- iommu_unmap (iommu_legacy_unmap untouched)
+- iommu_lookup_page
+- iommu_iotlb_flush
+
+These functions will modify the iommu_context structure to accommodate the
+operations applied; they will be used to replace some operations previously
+done in the IOMMU driver. A minimal usage sketch follows.
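To make the intended call flow concrete, here is an illustrative usage sketch
of this interface. The caller is hypothetical, and the exact signatures of the
context-aware iommu_map()/iommu_iotlb_flush() variants are assumptions, since
the document only states that a context parameter is added:

```c
/* Sketch: create a context, move a device into it, map one page. */
static int example_use_context(struct domain *d, device_t *dev,
                               dfn_t dfn, mfn_t mfn)
{
    u16 ctx_id;
    unsigned int flush_flags = 0;
    int rc = iommu_context_alloc(d, &ctx_id, 0);

    if ( rc )
        return rc;

    /* The device starts out in the default context (id 0). */
    rc = iommu_reattach_context(d, d, dev, ctx_id);
    if ( rc )
        goto free_ctx;

    /* Assumed signature: iommu_map() with a trailing ctx_id parameter. */
    rc = iommu_map(d, dfn, mfn, 1, IOMMUF_readable | IOMMUF_writable,
                   &flush_flags, ctx_id);
    if ( !rc )
        rc = iommu_iotlb_flush(d, dfn, 1, flush_flags, ctx_id);

    return rc;

 free_ctx:
    iommu_context_free(d, ctx_id, IOMMU_TEARDOWN_REATTACH_DEFAULT);
    return rc;
}
```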
+
+# IOMMU platform_ops interface changes
+
+The IOMMU driver needs to expose a way to create and manage IOMMU contexts. The
+approach taken here is to modify the interface to allow specifying an IOMMU
+context on operations, and at the same time to simplify the interface by relying
+more on platform-independent IOMMU code.
+
+Functions added to iommu_ops:
+
+```c
+/* Initialize a context (creating page tables, allocating hardware structures, ...) */
+int (*context_init)(struct domain *d, struct iommu_context *ctx,
+                    u32 flags);
+/* Destroy a context, assumes no device is bound to the context. */
+int (*context_teardown)(struct domain *d, struct iommu_context *ctx,
+                        u32 flags);
+/* Put a device in a context (assumes the device is not attached to another context) */
+int (*attach)(struct domain *d, device_t *dev,
+              struct iommu_context *ctx);
+/* Remove a device from a context, and from the IOMMU. */
+int (*detach)(struct domain *d, device_t *dev,
+              struct iommu_context *prev_ctx);
+/* Move the device from a context to another, including if the new context is in
+   another domain. d corresponds to the target domain. */
+int (*reattach)(struct domain *d, device_t *dev,
+                struct iommu_context *prev_ctx,
+                struct iommu_context *ctx);
+
+#ifdef CONFIG_HAS_PCI
+/* Specific interface for phantom function devices. */
+int (*add_devfn)(struct domain *d, struct pci_dev *pdev, u16 devfn,
+                 struct iommu_context *ctx);
+int (*remove_devfn)(struct domain *d, struct pci_dev *pdev, u16 devfn,
+                    struct iommu_context *ctx);
+#endif
+
+/* Changes to existing functions to use a specified iommu_context. */
+int __must_check (*map_page)(struct domain *d, dfn_t dfn, mfn_t mfn,
+                             unsigned int flags,
+                             unsigned int *flush_flags,
+                             struct iommu_context *ctx);
+int __must_check (*unmap_page)(struct domain *d, dfn_t dfn,
+                               unsigned int order,
+                               unsigned int *flush_flags,
+                               struct iommu_context *ctx);
+int __must_check (*lookup_page)(struct domain *d, dfn_t dfn, mfn_t *mfn,
+                                unsigned int *flags,
+                                struct iommu_context *ctx);
+
+int __must_check (*iotlb_flush)(struct domain *d,
+                                struct iommu_context *ctx, dfn_t dfn,
+                                unsigned long page_count,
+                                unsigned int flush_flags);
+
+void (*clear_root_pgtable)(struct domain *d, struct iommu_context *ctx);
+```
+
+These functions are redundant with existing functions; therefore, the following
+functions are replaced with new equivalents:
+- quarantine_init: platform-independent code and the IOMMU_CONTEXT_INIT_quarantine flag
+- add_device: attach and add_devfn (phantom)
+- assign_device: attach and add_devfn (phantom)
+- remove_device: detach and remove_devfn (phantom)
+- reassign_device: reattach
+
+There are some functional differences with the previous functions; the following
+should be handled by platform-independent/arch-specific code instead of the
+IOMMU driver:
+- identity mappings (unity mappings and RMRR)
+- device list in context and domain
+- domain of a device
+- quarantine
+
+The idea behind this is to implement the IOMMU context features while
+simplifying IOMMU driver implementations and ensuring more consistency between
+IOMMU drivers.
+
+## Phantom function handling
+
+PCI devices may use additional devfns to do DMA operations. In order to support
+such devices, an interface is added to map specific device functions without
+implying that the device is mapped to a new context (which may cause duplicates
+in Xen data structures).
+
+The add_devfn and remove_devfn functions allow mapping an IOMMU context on a
+specific devfn of a PCI device, without altering platform-independent data
+structures.
+
+It is important for the reattach operation to take care of these devices, in
+order to prevent devices from being partially reattached to the new context
+(see XSA-449 [2]), by using an all-or-nothing approach for reattaching such
+devices, as sketched below.
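A sketch of that all-or-nothing idea, using the add_devfn/remove_devfn hooks
above. The iteration over phantom functions follows the `phantom_stride` pattern
used elsewhere in Xen's PCI code, but the rollback strategy shown here is an
assumption, not the actual implementation:

```c
/* Sketch: move a device together with its phantom functions; undo
 * everything on failure so it is never split between contexts (XSA-449). */
static int reattach_with_phantoms(struct domain *d, struct pci_dev *pdev,
                                  struct iommu_context *prev_ctx,
                                  struct iommu_context *ctx)
{
    const struct iommu_ops *ops = dom_iommu(d)->platform_ops;
    unsigned int devfn = pdev->devfn;
    int rc = ops->reattach(d, pci_to_dev(pdev), prev_ctx, ctx);

    if ( rc || !pdev->phantom_stride )
        return rc;

    for ( devfn += pdev->phantom_stride;
          PCI_SLOT(devfn) == PCI_SLOT(pdev->devfn);
          devfn += pdev->phantom_stride )
        if ( (rc = ops->add_devfn(d, pdev, devfn, ctx)) )
            break;

    if ( rc )
    {
        /* Roll back: detach the phantom functions already moved, then
         * put the device itself back into its previous context. */
        while ( (devfn -= pdev->phantom_stride) != pdev->devfn )
            ops->remove_devfn(d, pdev, devfn, ctx);
        ops->reattach(d, pci_to_dev(pdev), ctx, prev_ctx);
    }

    return rc;
}
```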
+
+# Quarantine refactoring using IOMMU contexts
+
+The quarantine mechanism can be entirely reimplemented using IOMMU contexts,
+making it simpler and more consistent between platforms.
+
+Quarantine is currently only supported on x86 platforms and works by creating a
+single *hardware IOMMU domain* per quarantined device. All the quarantine logic
+is then implemented in a platform-specific fashion, while actually implementing
+the same concepts:
+
+The *hardware IOMMU context* data structures for quarantine are currently stored
+in the device structure itself (using arch_pci_dev), and the IOMMU driver needs
+to care about whether we are dealing with quarantine operations or regular
+operations (often dealt with using macros such as QUARANTINE_SKIP or
+DEVICE_PGTABLE).
+
+The page table that will be applied to the quarantined device is created by
+mapping reserved device regions, and adding mappings to a scratch page if
+enabled (quarantine=scratch-page).
+
+A new approach we can use is allowing the quarantine domain (DomIO) to manage
+IOMMU contexts, and implementing all the quarantine logic using IOMMU contexts.
+
+That way, the quarantine implementation can be platform-independent, and thus
+have a more consistent implementation between platforms. It will also allow
+quarantine to work with other IOMMU implementations without having to implement
+platform-specific behavior. Moreover, quarantine operations can be implemented
+using regular context operations instead of relying on driver-specific code.
+
+The quarantine implementation can be summarised as:
+
+```c
+int iommu_quarantine_dev_init(device_t *dev)
+{
+    int ret;
+    u16 ctx_id;
+
+    if ( !iommu_quarantine )
+        return -EINVAL;
+
+    ret = iommu_context_alloc(dom_io, &ctx_id, IOMMU_CONTEXT_INIT_quarantine);
+
+    if ( ret )
+        return ret;
+
+    /** TODO: Setup scratch page, mappings... */
+
+    ret = iommu_reattach_context(dev->domain, dom_io, dev, ctx_id);
+
+    if ( ret )
+    {
+        ASSERT(!iommu_context_free(dom_io, ctx_id, 0));
+        return ret;
+    }
+
+    return ret;
+}
+```
+
+# Platform-specific considerations
+
+## Reference counters on target pages
+
+When mapping a guest page onto an IOMMU context, we need to make sure that
+this page is not reused for something else while being actually referenced
+by an IOMMU context. One way of doing it is incrementing the reference counter
+of each target page we map (excluding reserved regions), and decrementing it
+when the mapping isn't used anymore.
+
+One consideration to have is when destroying the context while having existing
+mappings in it. We can walk through the entire page table and decrement the
+reference counter of all mappings. All of that assumes that there is no reserved
+region mapped (which should be the case as a requirement of teardown, or as a
+consequence of the REATTACH_DEFAULT flag).
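A sketch of the refcounting idea on the map path, assuming the target pages are
regular domain pages; the helper name and the exact context-aware iommu_map()
signature are illustrative:

```c
/* Sketch: take a reference on the target page when it enters a context,
 * and drop it again if the mapping could not be established. */
static int ctx_map_page(struct domain *d, struct iommu_context *ctx,
                        dfn_t dfn, mfn_t mfn, unsigned int flags,
                        unsigned int *flush_flags)
{
    struct page_info *pg = mfn_to_page(mfn);
    int rc;

    /* Keep the page from being freed/reused while mapped in the context. */
    if ( !get_page(pg, d) )
        return -EINVAL;

    rc = iommu_map(d, dfn, mfn, 1, flags, flush_flags, ctx->id);
    if ( rc )
        put_page(pg);

    return rc;
}
```

The matching unmap path (and the teardown walk described above) would call
put_page() for each mapping it removes.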
+
+Another consideration is that the "cleanup mappings" operation may take a lot
+of time depending on the complexity of the page table. Making the teardown
+operation preemptible allows the hypercall to be preempted if needed, and also
+prevents a malicious guest from stalling a CPU in a teardown operation with a
+specially crafted IOMMU context (e.g. with several 1G superpages).
+
+## Limit the amount of pages IOMMU contexts can use
+
+In order to prevent a (potentially malicious) guest from causing too many
+allocations in Xen, we can enforce limits on the memory the IOMMU subsystem can
+use for IOMMU contexts.
+A possible implementation is to preallocate a reasonably large chunk of memory
+and split it into pages for use by the IOMMU subsystem, only for non-default
+IOMMU contexts (e.g. the PV-IOMMU interface); if this limit is exceeded, some
+operations may fail from the guest side. These limitations shouldn't impact
+"usual" operations of the IOMMU subsystem (e.g. default context initialization).
+
+## x86 Architecture
+
+TODO
+
+### Intel VT-d
+
+VT-d uses the DID to tag the *IOMMU domain* applied to a device, and assumes
+that all entries with the same DID use the same page table (i.e. the same IOMMU
+context). Under certain circumstances (e.g. a DRHD with a DID limit below
+16 bits), the *DID* is transparently converted into a DRHD-specific DID using a
+map managed internally.
+
+The current implementation of the code reuses the Xen domain_id as the DID.
+However, with multiple IOMMU contexts per domain, we can't use the domain_id
+for contexts (otherwise, different page tables would be mapped with the same
+DID). The following strategy is used (sketched below):
+- on the default context, reuse the domain_id (the default context is unique
+per domain)
+- on non-default contexts, use an id allocated in the pseudo_domid map (already
+used by quarantine), which is a DID outside of the Xen domain_id range
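A sketch of that DID selection strategy; iommu_alloc_domid() is the existing
allocator over such a map, but the helper and its call site are illustrative:

```c
/* Sketch: pick the DID for a context following the strategy above. */
static domid_t ctx_did(const struct domain *d,
                       const struct iommu_context *ctx,
                       unsigned long *pseudo_domid_map)
{
    if ( ctx->id == 0 )
        /* Default context is unique per domain: reuse the domain_id. */
        return d->domain_id;

    /* Non-default contexts get a DID outside the Xen domain_id range. */
    return iommu_alloc_domid(pseudo_domid_map);
}
```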
+
+### AMD-Vi
+
+TODO
+
+## Device-tree platforms
+
+### SMMU and SMMUv3
+
+TODO
+
+* * *
+
+[1] See pv-iommu.md
+
+[2] pci: phantom functions assigned to incorrect contexts
+https://xenbits.xen.org/xsa/advisory-449.html
\ No newline at end of file
-- 
2.51.2

--
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech

From nobody Sun Feb 8 01:17:05 2026
From: "Teddy Astie" <teddy.astie@vates.tech>
Subject: [RFC PATCH v7 02/14] docs/designs: Add a design document for PV-IOMMU
To: xen-devel@lists.xenproject.org
Cc: "Andrew Cooper", "Anthony PERARD", "Michal Orzel", "Jan Beulich", "Julien Grall", "Roger Pau Monné", "Stefano Stabellini"
Message-Id: <7d5e25fe9c042e3e58f957c71eb2cdd51b8fc0af.1763569135.git.teddy.astie@vates.tech>
Date: Thu, 20 Nov 2025 11:09:50 +0000

Some operating systems want to use the IOMMU to implement various features
(e.g. VFIO) or DMA protection.

This patch introduces a proposal for IOMMU paravirtualization for Dom0.

Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
 docs/designs/pv-iommu.md | 118 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)
 create mode 100644 docs/designs/pv-iommu.md

diff --git a/docs/designs/pv-iommu.md b/docs/designs/pv-iommu.md
new file mode 100644
index 0000000000..d2f708b3e8
--- /dev/null
+++ b/docs/designs/pv-iommu.md
@@ -0,0 +1,118 @@
+# IOMMU paravirtualization for Dom0
+
+Status: Experimental
+
+# Background
+
+By default, Xen only uses the IOMMU for itself, either to make the device
+address space coherent with the guest address space (x86 HVM/PVH) or to prevent
+devices from doing DMA outside their expected memory regions, including the
+hypervisor (x86 PV).
+
+A limitation is that guests (especially privileged ones) may want to use
+IOMMU hardware in order to implement features such as DMA protection and
+VFIO [1], as IOMMU functionality is currently not available outside of the
+hypervisor.
+
+[1] VFIO - "Virtual Function I/O" - https://www.kernel.org/doc/html/latest/driver-api/vfio.html
+
+# Design
+
+The operating system may want to have access to various IOMMU features such as
+context management and DMA remapping. We can create a new hypercall that allows
+the guest to access a new paravirtualized IOMMU interface.
+
+This feature is only meant to be available for Dom0: DomUs have some emulated
+devices that are not hardware and can't be managed on the Xen side, so we can't
+rely on the hardware IOMMU to enforce DMA remapping for them.
+
+This interface is exposed under the `iommu_op` hypercall.
+
+In addition, Xen domains are modified in order to allow the existence of
+several IOMMU contexts, including a default one that implements the default
+behavior (e.g. hardware-assisted paging) and can't be modified by the guest.
+DomUs cannot have contexts, and therefore act as if they only have the default
+context.
+
+Each IOMMU context within a Xen domain is identified using a domain-specific
+context number that is used in the Xen IOMMU subsystem and the hypercall
+interface.
+
+The number of IOMMU contexts a domain can have is specified either by the
+toolstack or by the domain itself.
+
+# IOMMU operations
+
+## Initialize PV-IOMMU
+
+Initialize PV-IOMMU for the domain.
+It can only be called once.
+
+## Alloc context
+
+Create a new IOMMU context for the guest and return the context number to the
+guest.
+Fails if the IOMMU context limit of the guest is reached.
+
+A flag can be specified to create an identity mapping.
+
+## Free context
+
+Destroy an IOMMU context created previously.
+It is not possible to free the default context.
+
+Reattach the context's devices to the default context if specified by the
+guest.
+
+Fails if there is a device in the context and the reattach-to-default flag is
+not specified.
+
+## Reattach device
+
+Reattach a device to another IOMMU context (including the default one).
+The target IOMMU context number must be valid and the context allocated.
+
+The guest needs to specify the PCI SBDF of a device it has access to.
+
+## Map/unmap page
+
+Map/unmap a page in a context.
+The guest needs to specify a gfn and a target dfn to map.
+
+Refuse to create the mapping if one already exists for the same dfn.
+
+## Lookup page
+
+Get the gfn mapped at a specific dfn.
+
+## Remote command
+
+Perform a PV-IOMMU operation on behalf of another domain.
+Especially useful for implementing IOMMU emulation (e.g. using QEMU)
+or for initializing PV-IOMMU with enforced limits.
+
+# Implementation considerations
+
+## Hypercall batching
+
+In order to prevent unneeded hypercalls and IOMMU flushing, it is advisable to
+be able to batch some critical IOMMU operations (e.g. map/unmap of multiple
+pages). These batched operations should be preemptible/abortable to prevent a
+large map/unmap from blocking execution forever.
+
+## Hardware without IOMMU support
+
+The operating system needs to be aware of the PV-IOMMU capability, and of
+whether it is able to create contexts. Some operating systems may critically
+fail if they are unable to create a new IOMMU context, which is expected to
+happen if no IOMMU hardware is available.
+
+The hypercall interface therefore needs a way to advertise the ability to
+create and manage IOMMU contexts, including the number of contexts the guest
+is able to use. Using this information, Dom0 may decide whether or not to use
+the PV-IOMMU interface.
+
+## Page pool for contexts
+
+In order to prevent unexpected starvation of hypervisor memory by a buggy
+Dom0, we can preallocate the pages the contexts will use and make map/unmap
+use these pages instead of allocating them dynamically (a minimal sketch
+follows).
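A minimal sketch of such a preallocated pool, assuming a fixed per-domain page
budget; the structure, names, and locking are illustrative, not part of the
proposal's ABI:

```c
/* Sketch: per-domain pool of preallocated pages for PV-IOMMU page tables. */
struct pviommu_page_pool {
    spinlock_t lock;
    struct page_list_head free_pages;
    unsigned long remaining; /* pages left in the budget */
};

static struct page_info *pool_get_pgtable_page(struct pviommu_page_pool *p)
{
    struct page_info *pg = NULL;

    spin_lock(&p->lock);
    if ( p->remaining )
    {
        pg = page_list_remove_head(&p->free_pages);
        if ( pg )
            p->remaining--;
    }
    spin_unlock(&p->lock);

    /* NULL means a guest-visible -ENOMEM; default contexts are unaffected
     * since they do not draw from this pool. */
    return pg;
}
```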
+
-- 
2.51.2

--
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech

From nobody Sun Feb 8 01:17:05 2026
From: "Teddy Astie" <teddy.astie@vates.tech>
Subject: [RFC PATCH v7 03/14] x86/domain: Defer domain iommu initialization.
To: xen-devel@lists.xenproject.org
Cc: "Jan Beulich", "Andrew Cooper", "Roger Pau Monné"
Date: Thu, 20 Nov 2025 11:09:51 +0000

For the IOMMU redesign, the iommu context pagetable is defined once
during initialization. When reusing the P2M pagetable, we want to
ensure that this pagetable is properly initialized.

Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
 xen/arch/x86/domain.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 66b7412b87..393b0fe27c 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -927,9 +927,6 @@ int arch_domain_create(struct domain *d,
     if ( (rc = init_domain_irq_mapping(d)) != 0 )
         goto fail;
 
-    if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 )
-        goto fail;
-
     psr_domain_init(d);
 
     if ( is_hvm_domain(d) )
@@ -948,6 +945,9 @@ int arch_domain_create(struct domain *d,
     else
         ASSERT_UNREACHABLE(); /* Not HVM and not PV? */
 
+    if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 )
+        goto fail;
+
     if ( (rc = tsc_set_info(d, XEN_CPUID_TSC_MODE_DEFAULT, 0, 0, 0)) != 0 )
     {
         ASSERT_UNREACHABLE();
-- 
2.51.2

--
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech

From nobody Sun Feb 8 01:17:05 2026
From: "Teddy Astie" <teddy.astie@vates.tech>
Subject: [RFC PATCH v7 04/14] iommu: Move IOMMU domain related structures to (arch_)iommu_context
To: xen-devel@lists.xenproject.org
Cc: "Stefano Stabellini", "Julien Grall", "Bertrand Marquis", "Michal Orzel", "Volodymyr Babchuk", "Timothy Pearson", "Jan Beulich", "Andrew Cooper", "Roger Pau Monné", "Lukasz Hawrylko", "Daniel P. Smith", "Mateusz Mówka", "Jason Andryuk"
Date: Thu, 20 Nov 2025 11:09:53 +0000

Preparatory work for the IOMMU redesign. Introduce a new structure
(arch_)iommu_context that will hold all per-IOMMU-context information
for the IOMMU drivers.
Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
 xen/arch/arm/include/asm/iommu.h            |   4 +
 xen/arch/ppc/include/asm/iommu.h            |   3 +
 xen/arch/x86/domain.c                       |   4 +-
 xen/arch/x86/include/asm/iommu.h            |  50 +++--
 xen/arch/x86/tboot.c                        |   3 +-
 xen/drivers/passthrough/amd/iommu.h         |   5 +-
 xen/drivers/passthrough/amd/iommu_init.c    |   6 +-
 xen/drivers/passthrough/amd/iommu_map.c     | 102 +++++-----
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  81 ++++----
 xen/drivers/passthrough/iommu.c             |   6 +
 xen/drivers/passthrough/vtd/extern.h        |   4 +-
 xen/drivers/passthrough/vtd/iommu.c         | 206 +++++++++++---------
 xen/drivers/passthrough/vtd/quirks.c        |   3 +-
 xen/drivers/passthrough/x86/iommu.c         |  62 +++---
 xen/include/xen/iommu.h                     |  10 +
 15 files changed, 318 insertions(+), 231 deletions(-)

diff --git a/xen/arch/arm/include/asm/iommu.h b/xen/arch/arm/include/asm/iommu.h
index ad15477e24..41fc676db6 100644
--- a/xen/arch/arm/include/asm/iommu.h
+++ b/xen/arch/arm/include/asm/iommu.h
@@ -20,6 +20,10 @@ struct arch_iommu
     void *priv;
 };
 
+struct arch_iommu_context
+{
+};
+
 const struct iommu_ops *iommu_get_ops(void);
 void iommu_set_ops(const struct iommu_ops *ops);
 
diff --git a/xen/arch/ppc/include/asm/iommu.h b/xen/arch/ppc/include/asm/iommu.h
index 024ead3473..8367505de2 100644
--- a/xen/arch/ppc/include/asm/iommu.h
+++ b/xen/arch/ppc/include/asm/iommu.h
@@ -5,4 +5,7 @@ struct arch_iommu {
 };
 
+struct arch_iommu_context {
+};
+
 #endif /* __ASM_PPC_IOMMU_H__ */
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 393b0fe27c..35abbfcc82 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -665,7 +665,7 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
     if ( nested_virt && !hvm_nested_virt_supported() )
     {
         dprintk(XENLOG_INFO, "Nested virt requested but not available\n");
-        return -EINVAL; 
+        return -EINVAL;
     }
 
     if ( nested_virt && !hap )
@@ -2498,7 +2498,7 @@ int domain_relinquish_resources(struct domain *d)
 
     PROGRESS(iommu_pagetables):
 
-    ret = iommu_free_pgtables(d);
+    ret = iommu_free_pgtables(d, iommu_default_context(d));
     if ( ret )
         return ret;
 
diff --git a/xen/arch/x86/include/asm/iommu.h b/xen/arch/x86/include/asm/iommu.h
index 8dc464fbd3..94513ba9dc 100644
--- a/xen/arch/x86/include/asm/iommu.h
+++ b/xen/arch/x86/include/asm/iommu.h
@@ -31,22 +31,21 @@ typedef uint64_t daddr_t;
 #define dfn_to_daddr(dfn) __dfn_to_daddr(dfn_x(dfn))
 #define daddr_to_dfn(daddr) _dfn(__daddr_to_dfn(daddr))
 
-struct arch_iommu
-{
-    spinlock_t mapping_lock; /* io page table lock */
-    struct {
-        struct page_list_head list;
-        spinlock_t lock;
-    } pgtables;
+struct iommu_context;
 
+struct arch_iommu_context
+{
+    struct page_list_head pgtables;
     struct list_head identity_maps;
 
+
+    spinlock_t mapping_lock; /* io page table lock */
+
     union {
         /* Intel VT-d */
         struct {
             uint64_t pgd_maddr; /* io page directory machine address */
-            unsigned int agaw; /* adjusted guest address width, 0 is level 2 30-bit */
-            unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the domain uses */
+            unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the context uses */
         } vtd;
         /* AMD IOMMU */
         struct {
@@ -56,6 +55,24 @@ struct arch_iommu
     };
 };
 
+struct arch_iommu
+{
+    /* Queue for freeing pages */
+    struct page_list_head free_queue;
+
+    union {
+        /* Intel VT-d */
+        struct {
+            unsigned int agaw; /* adjusted guest address width, 0 is level 2 30-bit */
+        } vtd;
+        /* AMD IOMMU */
+        struct {
+            unsigned int paging_mode;
+            struct guest_iommu *g_iommu;
+        };
+    };
+};
+
 extern struct iommu_ops iommu_ops;
 
 # include 
 
@@ -109,10 +126,10 @@ static inline void iommu_disable_x2apic(void)
         iommu_vcall(&iommu_ops, disable_x2apic);
 }
 
-int iommu_identity_mapping(struct domain *d, p2m_access_t p2ma,
-                           paddr_t base, paddr_t end,
+int iommu_identity_mapping(struct domain *d, struct iommu_context *ctx,
+                           p2m_access_t p2ma, paddr_t base, paddr_t end,
                            unsigned int flag);
-void iommu_identity_map_teardown(struct domain *d);
+void iommu_identity_map_teardown(struct domain *d, struct iommu_context *ctx);
 
 extern bool untrusted_msi;
 
@@ -128,14 +145,19 @@ unsigned long *iommu_init_domid(domid_t reserve);
 domid_t iommu_alloc_domid(unsigned long *map);
 void iommu_free_domid(domid_t domid, unsigned long *map);
 
-int __must_check iommu_free_pgtables(struct domain *d);
+int __must_check iommu_free_pgtables(struct domain *d, struct iommu_context *ctx);
 struct domain_iommu;
 struct page_info *__must_check iommu_alloc_pgtable(struct domain_iommu *hd,
+                                                   struct iommu_context *ctx,
                                                    uint64_t contig_mask);
-void iommu_queue_free_pgtable(struct domain_iommu *hd, struct page_info *pg);
+void iommu_queue_free_pgtable(struct domain *d, struct iommu_context *ctx,
+                              struct page_info *pg);
 
 /* Check [start, end] unity map range for correctness. */
 bool iommu_unity_region_ok(const char *prefix, mfn_t start, mfn_t end);
+int arch_iommu_context_init(struct domain *d, struct iommu_context *ctx, u32 flags);
+int arch_iommu_context_teardown(struct domain *d, struct iommu_context *ctx, u32 flags);
+int arch_iommu_flush_free_queue(struct domain *d);
 
 #endif /* !__ARCH_X86_IOMMU_H__ */
 /*
diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
index 9d9bb6e7cf..ecc0666334 100644
--- a/xen/arch/x86/tboot.c
+++ b/xen/arch/x86/tboot.c
@@ -222,7 +222,8 @@ static void tboot_gen_domain_integrity(const uint8_t key[TB_KEY_SIZE],
     {
         const struct domain_iommu *dio = dom_iommu(d);
 
-        update_iommu_mac(&ctx, dio->arch.vtd.pgd_maddr,
+        update_iommu_mac(&ctx,
+                         iommu_default_context(d)->arch.vtd.pgd_maddr,
                          agaw_to_level(dio->arch.vtd.agaw));
     }
 }
diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/amd/iommu.h
index 52f748310b..4938cc38ed 100644
--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -199,10 +200,10 @@ int __must_check cf_check amd_iommu_unmap_page(
     struct domain *d, dfn_t dfn, unsigned int order,
     unsigned int *flush_flags);
 int __must_check amd_iommu_alloc_root(struct domain *d);
-int amd_iommu_reserve_domain_unity_map(struct domain *d,
+int amd_iommu_reserve_domain_unity_map(struct domain *d, struct iommu_context *ctx,
                                        const struct ivrs_unity_map *map,
                                        unsigned int flag);
-int amd_iommu_reserve_domain_unity_unmap(struct domain *d,
+int amd_iommu_reserve_domain_unity_unmap(struct domain *d, struct iommu_context *ctx,
                                          const struct ivrs_unity_map *map);
 int cf_check amd_iommu_get_reserved_device_memory(
     iommu_grdm_t *func, void *ctxt);
diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index 00d2c46cbc..56b5c2c6ec 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -608,7 +608,6 @@ static void iommu_check_event_log(struct amd_iommu *iommu)
                    sizeof(event_entry_t), parse_event_log_entry);
 
     spin_lock_irqsave(&iommu->lock, flags);
-
     /* Check event overflow. */
     entry = readl(iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET);
     if ( entry & IOMMU_STATUS_EVENT_LOG_OVERFLOW )
@@ -664,9 +663,8 @@ static void iommu_check_ppr_log(struct amd_iommu *iommu)
 
     iommu_read_log(iommu, &iommu->ppr_log,
                    sizeof(ppr_entry_t), parse_ppr_log_entry);
-
-    spin_lock_irqsave(&iommu->lock, flags);
 
+    spin_lock_irqsave(&iommu->lock, flags);
     /* Check event overflow. */
     entry = readl(iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET);
     if ( entry & IOMMU_STATUS_PPR_LOG_OVERFLOW )
@@ -1595,7 +1593,7 @@ void cf_check amd_iommu_resume(void)
     for_each_amd_iommu ( iommu )
     {
         /*
-         * To make sure that iommus have not been touched 
+         * To make sure that iommus have not been touched
          * before re-enablement
          */
         disable_iommu(iommu);
diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index 320a2dc64c..81a63cce8e 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -18,6 +18,7 @@
  */
 
 #include 
+#include 
 
 #include "iommu.h"
 
@@ -264,9 +265,9 @@ void __init iommu_dte_add_device_entry(struct amd_iommu_dte *dte,
  * {Re, un}mapping super page frames causes re-allocation of io
  * page tables.
  */
-static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn,
-                              unsigned int target, unsigned long *pt_mfn,
-                              unsigned int *flush_flags, bool map)
+static int iommu_pde_from_dfn(struct domain *d, struct iommu_context *ctx,
+                              unsigned long dfn, unsigned int target,
+                              unsigned long *pt_mfn, unsigned int *flush_flags, bool map)
 {
     union amd_iommu_pte *pde, *next_table_vaddr;
     unsigned long next_table_mfn;
@@ -274,8 +275,8 @@ static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn,
     struct page_info *table;
     struct domain_iommu *hd = dom_iommu(d);
 
-    table = hd->arch.amd.root_table;
-    level = hd->arch.amd.paging_mode;
+    table = ctx->arch.amd.root_table;
+    level = ctx->arch.amd.paging_mode;
 
     if ( !table || target < 1 || level < target || level > 6 )
     {
@@ -311,7 +312,7 @@ static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn,
             mfn = next_table_mfn;
 
             /* allocate lower level page table */
-            table = iommu_alloc_pgtable(hd, IOMMU_PTE_CONTIG_MASK);
+            table = iommu_alloc_pgtable(hd, ctx, IOMMU_PTE_CONTIG_MASK);
             if ( table == NULL )
             {
                 AMD_IOMMU_ERROR("cannot allocate I/O page table\n");
@@ -346,7 +347,7 @@ static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn,
 
             if ( next_table_mfn == 0 )
             {
-                table = iommu_alloc_pgtable(hd, IOMMU_PTE_CONTIG_MASK);
+                table = iommu_alloc_pgtable(hd, ctx, IOMMU_PTE_CONTIG_MASK);
                 if ( table == NULL )
                 {
                     AMD_IOMMU_ERROR("cannot allocate I/O page table\n");
@@ -376,7 +377,8 @@ static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn,
     return 0;
 }
 
-static void queue_free_pt(struct domain_iommu *hd, mfn_t mfn, unsigned int level)
+static void queue_free_pt(struct domain *d, struct iommu_context *ctx, mfn_t mfn,
+                          unsigned int level)
 {
     if ( level > 1 )
     {
@@ -387,13 +389,13 @@ static void queue_free_pt(struct domain_iommu *hd, mfn_t mfn, unsigned int level
             if ( pt[i].pr && pt[i].next_level )
             {
                 ASSERT(pt[i].next_level < level);
-                queue_free_pt(hd, _mfn(pt[i].mfn), pt[i].next_level);
+                queue_free_pt(d, ctx, _mfn(pt[i].mfn), pt[i].next_level);
             }
 
         unmap_domain_page(pt);
     }
 
-    iommu_queue_free_pgtable(hd, mfn_to_page(mfn));
+    iommu_queue_free_pgtable(d, ctx, mfn_to_page(mfn));
 }
 
 int cf_check amd_iommu_map_page(
@@ -401,6 +403,7 @@ int cf_check amd_iommu_map_page(
*flush_flags) { struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); unsigned int level =3D (IOMMUF_order(flags) / PTE_PER_TABLE_SHIFT) + 1; bool contig; int rc; @@ -410,7 +413,7 @@ int cf_check amd_iommu_map_page( ASSERT((hd->platform_ops->page_sizes >> IOMMUF_order(flags)) & PAGE_SIZE_4K); =20 - spin_lock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); =20 /* * IOMMU mapping request can be safely ignored when the domain is dyin= g. @@ -420,24 +423,24 @@ int cf_check amd_iommu_map_page( */ if ( d->is_dying ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); return 0; } =20 rc =3D amd_iommu_alloc_root(d); if ( rc ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); AMD_IOMMU_ERROR("root table alloc failed, dfn =3D %"PRI_dfn"\n", dfn_x(dfn)); domain_crash(d); return rc; } =20 - if ( iommu_pde_from_dfn(d, dfn_x(dfn), level, &pt_mfn, flush_flags, tr= ue) || + if ( iommu_pde_from_dfn(d, ctx, dfn_x(dfn), level, &pt_mfn, flush_flag= s, true) || !pt_mfn ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); AMD_IOMMU_ERROR("invalid IO pagetable entry dfn =3D %"PRI_dfn"\n", dfn_x(dfn)); domain_crash(d); @@ -449,12 +452,12 @@ int cf_check amd_iommu_map_page( flags & IOMMUF_writable, flags & IOMMUF_readable, &contig); =20 - while ( unlikely(contig) && ++level < hd->arch.amd.paging_mode ) + while ( unlikely(contig) && ++level < ctx->arch.amd.paging_mode ) { struct page_info *pg =3D mfn_to_page(_mfn(pt_mfn)); unsigned long next_mfn; =20 - if ( iommu_pde_from_dfn(d, dfn_x(dfn), level, &pt_mfn, flush_flags, + if ( iommu_pde_from_dfn(d, ctx, dfn_x(dfn), level, &pt_mfn, flush_= flags, false) ) BUG(); BUG_ON(!pt_mfn); @@ -464,11 +467,11 @@ int cf_check amd_iommu_map_page( flags & IOMMUF_writable, flags & IOMMUF_readable, &contig); *flush_flags |=3D IOMMU_FLUSHF_modified | IOMMU_FLUSHF_all; - iommu_queue_free_pgtable(hd, pg); + iommu_queue_free_pgtable(d, ctx, pg); perfc_incr(iommu_pt_coalesces); } =20 - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); =20 *flush_flags |=3D IOMMU_FLUSHF_added; if ( old.pr ) @@ -476,7 +479,7 @@ int cf_check amd_iommu_map_page( *flush_flags |=3D IOMMU_FLUSHF_modified; =20 if ( IOMMUF_order(flags) && old.next_level ) - queue_free_pt(hd, _mfn(old.mfn), old.next_level); + queue_free_pt(d, ctx, _mfn(old.mfn), old.next_level); } =20 return 0; @@ -487,6 +490,7 @@ int cf_check amd_iommu_unmap_page( { unsigned long pt_mfn =3D 0; struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); unsigned int level =3D (order / PTE_PER_TABLE_SHIFT) + 1; union amd_iommu_pte old =3D {}; =20 @@ -496,17 +500,17 @@ int cf_check amd_iommu_unmap_page( */ ASSERT((hd->platform_ops->page_sizes >> order) & PAGE_SIZE_4K); =20 - spin_lock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); =20 - if ( !hd->arch.amd.root_table ) + if ( !ctx->arch.amd.root_table ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); return 0; } =20 - if ( iommu_pde_from_dfn(d, dfn_x(dfn), level, &pt_mfn, flush_flags, fa= lse) ) + if ( iommu_pde_from_dfn(d, ctx, dfn_x(dfn), level, &pt_mfn, flush_flag= s, false) ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); AMD_IOMMU_ERROR("invalid IO pagetable entry dfn =3D %"PRI_dfn"\n", dfn_x(dfn)); domain_crash(d); @@ -520,30 +524,30 @@ int cf_check amd_iommu_unmap_page( /* Mark PTE as 'page not 
present'. */ old =3D clear_iommu_pte_present(pt_mfn, dfn_x(dfn), level, &free); =20 - while ( unlikely(free) && ++level < hd->arch.amd.paging_mode ) + while ( unlikely(free) && ++level < ctx->arch.amd.paging_mode ) { struct page_info *pg =3D mfn_to_page(_mfn(pt_mfn)); =20 - if ( iommu_pde_from_dfn(d, dfn_x(dfn), level, &pt_mfn, + if ( iommu_pde_from_dfn(d, ctx, dfn_x(dfn), level, &pt_mfn, flush_flags, false) ) BUG(); BUG_ON(!pt_mfn); =20 clear_iommu_pte_present(pt_mfn, dfn_x(dfn), level, &free); *flush_flags |=3D IOMMU_FLUSHF_all; - iommu_queue_free_pgtable(hd, pg); + iommu_queue_free_pgtable(d, ctx, pg); perfc_incr(iommu_pt_coalesces); } } =20 - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); =20 if ( old.pr ) { *flush_flags |=3D IOMMU_FLUSHF_modified; =20 if ( order && old.next_level ) - queue_free_pt(hd, _mfn(old.mfn), old.next_level); + queue_free_pt(d, ctx, _mfn(old.mfn), old.next_level); } =20 return 0; @@ -646,7 +650,7 @@ int cf_check amd_iommu_flush_iotlb_pages( return 0; } =20 -int amd_iommu_reserve_domain_unity_map(struct domain *d, +int amd_iommu_reserve_domain_unity_map(struct domain *d, struct iommu_cont= ext *ctx, const struct ivrs_unity_map *map, unsigned int flag) { @@ -664,14 +668,14 @@ int amd_iommu_reserve_domain_unity_map(struct domain = *d, if ( map->write ) p2ma |=3D p2m_access_w; =20 - rc =3D iommu_identity_mapping(d, p2ma, map->addr, + rc =3D iommu_identity_mapping(d, ctx, p2ma, map->addr, map->addr + map->length - 1, flag); } =20 return rc; } =20 -int amd_iommu_reserve_domain_unity_unmap(struct domain *d, +int amd_iommu_reserve_domain_unity_unmap(struct domain *d, struct iommu_co= ntext *ctx, const struct ivrs_unity_map *map) { int rc; @@ -681,7 +685,7 @@ int amd_iommu_reserve_domain_unity_unmap(struct domain = *d, =20 for ( rc =3D 0; map; map =3D map->next ) { - int ret =3D iommu_identity_mapping(d, p2m_access_x, map->addr, + int ret =3D iommu_identity_mapping(d, ctx, p2m_access_x, map->addr, map->addr + map->length - 1, 0); =20 if ( ret && ret !=3D -ENOENT && !rc ) @@ -771,6 +775,7 @@ static int fill_qpt(union amd_iommu_pte *this, unsigned= int level, struct page_info *pgs[IOMMU_MAX_PT_LEVELS]) { struct domain_iommu *hd =3D dom_iommu(dom_io); + struct iommu_context *ctx =3D iommu_default_context(dom_io); unsigned int i; int rc =3D 0; =20 @@ -787,7 +792,7 @@ static int fill_qpt(union amd_iommu_pte *this, unsigned= int level, * page table pages, and the resulting allocations are alw= ays * zeroed. 
*/ - pgs[level] =3D iommu_alloc_pgtable(hd, 0); + pgs[level] =3D iommu_alloc_pgtable(hd, ctx, 0); if ( !pgs[level] ) { rc =3D -ENOMEM; @@ -823,14 +828,15 @@ static int fill_qpt(union amd_iommu_pte *this, unsign= ed int level, int cf_check amd_iommu_quarantine_init(struct pci_dev *pdev, bool scratch_= page) { struct domain_iommu *hd =3D dom_iommu(dom_io); - unsigned int level =3D hd->arch.amd.paging_mode; + struct iommu_context *ctx =3D iommu_default_context(dom_io); + unsigned int level =3D ctx->arch.amd.paging_mode; unsigned int req_id =3D get_dma_requestor_id(pdev->seg, pdev->sbdf.bdf= ); const struct ivrs_mappings *ivrs_mappings =3D get_ivrs_mappings(pdev->= seg); int rc; =20 ASSERT(pcidevs_locked()); - ASSERT(!hd->arch.amd.root_table); - ASSERT(page_list_empty(&hd->arch.pgtables.list)); + ASSERT(!ctx->arch.amd.root_table); + ASSERT(page_list_empty(&ctx->arch.pgtables)); =20 if ( !scratch_page && !ivrs_mappings[req_id].unity_map ) return 0; @@ -843,19 +849,19 @@ int cf_check amd_iommu_quarantine_init(struct pci_dev= *pdev, bool scratch_page) return 0; } =20 - pdev->arch.amd.root_table =3D iommu_alloc_pgtable(hd, 0); + pdev->arch.amd.root_table =3D iommu_alloc_pgtable(hd, ctx, 0); if ( !pdev->arch.amd.root_table ) return -ENOMEM; =20 /* Transiently install the root into DomIO, for iommu_identity_mapping= (). */ - hd->arch.amd.root_table =3D pdev->arch.amd.root_table; + ctx->arch.amd.root_table =3D pdev->arch.amd.root_table; =20 - rc =3D amd_iommu_reserve_domain_unity_map(dom_io, + rc =3D amd_iommu_reserve_domain_unity_map(dom_io, ctx, ivrs_mappings[req_id].unity_ma= p, 0); =20 - iommu_identity_map_teardown(dom_io); - hd->arch.amd.root_table =3D NULL; + iommu_identity_map_teardown(dom_io, ctx); + ctx->arch.amd.root_table =3D NULL; =20 if ( rc ) AMD_IOMMU_WARN("%pp: quarantine unity mapping failed\n", &pdev->sb= df); @@ -871,7 +877,7 @@ int cf_check amd_iommu_quarantine_init(struct pci_dev *= pdev, bool scratch_page) pdev->arch.leaf_mfn =3D page_to_mfn(pgs[0]); } =20 - page_list_move(&pdev->arch.pgtables_list, &hd->arch.pgtables.list); + page_list_move(&pdev->arch.pgtables_list, &ctx->arch.pgtables); =20 if ( rc ) amd_iommu_quarantine_teardown(pdev); @@ -881,16 +887,16 @@ int cf_check amd_iommu_quarantine_init(struct pci_dev= *pdev, bool scratch_page) =20 void amd_iommu_quarantine_teardown(struct pci_dev *pdev) { - struct domain_iommu *hd =3D dom_iommu(dom_io); + struct iommu_context *ctx =3D iommu_default_context(dom_io); =20 ASSERT(pcidevs_locked()); =20 if ( !pdev->arch.amd.root_table ) return; =20 - ASSERT(page_list_empty(&hd->arch.pgtables.list)); - page_list_move(&hd->arch.pgtables.list, &pdev->arch.pgtables_list); - while ( iommu_free_pgtables(dom_io) =3D=3D -ERESTART ) + ASSERT(page_list_empty(&ctx->arch.pgtables)); + page_list_move(&ctx->arch.pgtables, &pdev->arch.pgtables_list); + while ( iommu_free_pgtables(dom_io, ctx) =3D=3D -ERESTART ) /* nothing */; pdev->arch.amd.root_table =3D NULL; } diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/pass= through/amd/pci_amd_iommu.c index 3a14770855..964f6b47db 100644 --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c @@ -19,6 +19,7 @@ =20 #include #include +#include =20 #include =20 @@ -86,12 +87,12 @@ int get_dma_requestor_id(uint16_t seg, uint16_t bdf) =20 static int __must_check allocate_domain_resources(struct domain *d) { - struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); int rc; =20 - 
spin_lock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); rc =3D amd_iommu_alloc_root(d); - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); =20 return rc; } @@ -125,7 +126,7 @@ static bool use_ats( } =20 static int __must_check amd_iommu_setup_domain_device( - struct domain *domain, struct amd_iommu *iommu, + struct domain *domain, struct iommu_context *ctx, struct amd_iommu *io= mmu, uint8_t devfn, struct pci_dev *pdev) { struct amd_iommu_dte *table, *dte; @@ -133,7 +134,6 @@ static int __must_check amd_iommu_setup_domain_device( unsigned int req_id, sr_flags; int rc; u8 bus =3D pdev->bus; - struct domain_iommu *hd =3D dom_iommu(domain); const struct ivrs_mappings *ivrs_dev; const struct page_info *root_pg; domid_t domid; @@ -141,7 +141,7 @@ static int __must_check amd_iommu_setup_domain_device( if ( QUARANTINE_SKIP(domain, pdev) ) return 0; =20 - BUG_ON(!hd->arch.amd.paging_mode || !iommu->dev_table.buffer); + BUG_ON(!ctx->arch.amd.paging_mode || !iommu->dev_table.buffer); =20 rc =3D allocate_domain_resources(domain); if ( rc ) @@ -161,7 +161,7 @@ static int __must_check amd_iommu_setup_domain_device( =20 if ( domain !=3D dom_io ) { - root_pg =3D hd->arch.amd.root_table; + root_pg =3D ctx->arch.amd.root_table; domid =3D domain->domain_id; } else @@ -177,7 +177,7 @@ static int __must_check amd_iommu_setup_domain_device( /* bind DTE to domain page-tables */ rc =3D amd_iommu_set_root_page_table( dte, page_to_maddr(root_pg), domid, - hd->arch.amd.paging_mode, sr_flags); + ctx->arch.amd.paging_mode, sr_flags); if ( rc ) { ASSERT(rc < 0); @@ -219,7 +219,7 @@ static int __must_check amd_iommu_setup_domain_device( else rc =3D amd_iommu_set_root_page_table( dte, page_to_maddr(root_pg), domid, - hd->arch.amd.paging_mode, sr_flags); + ctx->arch.amd.paging_mode, sr_flags); if ( rc < 0 ) { spin_unlock_irqrestore(&iommu->lock, flags); @@ -270,7 +270,7 @@ static int __must_check amd_iommu_setup_domain_device( "root table =3D %#"PRIx64", " "domain =3D %d, paging mode =3D %d\n", req_id, pdev->type, page_to_maddr(root_pg), - domid, hd->arch.amd.paging_mode); + domid, ctx->arch.amd.paging_mode); =20 ASSERT(pcidevs_locked()); =20 @@ -352,11 +352,12 @@ static int cf_check iov_enable_xt(void) int amd_iommu_alloc_root(struct domain *d) { struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); =20 - if ( unlikely(!hd->arch.amd.root_table) && d !=3D dom_io ) + if ( unlikely(!ctx->arch.amd.root_table) && d !=3D dom_io ) { - hd->arch.amd.root_table =3D iommu_alloc_pgtable(hd, 0); - if ( !hd->arch.amd.root_table ) + ctx->arch.amd.root_table =3D iommu_alloc_pgtable(hd, ctx, 0); + if ( !ctx->arch.amd.root_table ) return -ENOMEM; } =20 @@ -368,7 +369,7 @@ int __read_mostly amd_iommu_min_paging_mode =3D 1; =20 static int cf_check amd_iommu_domain_init(struct domain *d) { - struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); int pglvl =3D amd_iommu_get_paging_mode( 1UL << (domain_max_paddr_bits(d) - PAGE_SHIFT)); =20 @@ -379,7 +380,7 @@ static int cf_check amd_iommu_domain_init(struct domain= *d) * Choose the number of levels for the IOMMU page tables, taking into * account unity maps. 
*/ - hd->arch.amd.paging_mode =3D max(pglvl, amd_iommu_min_paging_mode); + ctx->arch.amd.paging_mode =3D max(pglvl, amd_iommu_min_paging_mode); =20 return 0; } @@ -455,7 +456,7 @@ static void amd_iommu_disable_domain_device(const struc= t domain *domain, AMD_IOMMU_DEBUG("Disable: device id =3D %#x, " "domain =3D %d, paging mode =3D %d\n", req_id, dte->domain_id, - dom_iommu(domain)->arch.amd.paging_mode); + iommu_default_context(domain)->arch.amd.paging_mod= e); } else spin_unlock_irqrestore(&iommu->lock, flags); @@ -466,6 +467,8 @@ static int cf_check reassign_device( struct pci_dev *pdev) { struct amd_iommu *iommu; + struct iommu_context *target_ctx =3D iommu_default_context(target); + struct iommu_context *source_ctx =3D iommu_default_context(source); int rc; =20 iommu =3D find_iommu_for_device(pdev->sbdf); @@ -478,7 +481,7 @@ static int cf_check reassign_device( =20 if ( !QUARANTINE_SKIP(target, pdev) ) { - rc =3D amd_iommu_setup_domain_device(target, iommu, devfn, pdev); + rc =3D amd_iommu_setup_domain_device(target, target_ctx, iommu, de= vfn, pdev); if ( rc ) return rc; } @@ -509,7 +512,7 @@ static int cf_check reassign_device( unsigned int bdf =3D PCI_BDF(pdev->bus, devfn); =20 rc =3D amd_iommu_reserve_domain_unity_unmap( - source, + source, source_ctx, ivrs_mappings[get_dma_requestor_id(pdev->seg, bdf)].unity= _map); if ( rc ) return rc; @@ -528,7 +531,8 @@ static int cf_check amd_iommu_assign_device( unsigned int bdf =3D PCI_BDF(pdev->bus, devfn); int req_id =3D get_dma_requestor_id(pdev->seg, bdf); int rc =3D amd_iommu_reserve_domain_unity_map( - d, ivrs_mappings[req_id].unity_map, flag); + d, iommu_default_context(d), + ivrs_mappings[req_id].unity_map, flag); =20 if ( !rc ) rc =3D reassign_device(pdev->domain, d, devfn, pdev); @@ -536,7 +540,8 @@ static int cf_check amd_iommu_assign_device( if ( rc && !is_hardware_domain(d) ) { int ret =3D amd_iommu_reserve_domain_unity_unmap( - d, ivrs_mappings[req_id].unity_map); + d, iommu_default_context(d), + ivrs_mappings[req_id].unity_map); =20 if ( ret ) { @@ -553,22 +558,25 @@ static int cf_check amd_iommu_assign_device( =20 static void cf_check amd_iommu_clear_root_pgtable(struct domain *d) { - struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); =20 - spin_lock(&hd->arch.mapping_lock); - hd->arch.amd.root_table =3D NULL; - spin_unlock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); + ctx->arch.amd.root_table =3D NULL; + spin_unlock(&ctx->arch.mapping_lock); } =20 static void cf_check amd_iommu_domain_destroy(struct domain *d) { - iommu_identity_map_teardown(d); - ASSERT(!dom_iommu(d)->arch.amd.root_table); + struct iommu_context *ctx =3D iommu_default_context(d); + + iommu_identity_map_teardown(d, ctx); + ASSERT(!ctx->arch.amd.root_table); } =20 static int cf_check amd_iommu_add_device(u8 devfn, struct pci_dev *pdev) { struct amd_iommu *iommu; + struct iommu_context *ctx; u16 bdf; struct ivrs_mappings *ivrs_mappings; bool fresh_domid =3D false; @@ -577,6 +585,8 @@ static int cf_check amd_iommu_add_device(u8 devfn, stru= ct pci_dev *pdev) if ( !pdev->domain ) return -EINVAL; =20 + ctx =3D iommu_default_context(pdev->domain); + for_each_amd_iommu(iommu) if ( pdev->sbdf.sbdf =3D=3D iommu->sbdf.sbdf ) return is_hardware_domain(pdev->domain) ? 
0 : -ENODEV; @@ -633,7 +643,7 @@ static int cf_check amd_iommu_add_device(u8 devfn, stru= ct pci_dev *pdev) } =20 if ( amd_iommu_reserve_domain_unity_map( - pdev->domain, + pdev->domain, ctx, ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].unity_map, 0) ) AMD_IOMMU_WARN("%pd: unity mapping failed for %pp\n", @@ -647,7 +657,7 @@ static int cf_check amd_iommu_add_device(u8 devfn, stru= ct pci_dev *pdev) fresh_domid =3D true; } =20 - ret =3D amd_iommu_setup_domain_device(pdev->domain, iommu, devfn, pdev= ); + ret =3D amd_iommu_setup_domain_device(pdev->domain, ctx, iommu, devfn,= pdev); if ( ret && fresh_domid ) { iommu_free_domid(pdev->arch.pseudo_domid, iommu->domid_map); @@ -660,12 +670,15 @@ static int cf_check amd_iommu_add_device(u8 devfn, st= ruct pci_dev *pdev) static int cf_check amd_iommu_remove_device(u8 devfn, struct pci_dev *pdev) { struct amd_iommu *iommu; + struct iommu_context *ctx; u16 bdf; struct ivrs_mappings *ivrs_mappings; =20 if ( !pdev->domain ) return -EINVAL; =20 + ctx =3D iommu_default_context(pdev->domain); + iommu =3D find_iommu_for_device(pdev->sbdf); if ( !iommu ) { @@ -680,7 +693,7 @@ static int cf_check amd_iommu_remove_device(u8 devfn, s= truct pci_dev *pdev) bdf =3D PCI_BDF(pdev->bus, devfn); =20 if ( amd_iommu_reserve_domain_unity_unmap( - pdev->domain, + pdev->domain, ctx, ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].unity_map)= ) AMD_IOMMU_WARN("%pd: unity unmapping failed for %pp\n", pdev->domain, &PCI_SBDF(pdev->seg, bdf)); @@ -755,14 +768,14 @@ static void amd_dump_page_table_level(struct page_inf= o *pg, int level, =20 static void cf_check amd_dump_page_tables(struct domain *d) { - const struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); =20 - if ( !hd->arch.amd.root_table ) + if ( !ctx->arch.amd.root_table ) return; =20 - printk("AMD IOMMU %pd table has %u levels\n", d, hd->arch.amd.paging_m= ode); - amd_dump_page_table_level(hd->arch.amd.root_table, - hd->arch.amd.paging_mode, 0, 0); + printk("AMD IOMMU %pd table has %u levels\n", d, ctx->arch.amd.paging_= mode); + amd_dump_page_table_level(ctx->arch.amd.root_table, + ctx->arch.amd.paging_mode, 0, 0); } =20 static const struct iommu_ops __initconst_cf_clobber _iommu_ops =3D { diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iomm= u.c index c9425d6971..32c5011820 100644 --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -403,12 +403,15 @@ long iommu_unmap(struct domain *d, dfn_t dfn0, unsign= ed long page_count, unsigned long i; unsigned int order, j =3D 0; int rc =3D 0; + struct iommu_context *ctx; =20 if ( !is_iommu_enabled(d) ) return 0; =20 ASSERT(!(flags & ~IOMMUF_preempt)); =20 + ctx =3D iommu_default_context(d); + for ( i =3D 0; i < page_count; i +=3D 1UL << order ) { dfn_t dfn =3D dfn_add(dfn0, i); @@ -468,10 +471,13 @@ int iommu_lookup_page(struct domain *d, dfn_t dfn, mf= n_t *mfn, unsigned int *flags) { const struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx; =20 if ( !is_iommu_enabled(d) || !hd->platform_ops->lookup_page ) return -EOPNOTSUPP; =20 + ctx =3D iommu_default_context(d); + return iommu_call(hd->platform_ops, lookup_page, d, dfn, mfn, flags); } =20 diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough= /vtd/extern.h index c16583c951..3dcb77c711 100644 --- a/xen/drivers/passthrough/vtd/extern.h +++ b/xen/drivers/passthrough/vtd/extern.h @@ -80,8 +80,8 @@ uint64_t alloc_pgtable_maddr(unsigned long npages, nodeid= _t node); void 
free_pgtable_maddr(u64 maddr); void *map_vtd_domain_page(u64 maddr); void unmap_vtd_domain_page(const void *va); -int domain_context_mapping_one(struct domain *domain, struct vtd_iommu *io= mmu, - uint8_t bus, uint8_t devfn, +int domain_context_mapping_one(struct domain *domain, struct iommu_context= *ctx, + struct vtd_iommu *iommu, uint8_t bus, uint8= _t devfn, const struct pci_dev *pdev, domid_t domid, paddr_t pgd_maddr, unsigned int mode); int domain_context_unmap_one(struct domain *domain, struct vtd_iommu *iomm= u, diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/= vtd/iommu.c index 90f36ac22b..9252c3e0f3 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -54,7 +54,7 @@ #define DEVICE_DOMID(d, pdev) ((d) !=3D dom_io ? (d)->domain_id \ : (pdev)->arch.pseudo_domid) #define DEVICE_PGTABLE(d, pdev) ((d) !=3D dom_io \ - ? dom_iommu(d)->arch.vtd.pgd_maddr \ + ? iommu_default_context(d)->arch.vtd.pgd_= maddr \ : (pdev)->arch.vtd.pgd_maddr) =20 bool __read_mostly iommu_igfx =3D true; @@ -227,7 +227,7 @@ static void check_cleanup_domid_map(const struct domain= *d, =20 if ( !found ) { - clear_bit(iommu->index, dom_iommu(d)->arch.vtd.iommu_bitmap); + clear_bit(iommu->index, iommu_default_context(d)->arch.vtd.iommu_b= itmap); cleanup_domid_map(d->domain_id, iommu); } } @@ -315,8 +315,9 @@ static u64 bus_to_context_maddr(struct vtd_iommu *iommu= , u8 bus) * PTE for the requested address, * - for target =3D=3D 0 the full PTE contents below PADDR_BITS limit. */ -static uint64_t addr_to_dma_page_maddr(struct domain *domain, daddr_t addr, - unsigned int target, +static uint64_t addr_to_dma_page_maddr(struct domain *domain, + struct iommu_context *ctx, + daddr_t addr, unsigned int target, unsigned int *flush_flags, bool all= oc) { struct domain_iommu *hd =3D dom_iommu(domain); @@ -326,10 +327,10 @@ static uint64_t addr_to_dma_page_maddr(struct domain = *domain, daddr_t addr, u64 pte_maddr =3D 0; =20 addr &=3D (((u64)1) << addr_width) - 1; - ASSERT(spin_is_locked(&hd->arch.mapping_lock)); + ASSERT(spin_is_locked(&ctx->arch.mapping_lock)); ASSERT(target || !alloc); =20 - if ( !hd->arch.vtd.pgd_maddr ) + if ( !ctx->arch.vtd.pgd_maddr ) { struct page_info *pg; =20 @@ -337,13 +338,13 @@ static uint64_t addr_to_dma_page_maddr(struct domain = *domain, daddr_t addr, goto out; =20 pte_maddr =3D level; - if ( !(pg =3D iommu_alloc_pgtable(hd, 0)) ) + if ( !(pg =3D iommu_alloc_pgtable(hd, ctx, 0)) ) goto out; =20 - hd->arch.vtd.pgd_maddr =3D page_to_maddr(pg); + ctx->arch.vtd.pgd_maddr =3D page_to_maddr(pg); } =20 - pte_maddr =3D hd->arch.vtd.pgd_maddr; + pte_maddr =3D ctx->arch.vtd.pgd_maddr; parent =3D map_vtd_domain_page(pte_maddr); while ( level > target ) { @@ -379,7 +380,7 @@ static uint64_t addr_to_dma_page_maddr(struct domain *d= omain, daddr_t addr, } =20 pte_maddr =3D level - 1; - pg =3D iommu_alloc_pgtable(hd, DMA_PTE_CONTIG_MASK); + pg =3D iommu_alloc_pgtable(hd, ctx, DMA_PTE_CONTIG_MASK); if ( !pg ) break; =20 @@ -431,13 +432,12 @@ static uint64_t addr_to_dma_page_maddr(struct domain = *domain, daddr_t addr, return pte_maddr; } =20 -static paddr_t domain_pgd_maddr(struct domain *d, paddr_t pgd_maddr, - unsigned int nr_pt_levels) +static paddr_t domain_pgd_maddr(struct domain *d, struct iommu_context *ct= x, + paddr_t pgd_maddr, unsigned int nr_pt_leve= ls) { - struct domain_iommu *hd =3D dom_iommu(d); unsigned int agaw; =20 - ASSERT(spin_is_locked(&hd->arch.mapping_lock)); + ASSERT(spin_is_locked(&ctx->arch.mapping_lock)); =20 if ( 
pgd_maddr ) /* nothing */; @@ -449,19 +449,19 @@ static paddr_t domain_pgd_maddr(struct domain *d, pad= dr_t pgd_maddr, } else { - if ( !hd->arch.vtd.pgd_maddr ) + if ( !ctx->arch.vtd.pgd_maddr ) { /* * Ensure we have pagetables allocated down to the smallest * level the loop below may need to run to. */ - addr_to_dma_page_maddr(d, 0, min_pt_levels, NULL, true); + addr_to_dma_page_maddr(d, ctx, 0, min_pt_levels, NULL, true); =20 - if ( !hd->arch.vtd.pgd_maddr ) + if ( !ctx->arch.vtd.pgd_maddr ) return 0; } =20 - pgd_maddr =3D hd->arch.vtd.pgd_maddr; + pgd_maddr =3D ctx->arch.vtd.pgd_maddr; } =20 /* Skip top level(s) of page tables for less-than-maximum level DRHDs.= */ @@ -735,7 +735,7 @@ static int __must_check cf_check iommu_flush_iotlb(stru= ct domain *d, dfn_t dfn, unsigned long page_coun= t, unsigned int flush_flag= s) { - struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); struct acpi_drhd_unit *drhd; struct vtd_iommu *iommu; bool flush_dev_iotlb; @@ -763,7 +763,7 @@ static int __must_check cf_check iommu_flush_iotlb(stru= ct domain *d, dfn_t dfn, =20 iommu =3D drhd->iommu; =20 - if ( !test_bit(iommu->index, hd->arch.vtd.iommu_bitmap) ) + if ( !test_bit(iommu->index, ctx->arch.vtd.iommu_bitmap) ) continue; =20 flush_dev_iotlb =3D !!find_ats_dev_drhd(iommu); @@ -791,7 +791,8 @@ static int __must_check cf_check iommu_flush_iotlb(stru= ct domain *d, dfn_t dfn, return ret; } =20 -static void queue_free_pt(struct domain_iommu *hd, mfn_t mfn, unsigned int= level) +static void queue_free_pt(struct domain *d, struct iommu_context *ctx, mfn= _t mfn, + unsigned int level) { if ( level > 1 ) { @@ -800,13 +801,13 @@ static void queue_free_pt(struct domain_iommu *hd, mf= n_t mfn, unsigned int level =20 for ( i =3D 0; i < PTE_NUM; ++i ) if ( dma_pte_present(pt[i]) && !dma_pte_superpage(pt[i]) ) - queue_free_pt(hd, maddr_to_mfn(dma_pte_addr(pt[i])), + queue_free_pt(d, ctx, maddr_to_mfn(dma_pte_addr(pt[i])), level - 1); =20 unmap_domain_page(pt); } =20 - iommu_queue_free_pgtable(hd, mfn_to_page(mfn)); + iommu_queue_free_pgtable(d, ctx, mfn_to_page(mfn)); } =20 static int iommu_set_root_entry(struct vtd_iommu *iommu) @@ -1436,10 +1437,11 @@ void __init iommu_free(struct acpi_drhd_unit *drhd) static int cf_check intel_iommu_domain_init(struct domain *d) { struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); =20 - hd->arch.vtd.iommu_bitmap =3D xzalloc_array(unsigned long, + ctx->arch.vtd.iommu_bitmap =3D xzalloc_array(unsigned long, BITS_TO_LONGS(nr_iommus)); - if ( !hd->arch.vtd.iommu_bitmap ) + if ( !ctx->arch.vtd.iommu_bitmap ) return -ENOMEM; =20 hd->arch.vtd.agaw =3D width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH); @@ -1480,11 +1482,11 @@ static void __hwdom_init cf_check intel_iommu_hwdom= _init(struct domain *d) */ int domain_context_mapping_one( struct domain *domain, + struct iommu_context *ctx, struct vtd_iommu *iommu, uint8_t bus, uint8_t devfn, const struct pci_dev *pdev, domid_t domid, paddr_t pgd_maddr, unsigned int mode) { - struct domain_iommu *hd =3D dom_iommu(domain); struct context_entry *context, *context_entries, lctxt; __uint128_t res, old; uint64_t maddr; @@ -1537,12 +1539,12 @@ int domain_context_mapping_one( { paddr_t root; =20 - spin_lock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); =20 - root =3D domain_pgd_maddr(domain, pgd_maddr, iommu->nr_pt_levels); + root =3D domain_pgd_maddr(domain, ctx, pgd_maddr, iommu->nr_pt_lev= els); if ( !root ) { - 
spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); spin_unlock(&iommu->lock); unmap_vtd_domain_page(context_entries); if ( prev_dom ) @@ -1556,7 +1558,7 @@ int domain_context_mapping_one( else context_set_translation_type(lctxt, CONTEXT_TT_MULTI_LEVEL); =20 - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); } =20 rc =3D context_set_domain_id(&lctxt, domid, iommu); @@ -1630,7 +1632,7 @@ int domain_context_mapping_one( if ( rc > 0 ) rc =3D 0; =20 - set_bit(iommu->index, hd->arch.vtd.iommu_bitmap); + set_bit(iommu->index, ctx->arch.vtd.iommu_bitmap); =20 unmap_vtd_domain_page(context_entries); =20 @@ -1648,7 +1650,7 @@ int domain_context_mapping_one( (prev_dom =3D=3D dom_io && !pdev) ) ret =3D domain_context_unmap_one(domain, iommu, bus, devfn); else - ret =3D domain_context_mapping_one(prev_dom, iommu, bus, devfn= , pdev, + ret =3D domain_context_mapping_one(prev_dom, ctx, iommu, bus, = devfn, pdev, DEVICE_DOMID(prev_dom, pdev), DEVICE_PGTABLE(prev_dom, pdev= ), (mode & MAP_WITH_RMRR) | @@ -1667,8 +1669,8 @@ int domain_context_mapping_one( static const struct acpi_drhd_unit *domain_context_unmap( struct domain *d, uint8_t devfn, struct pci_dev *pdev); =20 -static int domain_context_mapping(struct domain *domain, u8 devfn, - struct pci_dev *pdev) +static int domain_context_mapping(struct domain *domain, struct iommu_cont= ext *ctx, + u8 devfn, struct pci_dev *pdev) { const struct acpi_drhd_unit *drhd =3D acpi_find_matched_drhd_unit(pdev= ); const struct acpi_rmrr_unit *rmrr; @@ -1737,7 +1739,7 @@ static int domain_context_mapping(struct domain *doma= in, u8 devfn, if ( iommu_debug ) printk(VTDPREFIX "%pd:PCIe: map %pp\n", domain, &PCI_SBDF(seg, bus, devfn)); - ret =3D domain_context_mapping_one(domain, drhd->iommu, bus, devfn= , pdev, + ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, bus, = devfn, pdev, DEVICE_DOMID(domain, pdev), pgd_m= addr, mode); if ( ret > 0 ) @@ -1763,7 +1765,7 @@ static int domain_context_mapping(struct domain *doma= in, u8 devfn, printk(VTDPREFIX "%pd:PCI: map %pp\n", domain, &PCI_SBDF(seg, bus, devfn)); =20 - ret =3D domain_context_mapping_one(domain, drhd->iommu, bus, devfn, + ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, bus, = devfn, pdev, DEVICE_DOMID(domain, pdev), pgd_maddr, mode); if ( ret < 0 ) @@ -1794,7 +1796,7 @@ static int domain_context_mapping(struct domain *doma= in, u8 devfn, * their owner would be the wrong one. Pass NULL instead. */ if ( ret >=3D 0 ) - ret =3D domain_context_mapping_one(domain, drhd->iommu, bus, d= evfn, + ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, b= us, devfn, NULL, DEVICE_DOMID(domain, pd= ev), pgd_maddr, mode); =20 @@ -1810,7 +1812,7 @@ static int domain_context_mapping(struct domain *doma= in, u8 devfn, */ if ( !ret && pdev_type(seg, bus, devfn) =3D=3D DEV_TYPE_PCIe2PCI_B= RIDGE && (secbus !=3D pdev->bus || pdev->devfn !=3D 0) ) - ret =3D domain_context_mapping_one(domain, drhd->iommu, secbus= , 0, + ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, s= ecbus, 0, NULL, DEVICE_DOMID(domain, pd= ev), pgd_maddr, mode); =20 @@ -1819,7 +1821,7 @@ static int domain_context_mapping(struct domain *doma= in, u8 devfn, if ( !prev_present ) domain_context_unmap(domain, devfn, pdev); else if ( pdev->domain !=3D domain ) /* Avoid infinite recursi= on. 
*/ - domain_context_mapping(pdev->domain, devfn, pdev); + domain_context_mapping(pdev->domain, ctx, devfn, pdev); } =20 break; @@ -2013,44 +2015,44 @@ static const struct acpi_drhd_unit *domain_context_= unmap( =20 static void cf_check iommu_clear_root_pgtable(struct domain *d) { - struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); =20 - spin_lock(&hd->arch.mapping_lock); - hd->arch.vtd.pgd_maddr =3D 0; - spin_unlock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); + ctx->arch.vtd.pgd_maddr =3D 0; + spin_unlock(&ctx->arch.mapping_lock); } =20 static void cf_check iommu_domain_teardown(struct domain *d) { - struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); const struct acpi_drhd_unit *drhd; =20 if ( list_empty(&acpi_drhd_units) ) return; =20 - iommu_identity_map_teardown(d); + iommu_identity_map_teardown(d, ctx); =20 - ASSERT(!hd->arch.vtd.pgd_maddr); + ASSERT(!ctx->arch.vtd.pgd_maddr); =20 for_each_drhd_unit ( drhd ) cleanup_domid_map(d->domain_id, drhd->iommu); =20 - XFREE(hd->arch.vtd.iommu_bitmap); + XFREE(ctx->arch.vtd.iommu_bitmap); } =20 static void quarantine_teardown(struct pci_dev *pdev, const struct acpi_drhd_unit *drhd) { - struct domain_iommu *hd =3D dom_iommu(dom_io); + struct iommu_context *ctx =3D iommu_default_context(dom_io); =20 ASSERT(pcidevs_locked()); =20 if ( !pdev->arch.vtd.pgd_maddr ) return; =20 - ASSERT(page_list_empty(&hd->arch.pgtables.list)); - page_list_move(&hd->arch.pgtables.list, &pdev->arch.pgtables_list); - while ( iommu_free_pgtables(dom_io) =3D=3D -ERESTART ) + ASSERT(page_list_empty(&ctx->arch.pgtables)); + page_list_move(&ctx->arch.pgtables, &pdev->arch.pgtables_list); + while ( iommu_free_pgtables(dom_io, ctx) =3D=3D -ERESTART ) /* nothing */; pdev->arch.vtd.pgd_maddr =3D 0; =20 @@ -2063,6 +2065,7 @@ static int __must_check cf_check intel_iommu_map_page( unsigned int *flush_flags) { struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); struct dma_pte *page, *pte, old, new =3D {}; u64 pg_maddr; unsigned int level =3D (IOMMUF_order(flags) / LEVEL_STRIDE) + 1; @@ -2079,7 +2082,7 @@ static int __must_check cf_check intel_iommu_map_page( if ( iommu_hwdom_passthrough && is_hardware_domain(d) ) return 0; =20 - spin_lock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); =20 /* * IOMMU mapping request can be safely ignored when the domain is dyin= g. 
@@ -2089,15 +2092,15 @@ static int __must_check cf_check intel_iommu_map_pa= ge( */ if ( d->is_dying ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); return 0; } =20 - pg_maddr =3D addr_to_dma_page_maddr(d, dfn_to_daddr(dfn), level, flush= _flags, + pg_maddr =3D addr_to_dma_page_maddr(d, ctx, dfn_to_daddr(dfn), level, = flush_flags, true); if ( pg_maddr < PAGE_SIZE ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); return -ENOMEM; } =20 @@ -2118,7 +2121,7 @@ static int __must_check cf_check intel_iommu_map_page( =20 if ( !((old.val ^ new.val) & ~DMA_PTE_CONTIG_MASK) ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); unmap_vtd_domain_page(page); return 0; } @@ -2147,7 +2150,7 @@ static int __must_check cf_check intel_iommu_map_page( new.val &=3D ~(LEVEL_MASK << level_to_offset_bits(level)); dma_set_pte_superpage(new); =20 - pg_maddr =3D addr_to_dma_page_maddr(d, dfn_to_daddr(dfn), ++level, + pg_maddr =3D addr_to_dma_page_maddr(d, ctx, dfn_to_daddr(dfn), ++l= evel, flush_flags, false); BUG_ON(pg_maddr < PAGE_SIZE); =20 @@ -2157,11 +2160,11 @@ static int __must_check cf_check intel_iommu_map_pa= ge( iommu_sync_cache(pte, sizeof(*pte)); =20 *flush_flags |=3D IOMMU_FLUSHF_modified | IOMMU_FLUSHF_all; - iommu_queue_free_pgtable(hd, pg); + iommu_queue_free_pgtable(d, ctx, pg); perfc_incr(iommu_pt_coalesces); } =20 - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); unmap_vtd_domain_page(page); =20 *flush_flags |=3D IOMMU_FLUSHF_added; @@ -2170,7 +2173,7 @@ static int __must_check cf_check intel_iommu_map_page( *flush_flags |=3D IOMMU_FLUSHF_modified; =20 if ( IOMMUF_order(flags) && !dma_pte_superpage(old) ) - queue_free_pt(hd, maddr_to_mfn(dma_pte_addr(old)), + queue_free_pt(d, ctx, maddr_to_mfn(dma_pte_addr(old)), IOMMUF_order(flags) / LEVEL_STRIDE); } =20 @@ -2181,6 +2184,7 @@ static int __must_check cf_check intel_iommu_unmap_pa= ge( struct domain *d, dfn_t dfn, unsigned int order, unsigned int *flush_f= lags) { struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); daddr_t addr =3D dfn_to_daddr(dfn); struct dma_pte *page =3D NULL, *pte =3D NULL, old; uint64_t pg_maddr; @@ -2200,12 +2204,12 @@ static int __must_check cf_check intel_iommu_unmap_= page( if ( iommu_hwdom_passthrough && is_hardware_domain(d) ) return 0; =20 - spin_lock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); /* get target level pte */ - pg_maddr =3D addr_to_dma_page_maddr(d, addr, level, flush_flags, false= ); + pg_maddr =3D addr_to_dma_page_maddr(d, ctx, addr, level, flush_flags, = false); if ( pg_maddr < PAGE_SIZE ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); return pg_maddr ? 
-ENOMEM : 0; } =20 @@ -2214,7 +2218,7 @@ static int __must_check cf_check intel_iommu_unmap_pa= ge( =20 if ( !dma_pte_present(*pte) ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); unmap_vtd_domain_page(page); return 0; } @@ -2232,7 +2236,7 @@ static int __must_check cf_check intel_iommu_unmap_pa= ge( =20 unmap_vtd_domain_page(page); =20 - pg_maddr =3D addr_to_dma_page_maddr(d, addr, level, flush_flags, f= alse); + pg_maddr =3D addr_to_dma_page_maddr(d, ctx, addr, level, flush_fla= gs, false); BUG_ON(pg_maddr < PAGE_SIZE); =20 page =3D map_vtd_domain_page(pg_maddr); @@ -2241,18 +2245,18 @@ static int __must_check cf_check intel_iommu_unmap_= page( iommu_sync_cache(pte, sizeof(*pte)); =20 *flush_flags |=3D IOMMU_FLUSHF_all; - iommu_queue_free_pgtable(hd, pg); + iommu_queue_free_pgtable(d, ctx, pg); perfc_incr(iommu_pt_coalesces); } =20 - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); =20 unmap_vtd_domain_page(page); =20 *flush_flags |=3D IOMMU_FLUSHF_modified; =20 if ( order && !dma_pte_superpage(old) ) - queue_free_pt(hd, maddr_to_mfn(dma_pte_addr(old)), + queue_free_pt(d, ctx, maddr_to_mfn(dma_pte_addr(old)), order / LEVEL_STRIDE); =20 return 0; @@ -2261,7 +2265,7 @@ static int __must_check cf_check intel_iommu_unmap_pa= ge( static int cf_check intel_iommu_lookup_page( struct domain *d, dfn_t dfn, mfn_t *mfn, unsigned int *flags) { - struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); uint64_t val; =20 /* @@ -2272,11 +2276,11 @@ static int cf_check intel_iommu_lookup_page( (iommu_hwdom_passthrough && is_hardware_domain(d)) ) return -EOPNOTSUPP; =20 - spin_lock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); =20 - val =3D addr_to_dma_page_maddr(d, dfn_to_daddr(dfn), 0, NULL, false); + val =3D addr_to_dma_page_maddr(d, ctx, dfn_to_daddr(dfn), 0, NULL, fal= se); =20 - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); =20 if ( val < PAGE_SIZE ) return -ENOENT; @@ -2309,6 +2313,7 @@ static bool __init vtd_ept_page_compatible(const stru= ct vtd_iommu *iommu) static int cf_check intel_iommu_add_device(u8 devfn, struct pci_dev *pdev) { struct acpi_rmrr_unit *rmrr; + struct iommu_context *ctx; u16 bdf; int ret, i; =20 @@ -2317,6 +2322,8 @@ static int cf_check intel_iommu_add_device(u8 devfn, = struct pci_dev *pdev) if ( !pdev->domain ) return -EINVAL; =20 + ctx =3D iommu_default_context(pdev->domain); + for_each_rmrr_device ( rmrr, bdf, i ) { if ( rmrr->segment =3D=3D pdev->seg && bdf =3D=3D PCI_BDF(pdev->bu= s, devfn) ) @@ -2327,7 +2334,7 @@ static int cf_check intel_iommu_add_device(u8 devfn, = struct pci_dev *pdev) * Since RMRRs are always reserved in the e820 map for the har= dware * domain, there shouldn't be a conflict. 
*/ - ret =3D iommu_identity_mapping(pdev->domain, p2m_access_rw, + ret =3D iommu_identity_mapping(pdev->domain, ctx, p2m_access_r= w, rmrr->base_address, rmrr->end_add= ress, 0); if ( ret ) @@ -2336,7 +2343,7 @@ static int cf_check intel_iommu_add_device(u8 devfn, = struct pci_dev *pdev) } } =20 - ret =3D domain_context_mapping(pdev->domain, devfn, pdev); + ret =3D domain_context_mapping(pdev->domain, ctx, devfn, pdev); if ( ret ) dprintk(XENLOG_ERR VTDPREFIX, "%pd: context mapping failed\n", pdev->domain); @@ -2365,10 +2372,13 @@ static int cf_check intel_iommu_remove_device(u8 de= vfn, struct pci_dev *pdev) struct acpi_rmrr_unit *rmrr; u16 bdf; unsigned int i; + struct iommu_context *ctx; =20 if ( !pdev->domain ) return -EINVAL; =20 + ctx =3D iommu_default_context(pdev->domain); + drhd =3D domain_context_unmap(pdev->domain, devfn, pdev); if ( IS_ERR(drhd) ) return PTR_ERR(drhd); @@ -2382,7 +2392,7 @@ static int cf_check intel_iommu_remove_device(u8 devf= n, struct pci_dev *pdev) * Any flag is nothing to clear these mappings but here * its always safe and strict to set 0. */ - iommu_identity_mapping(pdev->domain, p2m_access_x, rmrr->base_addr= ess, + iommu_identity_mapping(pdev->domain, ctx, p2m_access_x, rmrr->base= _address, rmrr->end_address, 0); } =20 @@ -2401,7 +2411,9 @@ static int cf_check intel_iommu_remove_device(u8 devf= n, struct pci_dev *pdev) static int __hwdom_init cf_check setup_hwdom_device( u8 devfn, struct pci_dev *pdev) { - return domain_context_mapping(pdev->domain, devfn, pdev); + struct iommu_context *ctx =3D iommu_default_context(pdev->domain); + + return domain_context_mapping(pdev->domain, ctx, devfn, pdev); } =20 void clear_fault_bits(struct vtd_iommu *iommu) @@ -2495,7 +2507,7 @@ static int __must_check init_vtd_hw(bool resume) =20 /* * Enable queue invalidation - */ =20 + */ for_each_drhd_unit ( drhd ) { iommu =3D drhd->iommu; @@ -2516,7 +2528,7 @@ static int __must_check init_vtd_hw(bool resume) =20 /* * Enable interrupt remapping - */ =20 + */ if ( iommu_intremap !=3D iommu_intremap_off ) { int apic; @@ -2573,6 +2585,7 @@ static int __must_check init_vtd_hw(bool resume) =20 static void __hwdom_init setup_hwdom_rmrr(struct domain *d) { + struct iommu_context *ctx =3D iommu_default_context(d); struct acpi_rmrr_unit *rmrr; u16 bdf; int ret, i; @@ -2586,7 +2599,7 @@ static void __hwdom_init setup_hwdom_rmrr(struct doma= in *d) * domain, there shouldn't be a conflict. So its always safe and * strict to set 0. 
*/ - ret =3D iommu_identity_mapping(d, p2m_access_rw, rmrr->base_addres= s, + ret =3D iommu_identity_mapping(d, ctx, p2m_access_rw, rmrr->base_a= ddress, rmrr->end_address, 0); if ( ret ) dprintk(XENLOG_ERR VTDPREFIX, @@ -2751,6 +2764,8 @@ static int cf_check reassign_device_ownership( =20 if ( !QUARANTINE_SKIP(target, pdev->arch.vtd.pgd_maddr) ) { + struct iommu_context *target_ctx =3D iommu_default_context(target); + if ( !has_arch_pdevs(target) ) vmx_pi_hooks_assign(target); =20 @@ -2765,7 +2780,7 @@ static int cf_check reassign_device_ownership( untrusted_msi =3D true; #endif =20 - ret =3D domain_context_mapping(target, devfn, pdev); + ret =3D domain_context_mapping(target, target_ctx, devfn, pdev); =20 if ( !ret && pdev->devfn =3D=3D devfn && !QUARANTINE_SKIP(source, pdev->arch.vtd.pgd_maddr) ) @@ -2814,6 +2829,7 @@ static int cf_check reassign_device_ownership( if ( !is_hardware_domain(source) ) { const struct acpi_rmrr_unit *rmrr; + struct iommu_context *ctx =3D iommu_default_context(source); u16 bdf; unsigned int i; =20 @@ -2825,7 +2841,7 @@ static int cf_check reassign_device_ownership( * Any RMRR flag is always ignored when remove a device, * but its always safe and strict to set 0. */ - ret =3D iommu_identity_mapping(source, p2m_access_x, + ret =3D iommu_identity_mapping(source, ctx, p2m_access_x, rmrr->base_address, rmrr->end_address, 0); if ( ret && ret !=3D -ENOENT ) @@ -2840,6 +2856,7 @@ static int cf_check intel_iommu_assign_device( struct domain *d, u8 devfn, struct pci_dev *pdev, u32 flag) { struct domain *s =3D pdev->domain; + struct iommu_context *ctx =3D iommu_default_context(d); struct acpi_rmrr_unit *rmrr; int ret =3D 0, i; u16 bdf, seg; @@ -2887,7 +2904,7 @@ static int cf_check intel_iommu_assign_device( { if ( rmrr->segment =3D=3D seg && bdf =3D=3D PCI_BDF(bus, devfn) ) { - ret =3D iommu_identity_mapping(d, p2m_access_rw, rmrr->base_ad= dress, + ret =3D iommu_identity_mapping(d, ctx, p2m_access_rw, rmrr->ba= se_address, rmrr->end_address, flag); if ( ret ) { @@ -2910,7 +2927,7 @@ static int cf_check intel_iommu_assign_device( { if ( rmrr->segment =3D=3D seg && bdf =3D=3D PCI_BDF(bus, devfn) ) { - int rc =3D iommu_identity_mapping(d, p2m_access_x, + int rc =3D iommu_identity_mapping(d, ctx, p2m_access_x, rmrr->base_address, rmrr->end_address, 0); =20 @@ -3083,10 +3100,11 @@ static void vtd_dump_page_table_level(paddr_t pt_ma= ddr, int level, paddr_t gpa, static void cf_check vtd_dump_page_tables(struct domain *d) { const struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); =20 printk(VTDPREFIX" %pd table has %d levels\n", d, agaw_to_level(hd->arch.vtd.agaw)); - vtd_dump_page_table_level(hd->arch.vtd.pgd_maddr, + vtd_dump_page_table_level(ctx->arch.vtd.pgd_maddr, agaw_to_level(hd->arch.vtd.agaw), 0, 0); } =20 @@ -3094,6 +3112,7 @@ static int fill_qpt(struct dma_pte *this, unsigned in= t level, struct page_info *pgs[6]) { struct domain_iommu *hd =3D dom_iommu(dom_io); + struct iommu_context *ctx =3D iommu_default_context(dom_io); unsigned int i; int rc =3D 0; =20 @@ -3110,7 +3129,7 @@ static int fill_qpt(struct dma_pte *this, unsigned in= t level, * page table pages, and the resulting allocations are alw= ays * zeroed. 
*/ - pgs[level] =3D iommu_alloc_pgtable(hd, 0); + pgs[level] =3D iommu_alloc_pgtable(hd, ctx, 0); if ( !pgs[level] ) { rc =3D -ENOMEM; @@ -3144,6 +3163,7 @@ static int cf_check intel_iommu_quarantine_init(struc= t pci_dev *pdev, bool scratch_page) { struct domain_iommu *hd =3D dom_iommu(dom_io); + struct iommu_context *ctx =3D iommu_default_context(dom_io); struct page_info *pg; unsigned int agaw =3D hd->arch.vtd.agaw; unsigned int level =3D agaw_to_level(agaw); @@ -3154,8 +3174,8 @@ static int cf_check intel_iommu_quarantine_init(struc= t pci_dev *pdev, int rc; =20 ASSERT(pcidevs_locked()); - ASSERT(!hd->arch.vtd.pgd_maddr); - ASSERT(page_list_empty(&hd->arch.pgtables.list)); + ASSERT(!ctx->arch.vtd.pgd_maddr); + ASSERT(page_list_empty(&ctx->arch.pgtables)); =20 if ( pdev->arch.vtd.pgd_maddr ) { @@ -3167,14 +3187,14 @@ static int cf_check intel_iommu_quarantine_init(str= uct pci_dev *pdev, if ( !drhd ) return -ENODEV; =20 - pg =3D iommu_alloc_pgtable(hd, 0); + pg =3D iommu_alloc_pgtable(hd, ctx, 0); if ( !pg ) return -ENOMEM; =20 rc =3D context_set_domain_id(NULL, pdev->arch.pseudo_domid, drhd->iomm= u); =20 /* Transiently install the root into DomIO, for iommu_identity_mapping= (). */ - hd->arch.vtd.pgd_maddr =3D page_to_maddr(pg); + ctx->arch.vtd.pgd_maddr =3D page_to_maddr(pg); =20 for_each_rmrr_device ( rmrr, bdf, i ) { @@ -3185,7 +3205,7 @@ static int cf_check intel_iommu_quarantine_init(struc= t pci_dev *pdev, { rmrr_found =3D true; =20 - rc =3D iommu_identity_mapping(dom_io, p2m_access_rw, + rc =3D iommu_identity_mapping(dom_io, ctx, p2m_access_rw, rmrr->base_address, rmrr->end_addr= ess, 0); if ( rc ) @@ -3195,8 +3215,8 @@ static int cf_check intel_iommu_quarantine_init(struc= t pci_dev *pdev, } } =20 - iommu_identity_map_teardown(dom_io); - hd->arch.vtd.pgd_maddr =3D 0; + iommu_identity_map_teardown(dom_io, ctx); + ctx->arch.vtd.pgd_maddr =3D 0; pdev->arch.vtd.pgd_maddr =3D page_to_maddr(pg); =20 if ( !rc && scratch_page ) @@ -3211,7 +3231,7 @@ static int cf_check intel_iommu_quarantine_init(struc= t pci_dev *pdev, pdev->arch.leaf_mfn =3D page_to_mfn(pgs[0]); } =20 - page_list_move(&pdev->arch.pgtables_list, &hd->arch.pgtables.list); + page_list_move(&pdev->arch.pgtables_list, &ctx->arch.pgtables); =20 if ( rc || (!scratch_page && !rmrr_found) ) quarantine_teardown(pdev, drhd); diff --git a/xen/drivers/passthrough/vtd/quirks.c b/xen/drivers/passthrough= /vtd/quirks.c index 0a10a46d90..195fc5c08f 100644 --- a/xen/drivers/passthrough/vtd/quirks.c +++ b/xen/drivers/passthrough/vtd/quirks.c @@ -423,7 +423,8 @@ static int __must_check map_me_phantom_function(struct = domain *domain, =20 /* map or unmap ME phantom function */ if ( !(mode & UNMAP_ME_PHANTOM_FUNC) ) - rc =3D domain_context_mapping_one(domain, drhd->iommu, 0, + rc =3D domain_context_mapping_one(domain, iommu_default_context(do= main), + drhd->iommu, 0, PCI_DEVFN(dev, 7), NULL, domid, pgd_maddr, mode); else diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/= x86/iommu.c index 0954cc4922..3bc8aa3e09 100644 --- a/xen/drivers/passthrough/x86/iommu.c +++ b/xen/drivers/passthrough/x86/iommu.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -185,26 +186,31 @@ void __hwdom_init arch_iommu_check_autotranslated_hwd= om(struct domain *d) =20 int arch_iommu_domain_init(struct domain *d) { - struct domain_iommu *hd =3D dom_iommu(d); + INIT_PAGE_LIST_HEAD(&dom_iommu(d)->arch.free_queue); + return 0; +} =20 - spin_lock_init(&hd->arch.mapping_lock); +int 
arch_iommu_context_init(struct domain *d, struct iommu_context *ctx, u= 32 flags) +{ + spin_lock_init(&ctx->arch.mapping_lock); =20 - INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list); - spin_lock_init(&hd->arch.pgtables.lock); - INIT_LIST_HEAD(&hd->arch.identity_maps); + INIT_PAGE_LIST_HEAD(&ctx->arch.pgtables); + INIT_LIST_HEAD(&ctx->arch.identity_maps); + + return 0; +} + +int arch_iommu_context_teardown(struct domain *d, struct iommu_context *ct= x, u32 flags) +{ + /* Cleanup all page tables */ + while ( iommu_free_pgtables(d, ctx) =3D=3D -ERESTART ) + /* nothing */; =20 return 0; } =20 void arch_iommu_domain_destroy(struct domain *d) { - /* - * There should be not page-tables left allocated by the time the - * domain is destroyed. Note that arch_iommu_domain_destroy() is - * called unconditionally, so pgtables may be uninitialized. - */ - ASSERT(!dom_iommu(d)->platform_ops || - page_list_empty(&dom_iommu(d)->arch.pgtables.list)); } =20 struct identity_map { @@ -214,14 +220,13 @@ struct identity_map { unsigned int count; }; =20 -int iommu_identity_mapping(struct domain *d, p2m_access_t p2ma, - paddr_t base, paddr_t end, +int iommu_identity_mapping(struct domain *d, struct iommu_context *ctx, + p2m_access_t p2ma, paddr_t base, paddr_t end, unsigned int flag) { unsigned long base_pfn =3D base >> PAGE_SHIFT_4K; unsigned long end_pfn =3D PAGE_ALIGN_4K(end) >> PAGE_SHIFT_4K; struct identity_map *map; - struct domain_iommu *hd =3D dom_iommu(d); =20 ASSERT(pcidevs_locked()); ASSERT(base < end); @@ -230,7 +235,7 @@ int iommu_identity_mapping(struct domain *d, p2m_access= _t p2ma, * No need to acquire hd->arch.mapping_lock: Both insertion and removal * get done while holding pcidevs_lock. */ - list_for_each_entry( map, &hd->arch.identity_maps, list ) + list_for_each_entry( map, &ctx->arch.identity_maps, list ) { if ( map->base =3D=3D base && map->end =3D=3D end ) { @@ -280,7 +285,7 @@ int iommu_identity_mapping(struct domain *d, p2m_access= _t p2ma, * Insert into list ahead of mapping, so the range can be found when * trying to clean up. */ - list_add_tail(&map->list, &hd->arch.identity_maps); + list_add_tail(&map->list, &ctx->arch.identity_maps); =20 for ( ; base_pfn < end_pfn; ++base_pfn ) { @@ -300,12 +305,11 @@ int iommu_identity_mapping(struct domain *d, p2m_acce= ss_t p2ma, return 0; } =20 -void iommu_identity_map_teardown(struct domain *d) +void iommu_identity_map_teardown(struct domain *d, struct iommu_context *c= tx) { - struct domain_iommu *hd =3D dom_iommu(d); struct identity_map *map, *tmp; =20 - list_for_each_entry_safe ( map, tmp, &hd->arch.identity_maps, list ) + list_for_each_entry_safe ( map, tmp, &ctx->arch.identity_maps, list ) { list_del(&map->list); xfree(map); @@ -582,7 +586,7 @@ void iommu_free_domid(domid_t domid, unsigned long *map) BUG(); } =20 -int iommu_free_pgtables(struct domain *d) +int iommu_free_pgtables(struct domain *d, struct iommu_context *ctx) { struct domain_iommu *hd =3D dom_iommu(d); struct page_info *pg; @@ -592,7 +596,7 @@ int iommu_free_pgtables(struct domain *d) return 0; =20 /* After this barrier, no new IOMMU mappings can be inserted. */ - spin_barrier(&hd->arch.mapping_lock); + spin_barrier(&ctx->arch.mapping_lock); =20 /* * Pages will be moved to the free list below. 
So we want to @@ -600,7 +604,7 @@ int iommu_free_pgtables(struct domain *d) */ iommu_vcall(hd->platform_ops, clear_root_pgtable, d); =20 - while ( (pg =3D page_list_remove_head(&hd->arch.pgtables.list)) ) + while ( (pg =3D page_list_remove_head(&ctx->arch.pgtables)) ) { free_domheap_page(pg); =20 @@ -612,6 +616,7 @@ int iommu_free_pgtables(struct domain *d) } =20 struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd, + struct iommu_context *ctx, uint64_t contig_mask) { unsigned int memflags =3D 0; @@ -656,9 +661,7 @@ struct page_info *iommu_alloc_pgtable(struct domain_iom= mu *hd, =20 unmap_domain_page(p); =20 - spin_lock(&hd->arch.pgtables.lock); - page_list_add(pg, &hd->arch.pgtables.list); - spin_unlock(&hd->arch.pgtables.lock); + page_list_add(pg, &ctx->arch.pgtables); =20 return pg; } @@ -697,13 +700,12 @@ static void cf_check free_queued_pgtables(void *arg) } } =20 -void iommu_queue_free_pgtable(struct domain_iommu *hd, struct page_info *p= g) +void iommu_queue_free_pgtable(struct domain *d, struct iommu_context *ctx, + struct page_info *pg) { unsigned int cpu =3D smp_processor_id(); =20 - spin_lock(&hd->arch.pgtables.lock); - page_list_del(pg, &hd->arch.pgtables.list); - spin_unlock(&hd->arch.pgtables.lock); + page_list_del(pg, &ctx->arch.pgtables); =20 page_list_add_tail(pg, &per_cpu(free_pgt_list, cpu)); =20 diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h index 37c4a1dc82..91f106968e 100644 --- a/xen/include/xen/iommu.h +++ b/xen/include/xen/iommu.h @@ -416,9 +416,18 @@ extern int iommu_get_extra_reserved_device_memory(iomm= u_grdm_t *func, # define iommu_vcall iommu_call #endif =20 +struct iommu_context { + #ifdef CONFIG_HAS_PASSTHROUGH + u16 id; /* Context id (0 means default context) */ + + struct arch_iommu_context arch; + #endif +}; + struct domain_iommu { #ifdef CONFIG_HAS_PASSTHROUGH struct arch_iommu arch; + struct iommu_context default_ctx; #endif =20 /* iommu_ops */ @@ -453,6 +462,7 @@ struct domain_iommu { #define dom_iommu(d) (&(d)->iommu) #define iommu_set_feature(d, f) set_bit(f, dom_iommu(d)->features) #define iommu_clear_feature(d, f) clear_bit(f, dom_iommu(d)->features) +#define iommu_default_context(d) (&dom_iommu(d)->default_ctx) /* does not = lock ! */ =20 #ifdef CONFIG_HAS_PASSTHROUGH /* Are we using the domain P2M table as its IOMMU pagetable? 
 */
-- 
2.51.2

--
Teddy Astie | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech

From nobody Sun Feb 8 01:17:05 2026
From: "Teddy Astie"
Subject: [RFC PATCH v7 05/14] iommu: Simplify quarantine logic
To: xen-devel@lists.xenproject.org
Cc: "Teddy Astie", "Jan Beulich", "Andrew Cooper", "Roger Pau Monné", "Jason Andryuk"
Message-Id: <31b2ede1eab92048b69fe4999a35eb7d4a4617ba.1763569135.git.teddy.astie@vates.tech>
Date: Thu, 20 Nov 2025 11:09:55 +0000
Content-Type: text/plain; charset="utf-8"

The current quarantine code is very complicated and hard to change.
Remove most of it and replace it with direct reassignment to the dom_io
domain instead.

Signed-off-by: Teddy Astie
---
An idea would be to rework this feature on top of the reworked IOMMU
subsystem introduced by this series.
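To make the intent concrete before diving into the diff: the sketch below is
purely illustrative (plain user-space C with stand-in types, not Xen code;
quarantine(), default_context() and this reassign_device() are simplified
stand-ins for the real hooks). It shows what quarantining reduces to once the
per-device shadow page tables are gone: the device is simply handed to
dom_io's default IOMMU context, like any other reassignment.

#include <stdio.h>

/* Minimal stand-ins for the Xen structures touched by this patch. */
struct iommu_context { unsigned int id; };          /* 0 == default context */
struct domain { const char *name; struct iommu_context default_ctx; };
struct pci_dev { const char *sbdf; struct domain *owner; };

static struct domain dom_io = { "dom_io", { 0 } };

/* Stand-in for iommu_default_context(d). */
static struct iommu_context *default_context(struct domain *d)
{
    return &d->default_ctx;
}

/* Stand-in for device reassignment: retarget the device's translation. */
static int reassign_device(struct domain *target, struct pci_dev *pdev)
{
    printf("%s: now translated by %s context %u\n",
           pdev->sbdf, target->name, default_context(target)->id);
    pdev->owner = target;
    return 0;
}

/*
 * With the per-device quarantine page tables removed, quarantining a
 * device is nothing more than a reassignment to dom_io.
 */
static int quarantine(struct pci_dev *pdev)
{
    return reassign_device(&dom_io, pdev);
}

int main(void)
{
    struct domain guest = { "guest", { 0 } };
    struct pci_dev nic = { "0000:03:00.0", &guest };

    quarantine(&nic);          /* misbehaving device parked in dom_io */
    return nic.owner != &dom_io;
}

The real code path of course goes through the ordinary assignment hooks
shown in the diff; the point is that no quarantine-specific page-table
state survives this patch.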
--- xen/arch/x86/include/asm/pci.h | 17 -- xen/drivers/passthrough/amd/iommu_map.c | 129 +--------- xen/drivers/passthrough/amd/pci_amd_iommu.c | 51 +--- xen/drivers/passthrough/pci.c | 7 +- xen/drivers/passthrough/vtd/iommu.c | 253 ++------------------ xen/drivers/passthrough/x86/iommu.c | 1 - 6 files changed, 29 insertions(+), 429 deletions(-) diff --git a/xen/arch/x86/include/asm/pci.h b/xen/arch/x86/include/asm/pci.h index 0b98081aea..c64dd13452 100644 --- a/xen/arch/x86/include/asm/pci.h +++ b/xen/arch/x86/include/asm/pci.h @@ -17,23 +17,6 @@ struct pci_dev; =20 struct arch_pci_dev { vmask_t used_vectors; - /* - * These fields are (de)initialized under pcidevs-lock. Other uses of - * them don't race (de)initialization and hence don't strictly need any - * locking. - */ - union { - /* Subset of struct arch_iommu's fields, to be used in dom_io. */ - struct { - uint64_t pgd_maddr; - } vtd; - struct { - struct page_info *root_table; - } amd; - }; - domid_t pseudo_domid; - mfn_t leaf_mfn; - struct page_list_head pgtables_list; }; =20 int pci_conf_write_intercept(unsigned int seg, unsigned int bdf, diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthro= ugh/amd/iommu_map.c index 81a63cce8e..42827c7dc7 100644 --- a/xen/drivers/passthrough/amd/iommu_map.c +++ b/xen/drivers/passthrough/amd/iommu_map.c @@ -656,9 +656,6 @@ int amd_iommu_reserve_domain_unity_map(struct domain *d= , struct iommu_context *c { int rc; =20 - if ( d =3D=3D dom_io ) - return 0; - for ( rc =3D 0; !rc && map; map =3D map->next ) { p2m_access_t p2ma =3D p2m_access_n; @@ -680,9 +677,6 @@ int amd_iommu_reserve_domain_unity_unmap(struct domain = *d, struct iommu_context { int rc; =20 - if ( d =3D=3D dom_io ) - return 0; - for ( rc =3D 0; map; map =3D map->next ) { int ret =3D iommu_identity_mapping(d, ctx, p2m_access_x, map->addr, @@ -771,134 +765,15 @@ int cf_check amd_iommu_get_reserved_device_memory( return 0; } =20 -static int fill_qpt(union amd_iommu_pte *this, unsigned int level, - struct page_info *pgs[IOMMU_MAX_PT_LEVELS]) -{ - struct domain_iommu *hd =3D dom_iommu(dom_io); - struct iommu_context *ctx =3D iommu_default_context(dom_io); - unsigned int i; - int rc =3D 0; - - for ( i =3D 0; !rc && i < PTE_PER_TABLE_SIZE; ++i ) - { - union amd_iommu_pte *pte =3D &this[i], *next; - - if ( !pte->pr ) - { - if ( !pgs[level] ) - { - /* - * The pgtable allocator is fine for the leaf page, as wel= l as - * page table pages, and the resulting allocations are alw= ays - * zeroed. - */ - pgs[level] =3D iommu_alloc_pgtable(hd, ctx, 0); - if ( !pgs[level] ) - { - rc =3D -ENOMEM; - break; - } - - if ( level ) - { - next =3D __map_domain_page(pgs[level]); - rc =3D fill_qpt(next, level - 1, pgs); - unmap_domain_page(next); - } - } - - /* - * PDEs are essentially a subset of PTEs, so this function - * is fine to use even at the leaf. 
- */ - set_iommu_pde_present(pte, mfn_x(page_to_mfn(pgs[level])), lev= el, - true, true); - } - else if ( level && pte->next_level ) - { - next =3D map_domain_page(_mfn(pte->mfn)); - rc =3D fill_qpt(next, level - 1, pgs); - unmap_domain_page(next); - } - } - - return rc; -} - int cf_check amd_iommu_quarantine_init(struct pci_dev *pdev, bool scratch_= page) { - struct domain_iommu *hd =3D dom_iommu(dom_io); - struct iommu_context *ctx =3D iommu_default_context(dom_io); - unsigned int level =3D ctx->arch.amd.paging_mode; - unsigned int req_id =3D get_dma_requestor_id(pdev->seg, pdev->sbdf.bdf= ); - const struct ivrs_mappings *ivrs_mappings =3D get_ivrs_mappings(pdev->= seg); - int rc; + amd_iommu_quarantine_teardown(pdev); =20 - ASSERT(pcidevs_locked()); - ASSERT(!ctx->arch.amd.root_table); - ASSERT(page_list_empty(&ctx->arch.pgtables)); - - if ( !scratch_page && !ivrs_mappings[req_id].unity_map ) - return 0; - - ASSERT(pdev->arch.pseudo_domid !=3D DOMID_INVALID); - - if ( pdev->arch.amd.root_table ) - { - clear_domain_page(pdev->arch.leaf_mfn); - return 0; - } - - pdev->arch.amd.root_table =3D iommu_alloc_pgtable(hd, ctx, 0); - if ( !pdev->arch.amd.root_table ) - return -ENOMEM; - - /* Transiently install the root into DomIO, for iommu_identity_mapping= (). */ - ctx->arch.amd.root_table =3D pdev->arch.amd.root_table; - - rc =3D amd_iommu_reserve_domain_unity_map(dom_io, ctx, - ivrs_mappings[req_id].unity_ma= p, - 0); - - iommu_identity_map_teardown(dom_io, ctx); - ctx->arch.amd.root_table =3D NULL; - - if ( rc ) - AMD_IOMMU_WARN("%pp: quarantine unity mapping failed\n", &pdev->sb= df); - else if ( scratch_page ) - { - union amd_iommu_pte *root; - struct page_info *pgs[IOMMU_MAX_PT_LEVELS] =3D {}; - - root =3D __map_domain_page(pdev->arch.amd.root_table); - rc =3D fill_qpt(root, level - 1, pgs); - unmap_domain_page(root); - - pdev->arch.leaf_mfn =3D page_to_mfn(pgs[0]); - } - - page_list_move(&pdev->arch.pgtables_list, &ctx->arch.pgtables); - - if ( rc ) - amd_iommu_quarantine_teardown(pdev); - - return rc; + return 0; } =20 void amd_iommu_quarantine_teardown(struct pci_dev *pdev) { - struct iommu_context *ctx =3D iommu_default_context(dom_io); - - ASSERT(pcidevs_locked()); - - if ( !pdev->arch.amd.root_table ) - return; - - ASSERT(page_list_empty(&ctx->arch.pgtables)); - page_list_move(&ctx->arch.pgtables, &pdev->arch.pgtables_list); - while ( iommu_free_pgtables(dom_io, ctx) =3D=3D -ERESTART ) - /* nothing */; - pdev->arch.amd.root_table =3D NULL; } =20 /* diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/pass= through/amd/pci_amd_iommu.c index 964f6b47db..c871660661 100644 --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c @@ -138,9 +138,6 @@ static int __must_check amd_iommu_setup_domain_device( const struct page_info *root_pg; domid_t domid; =20 - if ( QUARANTINE_SKIP(domain, pdev) ) - return 0; - BUG_ON(!ctx->arch.amd.paging_mode || !iommu->dev_table.buffer); =20 rc =3D allocate_domain_resources(domain); @@ -159,16 +156,8 @@ static int __must_check amd_iommu_setup_domain_device( dte =3D &table[req_id]; ivrs_dev =3D &get_ivrs_mappings(iommu->sbdf.seg)[req_id]; =20 - if ( domain !=3D dom_io ) - { - root_pg =3D ctx->arch.amd.root_table; - domid =3D domain->domain_id; - } - else - { - root_pg =3D pdev->arch.amd.root_table; - domid =3D pdev->arch.pseudo_domid; - } + root_pg =3D ctx->arch.amd.root_table; + domid =3D domain->domain_id; =20 spin_lock_irqsave(&iommu->lock, flags); =20 @@ -414,9 +403,6 @@ static void 
amd_iommu_disable_domain_device(const struc= t domain *domain, int req_id; u8 bus =3D pdev->bus; =20 - if ( QUARANTINE_SKIP(domain, pdev) ) - return; - ASSERT(pcidevs_locked()); =20 if ( pci_ats_device(iommu->sbdf.seg, bus, pdev->devfn) && @@ -479,14 +465,9 @@ static int cf_check reassign_device( return -ENODEV; } =20 - if ( !QUARANTINE_SKIP(target, pdev) ) - { - rc =3D amd_iommu_setup_domain_device(target, target_ctx, iommu, de= vfn, pdev); - if ( rc ) - return rc; - } - else - amd_iommu_disable_domain_device(source, iommu, devfn, pdev); + rc =3D amd_iommu_setup_domain_device(target, target_ctx, iommu, devfn,= pdev); + if ( rc ) + return rc; =20 if ( devfn =3D=3D pdev->devfn && pdev->domain !=3D target ) { @@ -579,8 +560,6 @@ static int cf_check amd_iommu_add_device(u8 devfn, stru= ct pci_dev *pdev) struct iommu_context *ctx; u16 bdf; struct ivrs_mappings *ivrs_mappings; - bool fresh_domid =3D false; - int ret; =20 if ( !pdev->domain ) return -EINVAL; @@ -649,22 +628,7 @@ static int cf_check amd_iommu_add_device(u8 devfn, str= uct pci_dev *pdev) AMD_IOMMU_WARN("%pd: unity mapping failed for %pp\n", pdev->domain, &PCI_SBDF(pdev->seg, bdf)); =20 - if ( iommu_quarantine && pdev->arch.pseudo_domid =3D=3D DOMID_INVALID ) - { - pdev->arch.pseudo_domid =3D iommu_alloc_domid(iommu->domid_map); - if ( pdev->arch.pseudo_domid =3D=3D DOMID_INVALID ) - return -ENOSPC; - fresh_domid =3D true; - } - - ret =3D amd_iommu_setup_domain_device(pdev->domain, ctx, iommu, devfn,= pdev); - if ( ret && fresh_domid ) - { - iommu_free_domid(pdev->arch.pseudo_domid, iommu->domid_map); - pdev->arch.pseudo_domid =3D DOMID_INVALID; - } - - return ret; + return amd_iommu_setup_domain_device(pdev->domain, ctx, iommu, devfn, = pdev); } =20 static int cf_check amd_iommu_remove_device(u8 devfn, struct pci_dev *pdev) @@ -700,9 +664,6 @@ static int cf_check amd_iommu_remove_device(u8 devfn, s= truct pci_dev *pdev) =20 amd_iommu_quarantine_teardown(pdev); =20 - iommu_free_domid(pdev->arch.pseudo_domid, iommu->domid_map); - pdev->arch.pseudo_domid =3D DOMID_INVALID; - if ( amd_iommu_perdev_intremap && ivrs_mappings[bdf].dte_requestor_id =3D=3D bdf && ivrs_mappings[bdf].intremap_table ) diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 52c22fa50c..ee73d55740 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -1371,12 +1371,7 @@ static int cf_check _dump_pci_devices(struct pci_seg= *pseg, void *arg) list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list ) { printk("%pp - ", &pdev->sbdf); -#ifdef CONFIG_X86 - if ( pdev->domain =3D=3D dom_io ) - printk("DomIO:%x", pdev->arch.pseudo_domid); - else -#endif - printk("%pd", pdev->domain); + printk("%pd", pdev->domain); printk(" - node %-3d", (pdev->node !=3D NUMA_NO_NODE) ? pdev->node= : -1); pdev_dump_msi(pdev); printk("\n"); diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/= vtd/iommu.c index 9252c3e0f3..f269fca9bf 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -49,14 +49,6 @@ #define CONTIG_MASK DMA_PTE_CONTIG_MASK #include =20 -/* dom_io is used as a sentinel for quarantined devices */ -#define QUARANTINE_SKIP(d, pgd_maddr) ((d) =3D=3D dom_io && !(pgd_maddr)) -#define DEVICE_DOMID(d, pdev) ((d) !=3D dom_io ? (d)->domain_id \ - : (pdev)->arch.pseudo_domid) -#define DEVICE_PGTABLE(d, pdev) ((d) !=3D dom_io \ - ? 
iommu_default_context(d)->arch.vtd.pgd_= maddr \ - : (pdev)->arch.vtd.pgd_maddr) - bool __read_mostly iommu_igfx =3D true; bool __read_mostly iommu_qinval =3D true; #ifndef iommu_snoop @@ -1495,8 +1487,6 @@ int domain_context_mapping_one( int rc, ret; bool flush_dev_iotlb; =20 - if ( QUARANTINE_SKIP(domain, pgd_maddr) ) - return 0; =20 ASSERT(pcidevs_locked()); spin_lock(&iommu->lock); @@ -1518,8 +1508,6 @@ int domain_context_mapping_one( domid =3D did_to_domain_id(iommu, prev_did); if ( domid < DOMID_FIRST_RESERVED ) prev_dom =3D rcu_lock_domain_by_id(domid); - else if ( pdev ? domid =3D=3D pdev->arch.pseudo_domid : domid > DO= MID_MASK ) - prev_dom =3D rcu_lock_domain(dom_io); if ( !prev_dom ) { spin_unlock(&iommu->lock); @@ -1651,8 +1639,8 @@ int domain_context_mapping_one( ret =3D domain_context_unmap_one(domain, iommu, bus, devfn); else ret =3D domain_context_mapping_one(prev_dom, ctx, iommu, bus, = devfn, pdev, - DEVICE_DOMID(prev_dom, pdev), - DEVICE_PGTABLE(prev_dom, pdev= ), + prev_dom->domain_id, + iommu_default_context(prev_do= m)->arch.vtd.pgd_maddr, (mode & MAP_WITH_RMRR) | MAP_ERROR_RECOVERY) < 0; =20 @@ -1674,8 +1662,8 @@ static int domain_context_mapping(struct domain *doma= in, struct iommu_context *c { const struct acpi_drhd_unit *drhd =3D acpi_find_matched_drhd_unit(pdev= ); const struct acpi_rmrr_unit *rmrr; - paddr_t pgd_maddr =3D DEVICE_PGTABLE(domain, pdev); - domid_t orig_domid =3D pdev->arch.pseudo_domid; + paddr_t pgd_maddr =3D ctx->arch.vtd.pgd_maddr; + domid_t did =3D domain->domain_id; int ret =3D 0; unsigned int i, mode =3D 0; uint16_t seg =3D pdev->seg, bdf; @@ -1728,20 +1716,11 @@ static int domain_context_mapping(struct domain *do= main, struct iommu_context *c if ( !drhd ) return -ENODEV; =20 - if ( iommu_quarantine && orig_domid =3D=3D DOMID_INVALID ) - { - pdev->arch.pseudo_domid =3D - iommu_alloc_domid(drhd->iommu->pseudo_domid_map); - if ( pdev->arch.pseudo_domid =3D=3D DOMID_INVALID ) - return -ENOSPC; - } - if ( iommu_debug ) printk(VTDPREFIX "%pd:PCIe: map %pp\n", domain, &PCI_SBDF(seg, bus, devfn)); ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, bus, = devfn, pdev, - DEVICE_DOMID(domain, pdev), pgd_m= addr, - mode); + did, pgd_maddr, mode); if ( ret > 0 ) ret =3D 0; if ( !ret && devfn =3D=3D pdev->devfn && ats_device(pdev, drhd) > = 0 ) @@ -1753,21 +1732,12 @@ static int domain_context_mapping(struct domain *do= main, struct iommu_context *c if ( !drhd ) return -ENODEV; =20 - if ( iommu_quarantine && orig_domid =3D=3D DOMID_INVALID ) - { - pdev->arch.pseudo_domid =3D - iommu_alloc_domid(drhd->iommu->pseudo_domid_map); - if ( pdev->arch.pseudo_domid =3D=3D DOMID_INVALID ) - return -ENOSPC; - } - if ( iommu_debug ) printk(VTDPREFIX "%pd:PCI: map %pp\n", domain, &PCI_SBDF(seg, bus, devfn)); =20 ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, bus, = devfn, - pdev, DEVICE_DOMID(domain, pdev), - pgd_maddr, mode); + pdev, did, pgd_maddr, mode); if ( ret < 0 ) break; prev_present =3D ret; @@ -1797,8 +1767,7 @@ static int domain_context_mapping(struct domain *doma= in, struct iommu_context *c */ if ( ret >=3D 0 ) ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, b= us, devfn, - NULL, DEVICE_DOMID(domain, pd= ev), - pgd_maddr, mode); + NULL, did, pgd_maddr, mode); =20 /* * Devices behind PCIe-to-PCI/PCIx bridge may generate different @@ -1813,8 +1782,7 @@ static int domain_context_mapping(struct domain *doma= in, struct iommu_context *c if ( !ret && pdev_type(seg, bus, devfn) =3D=3D DEV_TYPE_PCIe2PCI_B= RIDGE && (secbus 
!=3D pdev->bus || pdev->devfn !=3D 0) ) ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, s= ecbus, 0, - NULL, DEVICE_DOMID(domain, pd= ev), - pgd_maddr, mode); + NULL, did, pgd_maddr, mode); =20 if ( ret ) { @@ -1836,13 +1804,6 @@ static int domain_context_mapping(struct domain *dom= ain, struct iommu_context *c if ( !ret && devfn =3D=3D pdev->devfn ) pci_vtd_quirk(pdev); =20 - if ( ret && drhd && orig_domid =3D=3D DOMID_INVALID ) - { - iommu_free_domid(pdev->arch.pseudo_domid, - drhd->iommu->pseudo_domid_map); - pdev->arch.pseudo_domid =3D DOMID_INVALID; - } - return ret; } =20 @@ -2006,10 +1967,6 @@ static const struct acpi_drhd_unit *domain_context_u= nmap( return ERR_PTR(-EINVAL); } =20 - if ( !ret && pdev->devfn =3D=3D devfn && - !QUARANTINE_SKIP(domain, pdev->arch.vtd.pgd_maddr) ) - check_cleanup_domid_map(domain, pdev, iommu); - return drhd; } =20 @@ -2043,21 +2000,6 @@ static void cf_check iommu_domain_teardown(struct do= main *d) static void quarantine_teardown(struct pci_dev *pdev, const struct acpi_drhd_unit *drhd) { - struct iommu_context *ctx =3D iommu_default_context(dom_io); - - ASSERT(pcidevs_locked()); - - if ( !pdev->arch.vtd.pgd_maddr ) - return; - - ASSERT(page_list_empty(&ctx->arch.pgtables)); - page_list_move(&ctx->arch.pgtables, &pdev->arch.pgtables_list); - while ( iommu_free_pgtables(dom_io, ctx) =3D=3D -ERESTART ) - /* nothing */; - pdev->arch.vtd.pgd_maddr =3D 0; - - if ( drhd ) - cleanup_domid_map(pdev->arch.pseudo_domid, drhd->iommu); } =20 static int __must_check cf_check intel_iommu_map_page( @@ -2398,13 +2340,6 @@ static int cf_check intel_iommu_remove_device(u8 dev= fn, struct pci_dev *pdev) =20 quarantine_teardown(pdev, drhd); =20 - if ( drhd ) - { - iommu_free_domid(pdev->arch.pseudo_domid, - drhd->iommu->pseudo_domid_map); - pdev->arch.pseudo_domid =3D DOMID_INVALID; - } - return 0; } =20 @@ -2762,42 +2697,22 @@ static int cf_check reassign_device_ownership( { int ret; =20 - if ( !QUARANTINE_SKIP(target, pdev->arch.vtd.pgd_maddr) ) - { - struct iommu_context *target_ctx =3D iommu_default_context(target); - - if ( !has_arch_pdevs(target) ) - vmx_pi_hooks_assign(target); + if ( !has_arch_pdevs(target) ) + vmx_pi_hooks_assign(target); =20 #ifdef CONFIG_PV - /* - * Devices assigned to untrusted domains (here assumed to be any d= omU) - * can attempt to send arbitrary LAPIC/MSI messages. We are unprot= ected - * by the root complex unless interrupt remapping is enabled. - */ - if ( !iommu_intremap && !is_hardware_domain(target) && - !is_system_domain(target) ) - untrusted_msi =3D true; + /* + * Devices assigned to untrusted domains (here assumed to be any do= mU) + * can attempt to send arbitrary LAPIC/MSI messages. We are unprote= cted + * by the root complex unless interrupt remapping is enabled. + */ + if ( !iommu_intremap && !is_hardware_domain(target) && + !is_system_domain(target) ) + untrusted_msi =3D true; #endif =20 - ret =3D domain_context_mapping(target, target_ctx, devfn, pdev); - - if ( !ret && pdev->devfn =3D=3D devfn && - !QUARANTINE_SKIP(source, pdev->arch.vtd.pgd_maddr) ) - { - const struct acpi_drhd_unit *drhd =3D acpi_find_matched_drhd_u= nit(pdev); + ret =3D domain_context_mapping(target, iommu_default_context(target), = devfn, pdev); =20 - if ( drhd ) - check_cleanup_domid_map(source, pdev, drhd->iommu); - } - } - else - { - const struct acpi_drhd_unit *drhd; - - drhd =3D domain_context_unmap(source, devfn, pdev); - ret =3D IS_ERR(drhd) ? 
PTR_ERR(drhd) : 0; - } if ( ret ) { if ( !has_arch_pdevs(target) ) @@ -2896,9 +2811,6 @@ static int cf_check intel_iommu_assign_device( } } =20 - if ( d =3D=3D dom_io ) - return reassign_device_ownership(s, d, devfn, pdev); - /* Setup rmrr identity mapping */ for_each_rmrr_device( rmrr, bdf, i ) { @@ -3108,135 +3020,10 @@ static void cf_check vtd_dump_page_tables(struct d= omain *d) agaw_to_level(hd->arch.vtd.agaw), 0, 0); } =20 -static int fill_qpt(struct dma_pte *this, unsigned int level, - struct page_info *pgs[6]) -{ - struct domain_iommu *hd =3D dom_iommu(dom_io); - struct iommu_context *ctx =3D iommu_default_context(dom_io); - unsigned int i; - int rc =3D 0; - - for ( i =3D 0; !rc && i < PTE_NUM; ++i ) - { - struct dma_pte *pte =3D &this[i], *next; - - if ( !dma_pte_present(*pte) ) - { - if ( !pgs[level] ) - { - /* - * The pgtable allocator is fine for the leaf page, as wel= l as - * page table pages, and the resulting allocations are alw= ays - * zeroed. - */ - pgs[level] =3D iommu_alloc_pgtable(hd, ctx, 0); - if ( !pgs[level] ) - { - rc =3D -ENOMEM; - break; - } - - if ( level ) - { - next =3D map_vtd_domain_page(page_to_maddr(pgs[level])= ); - rc =3D fill_qpt(next, level - 1, pgs); - unmap_vtd_domain_page(next); - } - } - - dma_set_pte_addr(*pte, page_to_maddr(pgs[level])); - dma_set_pte_readable(*pte); - dma_set_pte_writable(*pte); - } - else if ( level && !dma_pte_superpage(*pte) ) - { - next =3D map_vtd_domain_page(dma_pte_addr(*pte)); - rc =3D fill_qpt(next, level - 1, pgs); - unmap_vtd_domain_page(next); - } - } - - return rc; -} - static int cf_check intel_iommu_quarantine_init(struct pci_dev *pdev, bool scratch_page) { - struct domain_iommu *hd =3D dom_iommu(dom_io); - struct iommu_context *ctx =3D iommu_default_context(dom_io); - struct page_info *pg; - unsigned int agaw =3D hd->arch.vtd.agaw; - unsigned int level =3D agaw_to_level(agaw); - const struct acpi_drhd_unit *drhd; - const struct acpi_rmrr_unit *rmrr; - unsigned int i, bdf; - bool rmrr_found =3D false; - int rc; - - ASSERT(pcidevs_locked()); - ASSERT(!ctx->arch.vtd.pgd_maddr); - ASSERT(page_list_empty(&ctx->arch.pgtables)); - - if ( pdev->arch.vtd.pgd_maddr ) - { - clear_domain_page(pdev->arch.leaf_mfn); - return 0; - } - - drhd =3D acpi_find_matched_drhd_unit(pdev); - if ( !drhd ) - return -ENODEV; - - pg =3D iommu_alloc_pgtable(hd, ctx, 0); - if ( !pg ) - return -ENOMEM; - - rc =3D context_set_domain_id(NULL, pdev->arch.pseudo_domid, drhd->iomm= u); - - /* Transiently install the root into DomIO, for iommu_identity_mapping= (). 
-     */
-    ctx->arch.vtd.pgd_maddr = page_to_maddr(pg);
-
-    for_each_rmrr_device ( rmrr, bdf, i )
-    {
-        if ( rc )
-            break;
-
-        if ( rmrr->segment == pdev->seg && bdf == pdev->sbdf.bdf )
-        {
-            rmrr_found = true;
-
-            rc = iommu_identity_mapping(dom_io, ctx, p2m_access_rw,
-                                        rmrr->base_address, rmrr->end_address,
-                                        0);
-            if ( rc )
-                printk(XENLOG_ERR VTDPREFIX
-                       "%pp: RMRR quarantine mapping failed\n",
-                       &pdev->sbdf);
-        }
-    }
-
-    iommu_identity_map_teardown(dom_io, ctx);
-    ctx->arch.vtd.pgd_maddr = 0;
-    pdev->arch.vtd.pgd_maddr = page_to_maddr(pg);
-
-    if ( !rc && scratch_page )
-    {
-        struct dma_pte *root;
-        struct page_info *pgs[6] = {};
-
-        root = map_vtd_domain_page(pdev->arch.vtd.pgd_maddr);
-        rc = fill_qpt(root, level - 1, pgs);
-        unmap_vtd_domain_page(root);
-
-        pdev->arch.leaf_mfn = page_to_mfn(pgs[0]);
-    }
-
-    page_list_move(&pdev->arch.pgtables_list, &ctx->arch.pgtables);
-
-    if ( rc || (!scratch_page && !rmrr_found) )
-        quarantine_teardown(pdev, drhd);
-
-    return rc;
+    return 0;
 }
 
 static void cf_check vtd_quiesce(void)
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 3bc8aa3e09..75c8752022 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -528,7 +528,6 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
 
 void arch_pci_init_pdev(struct pci_dev *pdev)
 {
-    pdev->arch.pseudo_domid = DOMID_INVALID;
 }
 
 unsigned long *__init iommu_init_domid(domid_t reserve)
-- 
2.51.2

-- 
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech

From nobody Sun Feb 8 01:17:05 2026
with outflank-mailman.1166862.1493335 (Exim 4.92) (envelope-from ) id 1vM2Y3-0002tn-Po; Thu, 20 Nov 2025 11:09:59 +0000 Received: by outflank-mailman (output) from mailman id 1166862.1493335; Thu, 20 Nov 2025 11:09:59 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1vM2Y3-0002sU-Lx; Thu, 20 Nov 2025 11:09:59 +0000 Received: by outflank-mailman (input) for mailman id 1166862; Thu, 20 Nov 2025 11:09:57 +0000 Received: from se1-gles-sth1-in.inumbo.com ([159.253.27.254] helo=se1-gles-sth1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1vM2Y1-0001PI-OX for xen-devel@lists.xenproject.org; Thu, 20 Nov 2025 11:09:57 +0000 Received: from mail128-17.atl41.mandrillapp.com (mail128-17.atl41.mandrillapp.com [198.2.128.17]) by se1-gles-sth1.inumbo.com (Halon) with ESMTPS id 72b87bbb-c601-11f0-9d18-b5c5bf9af7f9; Thu, 20 Nov 2025 12:09:57 +0100 (CET) Received: from pmta08.mandrill.prod.atl01.rsglab.com (localhost [127.0.0.1]) by mail128-17.atl41.mandrillapp.com (Mailchimp) with ESMTP id 4dBwc46VMbzCf9PPb for ; Thu, 20 Nov 2025 11:09:56 +0000 (GMT) Received: from [37.26.189.201] by mandrillapp.com id d71239346bcd49868e89efb5fc5acd64; Thu, 20 Nov 2025 11:09:56 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 72b87bbb-c601-11f0-9d18-b5c5bf9af7f9 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandrillapp.com; s=mte1; t=1763636996; x=1763906996; bh=IB8SiWp3cnHfGEHVz8f7L6lwm7yItLzdTN6a/r1KsBo=; h=From:Subject:To:Cc:Message-Id:In-Reply-To:References:Feedback-ID: Date:MIME-Version:Content-Type:Content-Transfer-Encoding:CC:Date: Subject:From; b=fwPCa3hu2hOsSHeXJ4nMj7oVo3/TXVVHKUlnrxC3A8y3Cb8LhFPP/saDmj8UwFqiS XDfoYnZDJ6R1vbkCu9hTbZ71yN8NAEs4VvUxgy9OLpL9IPLWI3mV6lysBLy7hzwS1u PCtzIQIiJZpwo810F/LV3hys0KDVcaNQfLayYNyd/Q7IxF3awrnoO1NWk4njp9Yws3 PbuJqyJhyDtQ1rKGfqdOTxNLdIKc7Wk/pn8/Md6POMum2w3weYrSH57tKrp9iHnsTr J5YU305sczW5PCIIa8nNB8r06DQ7AaMlHdH02VNrqP5epDLD7QRWjf8J8l602xk9iQ sozRMLicDu8AA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vates.tech; s=mte1; t=1763636996; x=1763897496; i=teddy.astie@vates.tech; bh=IB8SiWp3cnHfGEHVz8f7L6lwm7yItLzdTN6a/r1KsBo=; h=From:Subject:To:Cc:Message-Id:In-Reply-To:References:Feedback-ID: Date:MIME-Version:Content-Type:Content-Transfer-Encoding:CC:Date: Subject:From; b=1dXOugXmB9ixtIA5mLfGYMcvAuO0IQ823Ep53dWgvJEcdGxV7EXvmrD9FKyIU6g5V e+QjMhuubAKjBKqTnzKWd1s3x7rbzx425P4kNjdXIYry7Ja+goWBWQwKwb3yPYyq55 sPEl30pE+3Vca9kZi3hrzYUCSSzOCGBv3PRzG6uafraD3ZG5T4hI4cRsogszRen1wL 9pg4SJ2UU+Or4dxpuuX4aQluLkTkW++jjE3g0X/GqjLTeygzTfIDxm6LGvFEhhFYM8 jCmq1vPd2ZicwB2kuOiTfkvEZfq+GU0MOLiKCS6OLMJuv0iVdPeDWrmcwwRvc3A3eY lC5yjDaR9E21A== From: "Teddy Astie" Subject: =?utf-8?Q?[RFC=20PATCH=20v7=2006/14]=20vtd:=20Remove=20MAP=5FERROR=5FRECOVERY=20code=20path=20in=20domain=5Fcontext=5Fmapping=5Fone?= X-Mailer: git-send-email 2.51.2 X-Bm-Disclaimer: Yes X-Bm-Milter-Handled: 4ffbd6c1-ee69-4e1b-aabd-f977039bd3e2 X-Bm-Transport-Timestamp: 1763636996032 To: xen-devel@lists.xenproject.org Cc: "Teddy Astie" , "Jan Beulich" , "Andrew Cooper" , "=?utf-8?Q?Roger=20Pau=20Monn=C3=A9?=" Message-Id: <149e3a765a5a55792be7af0a4d9c0ca9fb5098c6.1763569135.git.teddy.astie@vates.tech> In-Reply-To: References: 
Date: Thu, 20 Nov 2025 11:09:56 +0000

This logic is basically never called, as the only possible failures are:
- no memory to allocate the pagetable (if it isn't already allocated);
  this is fixed in this patch series by ensuring that the pagetable is
  allocated when entering this function
- EILSEQ when there is a race condition with hardware, which should not
  happen under normal circumstances

Remove this logic to simplify the error management of the function.

Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
This is fairly similar to [1], although we don't check for -EILSEQ. Such
failures can happen through me_wifi_quirk, which can recursively call
domain_context_mapping_one.

[1] https://lore.kernel.org/xen-devel/b0e81bd67c3f135a4102d12ed95a52ce56482992.1762961527.git.teddy.astie@vates.tech/
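As a rough standalone illustration (not the Xen code; names are
invented) of the control-flow change: the removed path tried to restore
the previous context entry on failure, recursing with the
MAP_ERROR_RECOVERY flag to bound that recursion. Once the page table is
guaranteed to be allocated before entry, the function can simply
propagate any residual error.

#include <errno.h>
#include <stdio.h>

struct ctx { void *pgtable; };

/* Stand-in for programming a context entry; illustrative only. */
static int program_entry(struct ctx *c)
{
    return c->pgtable ? 0 : -ENOMEM;
}

static int map_one(struct ctx *c)
{
    /*
     * The series guarantees the page table exists before this point,
     * so the old -ENOMEM rollback can never trigger any more...
     */
    if ( !c->pgtable )
        return -ENOMEM;

    /* ...and any residual error is simply returned to the caller. */
    return program_entry(c);
}

int main(void)
{
    int dummy;
    struct ctx c = { .pgtable = &dummy }; /* pre-allocated by the caller */

    printf("map_one() -> %d\n", map_one(&c));
    return 0;
}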
---
 xen/drivers/passthrough/vtd/iommu.c | 20 --------------------
 xen/drivers/passthrough/vtd/vtd.h   |  3 +--
 2 files changed, 1 insertion(+), 22 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index f269fca9bf..986b05b9dd 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1627,26 +1627,6 @@ int domain_context_mapping_one(
     if ( !seg && !rc )
         rc = me_wifi_quirk(domain, bus, devfn, domid, pgd_maddr, mode);
 
-    if ( rc && !(mode & MAP_ERROR_RECOVERY) )
-    {
-        if ( !prev_dom ||
-             /*
-              * Unmapping here means DEV_TYPE_PCI devices with RMRRs (if such
-              * exist) would cause problems if such a region was actually
-              * accessed.
-              */
-             (prev_dom == dom_io && !pdev) )
-            ret = domain_context_unmap_one(domain, iommu, bus, devfn);
-        else
-            ret = domain_context_mapping_one(prev_dom, ctx, iommu, bus, devfn, pdev,
-                                             prev_dom->domain_id,
-                                             iommu_default_context(prev_dom)->arch.vtd.pgd_maddr,
-                                             (mode & MAP_WITH_RMRR) |
-                                             MAP_ERROR_RECOVERY) < 0;
-
-        if ( !ret && pdev && pdev->devfn == devfn )
-            check_cleanup_domid_map(domain, pdev, iommu);
-    }
 
     if ( prev_dom )
         rcu_unlock_domain(prev_dom);
diff --git a/xen/drivers/passthrough/vtd/vtd.h b/xen/drivers/passthrough/vtd/vtd.h
index f0286b40c3..0178214929 100644
--- a/xen/drivers/passthrough/vtd/vtd.h
+++ b/xen/drivers/passthrough/vtd/vtd.h
@@ -28,8 +28,7 @@
  */
 #define MAP_WITH_RMRR             (1u << 0)
 #define MAP_OWNER_DYING           (1u << 1)
-#define MAP_ERROR_RECOVERY        (1u << 2)
-#define UNMAP_ME_PHANTOM_FUNC     (1u << 3)
+#define UNMAP_ME_PHANTOM_FUNC     (1u << 2)
 
 struct IO_APIC_route_remap_entry {
     union {
-- 
2.51.2

-- 
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech

From nobody Sun Feb 8 01:17:05 2026
From: "Teddy Astie" <teddy.astie@vates.tech>
Subject: [RFC PATCH v7 07/14] iommu: Simplify hardware DID management
To: xen-devel@lists.xenproject.org
Cc: "Teddy Astie", "Jan Beulich", "Andrew Cooper", "Roger Pau Monné", "Jason Andryuk"
Date: Thu, 20 Nov 2025 11:09:58 +0000

Simplify hardware DID management by allocating a DID per IOMMU context
(currently one per Xen domain) instead of trying to reuse the Xen
domain ID, which may not be possible depending on hardware constraints
such as DID limits.

Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
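To illustrate the scheme, here is a standalone sketch (not the patch's
code; sizes and names are assumptions): each IOMMU unit keeps a bitmap
of hardware DIDs, and each IOMMU context allocates its own DID from
every unit's bitmap, independently of the Xen domain ID.

#include <limits.h>
#include <stdio.h>

#define NR_DIDS        1024u   /* per-IOMMU limit, cf. cap_ndoms() */
#define BITS_PER_LONG  (sizeof(unsigned long) * CHAR_BIT)

static unsigned long did_bitmap[NR_DIDS / BITS_PER_LONG];

static unsigned int did_alloc(unsigned long *map, unsigned int nbits)
{
    for ( unsigned int i = 0; i < nbits; i++ )
        if ( !(map[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG))) )
        {
            map[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
            return i;
        }

    return UINT_MAX; /* out of hardware DIDs on this unit */
}

static void did_free(unsigned long *map, unsigned int did)
{
    map[did / BITS_PER_LONG] &= ~(1UL << (did % BITS_PER_LONG));
}

int main(void)
{
    /* Caching-mode VT-d treats DID 0 as invalid, so reserve slot 0. */
    did_alloc(did_bitmap, NR_DIDS);

    unsigned int did = did_alloc(did_bitmap, NR_DIDS);

    printf("context got DID %u on this IOMMU\n", did);
    did_free(did_bitmap, did);
    return 0;
}

The point of decoupling is that hardware DID spaces can be narrower
than the Xen domain ID space, and two IOMMU units need not hand the
same number to the same context.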
 xen/arch/x86/include/asm/iommu.h         |   5 +-
 xen/drivers/passthrough/amd/iommu.h      |   3 +
 xen/drivers/passthrough/amd/iommu_cmd.c  |   4 +-
 xen/drivers/passthrough/amd/iommu_init.c |   3 +-
 xen/drivers/passthrough/vtd/extern.h     |   2 -
 xen/drivers/passthrough/vtd/iommu.c      | 335 +++++------------------
 xen/drivers/passthrough/vtd/iommu.h      |   2 -
 xen/drivers/passthrough/vtd/qinval.c     |   2 +-
 xen/drivers/passthrough/x86/iommu.c      |  27 +-
 9 files changed, 89 insertions(+), 294 deletions(-)

diff --git a/xen/arch/x86/include/asm/iommu.h b/xen/arch/x86/include/asm/iommu.h
index 94513ba9dc..d20c3cda59 100644
--- a/xen/arch/x86/include/asm/iommu.h
+++ b/xen/arch/x86/include/asm/iommu.h
@@ -45,12 +45,15 @@ struct arch_iommu_context
     /* Intel VT-d */
     struct {
         uint64_t pgd_maddr; /* io page directory machine address */
-        unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the context uses */
+        domid_t *didmap; /* per-iommu DID (valid only if related iommu_dev_cnt > 0) */
+        unsigned long *iommu_dev_cnt; /* counter of devices per iommu */
     } vtd;
     /* AMD IOMMU */
     struct {
         unsigned int paging_mode;
         struct page_info *root_table;
+        domid_t *didmap; /* per-iommu DID (valid only if related iommu_dev_cnt > 0) */
+        unsigned long *iommu_dev_cnt; /* counter of devices per iommu */
     } amd;
     };
 };
diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/amd/iommu.h
index 4938cc38ed..db6d7ace02 100644
--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -35,6 +35,7 @@
 
 #define iommu_found()           (!list_empty(&amd_iommu_head))
 
+extern unsigned int nr_amd_iommus;
 extern struct list_head amd_iommu_head;
 
 typedef struct event_entry
@@ -106,6 +107,8 @@ struct amd_iommu {
 
     int enabled;
 
+    unsigned int index;
+
     struct list_head ats_devices;
 };
 
diff --git a/xen/drivers/passthrough/amd/iommu_cmd.c b/xen/drivers/passthrough/amd/iommu_cmd.c
index 6b80c57f44..0c4dcf4ece 100644
--- a/xen/drivers/passthrough/amd/iommu_cmd.c
+++ b/xen/drivers/passthrough/amd/iommu_cmd.c
@@ -331,11 +331,13 @@ static void _amd_iommu_flush_pages(struct domain *d,
                                    daddr_t daddr, unsigned int order)
 {
     struct amd_iommu *iommu;
-    unsigned int dom_id = d->domain_id;
+    struct iommu_context *ctx = iommu_default_context(d);
 
     /* send INVALIDATE_IOMMU_PAGES command */
     for_each_amd_iommu ( iommu )
     {
+        domid_t dom_id = ctx->arch.amd.didmap[iommu->index];
+
         invalidate_iommu_pages(iommu, daddr, dom_id, order);
         flush_command_buffer(iommu, 0);
     }
diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index 56b5c2c6ec..5cbb3fdb05 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -23,7 +23,7 @@
 
 #include "iommu.h"
 
-static int __initdata nr_amd_iommus;
+unsigned int nr_amd_iommus = 0;
 static bool __initdata pci_init;
 
 static struct tasklet amd_iommu_irq_tasklet;
@@ -920,6 +920,7 @@ static void enable_iommu(struct amd_iommu *iommu)
     set_iommu_translation_control(iommu, IOMMU_CONTROL_ENABLED);
 
     iommu->enabled = 1;
+    iommu->index = nr_amd_iommus;
 
     spin_unlock_irqrestore(&iommu->lock, flags);
 
diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index 3dcb77c711..82db8f9435 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -45,8 +45,6 @@ void disable_intremap(struct vtd_iommu *iommu); int iommu_alloc(struct acpi_drhd_unit *drhd); void iommu_free(struct acpi_drhd_unit *drhd); =20 -domid_t did_to_domain_id(const struct vtd_iommu *iommu, unsigned int did); - int iommu_flush_iec_global(struct vtd_iommu *iommu); int iommu_flush_iec_index(struct vtd_iommu *iommu, u8 im, u16 iidx); void clear_fault_bits(struct vtd_iommu *iommu); diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/= vtd/iommu.c index 986b05b9dd..3668185ebc 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -63,50 +63,6 @@ static struct tasklet vtd_fault_tasklet; static int cf_check setup_hwdom_device(u8 devfn, struct pci_dev *); static void setup_hwdom_rmrr(struct domain *d); =20 -static bool domid_mapping(const struct vtd_iommu *iommu) -{ - return (const void *)iommu->domid_bitmap !=3D (const void *)iommu->dom= id_map; -} - -static domid_t convert_domid(const struct vtd_iommu *iommu, domid_t domid) -{ - /* - * While we need to avoid DID 0 for caching-mode IOMMUs, maintain - * the property of the transformation being the same in either - * direction. By clipping to 16 bits we ensure that the resulting - * DID will fit in the respective context entry field. - */ - BUILD_BUG_ON(DOMID_MASK >=3D 0xffff); - - return !cap_caching_mode(iommu->cap) ? domid : ~domid; -} - -static int get_iommu_did(domid_t domid, const struct vtd_iommu *iommu, - bool warn) -{ - unsigned int nr_dom, i; - - if ( !domid_mapping(iommu) ) - return convert_domid(iommu, domid); - - nr_dom =3D cap_ndoms(iommu->cap); - i =3D find_first_bit(iommu->domid_bitmap, nr_dom); - while ( i < nr_dom ) - { - if ( iommu->domid_map[i] =3D=3D domid ) - return i; - - i =3D find_next_bit(iommu->domid_bitmap, nr_dom, i + 1); - } - - if ( warn ) - dprintk(XENLOG_ERR VTDPREFIX, - "No valid iommu %u domid for Dom%d\n", - iommu->index, domid); - - return -1; -} - #define DID_FIELD_WIDTH 16 #define DID_HIGH_OFFSET 8 =20 @@ -117,127 +73,17 @@ static int get_iommu_did(domid_t domid, const struct = vtd_iommu *iommu, static int context_set_domain_id(struct context_entry *context, domid_t domid, struct vtd_iommu *iommu) { - unsigned int i; - ASSERT(pcidevs_locked()); =20 - if ( domid_mapping(iommu) ) - { - unsigned int nr_dom =3D cap_ndoms(iommu->cap); - - i =3D find_first_bit(iommu->domid_bitmap, nr_dom); - while ( i < nr_dom && iommu->domid_map[i] !=3D domid ) - i =3D find_next_bit(iommu->domid_bitmap, nr_dom, i + 1); - - if ( i >=3D nr_dom ) - { - i =3D find_first_zero_bit(iommu->domid_bitmap, nr_dom); - if ( i >=3D nr_dom ) - { - dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: no free domain id\n"= ); - return -EBUSY; - } - iommu->domid_map[i] =3D domid; - set_bit(i, iommu->domid_bitmap); - } - } - else - i =3D convert_domid(iommu, domid); - if ( context ) { context->hi &=3D ~(((1 << DID_FIELD_WIDTH) - 1) << DID_HIGH_OFFSET= ); - context->hi |=3D (i & ((1 << DID_FIELD_WIDTH) - 1)) << DID_HIGH_OF= FSET; + context->hi |=3D (domid & ((1 << DID_FIELD_WIDTH) - 1)) << DID_HIG= H_OFFSET; } =20 return 0; } =20 -static void cleanup_domid_map(domid_t domid, struct vtd_iommu *iommu) -{ - int iommu_domid; - - if ( !domid_mapping(iommu) ) - return; - - iommu_domid =3D get_iommu_did(domid, iommu, false); - - if ( iommu_domid >=3D 0 ) - { - /* - * Update domid_map[] /before/ domid_bitmap[] to avoid a race with - * context_set_domain_id(), setting the slot to DOMID_INVALID for - * did_to_domain_id() to return a suitable value while the bit is - * still set. 
- */ - iommu->domid_map[iommu_domid] =3D DOMID_INVALID; - clear_bit(iommu_domid, iommu->domid_bitmap); - } -} - -static bool any_pdev_behind_iommu(const struct domain *d, - const struct pci_dev *exclude, - const struct vtd_iommu *iommu) -{ - const struct pci_dev *pdev; - - for_each_pdev ( d, pdev ) - { - const struct acpi_drhd_unit *drhd; - - if ( pdev =3D=3D exclude ) - continue; - - drhd =3D acpi_find_matched_drhd_unit(pdev); - if ( drhd && drhd->iommu =3D=3D iommu ) - return true; - } - - return false; -} - -/* - * If no other devices under the same iommu owned by this domain, - * clear iommu in iommu_bitmap and clear domain_id in domid_bitmap. - */ -static void check_cleanup_domid_map(const struct domain *d, - const struct pci_dev *exclude, - struct vtd_iommu *iommu) -{ - bool found; - - if ( d =3D=3D dom_io ) - return; - - found =3D any_pdev_behind_iommu(d, exclude, iommu); - /* - * Hidden devices are associated with DomXEN but usable by the hardware - * domain. Hence they need considering here as well. - */ - if ( !found && is_hardware_domain(d) ) - found =3D any_pdev_behind_iommu(dom_xen, exclude, iommu); - - if ( !found ) - { - clear_bit(iommu->index, iommu_default_context(d)->arch.vtd.iommu_b= itmap); - cleanup_domid_map(d->domain_id, iommu); - } -} - -domid_t did_to_domain_id(const struct vtd_iommu *iommu, unsigned int did) -{ - if ( did >=3D cap_ndoms(iommu->cap) ) - return DOMID_INVALID; - - if ( !domid_mapping(iommu) ) - return convert_domid(iommu, did); - - if ( !test_bit(did, iommu->domid_bitmap) ) - return DOMID_INVALID; - - return iommu->domid_map[did]; -} - /* Allocate page table, return its machine address */ uint64_t alloc_pgtable_maddr(unsigned long npages, nodeid_t node) { @@ -755,13 +601,11 @@ static int __must_check cf_check iommu_flush_iotlb(st= ruct domain *d, dfn_t dfn, =20 iommu =3D drhd->iommu; =20 - if ( !test_bit(iommu->index, ctx->arch.vtd.iommu_bitmap) ) + if ( !ctx->arch.vtd.iommu_dev_cnt[iommu->index] ) continue; =20 flush_dev_iotlb =3D !!find_ats_dev_drhd(iommu); - iommu_domid =3D get_iommu_did(d->domain_id, iommu, !d->is_dying); - if ( iommu_domid =3D=3D -1 ) - continue; + iommu_domid =3D ctx->arch.vtd.didmap[iommu->index]; =20 if ( !page_count || (page_count & (page_count - 1)) || dfn_eq(dfn, INVALID_DFN) || !IS_ALIGNED(dfn_x(dfn), page_coun= t) ) @@ -1258,7 +1102,6 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd) { struct vtd_iommu *iommu; unsigned int sagaw, agaw =3D 0, nr_dom; - domid_t reserved_domid =3D DOMID_INVALID; int rc; =20 iommu =3D xzalloc(struct vtd_iommu); @@ -1347,43 +1190,16 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd) if ( !ecap_coherent(iommu->ecap) ) iommu_non_coherent =3D true; =20 - if ( nr_dom <=3D DOMID_MASK * 2 + cap_caching_mode(iommu->cap) ) - { - /* Allocate domain id (bit) maps. */ - iommu->domid_bitmap =3D xzalloc_array(unsigned long, - BITS_TO_LONGS(nr_dom)); - iommu->domid_map =3D xzalloc_array(domid_t, nr_dom); - rc =3D -ENOMEM; - if ( !iommu->domid_bitmap || !iommu->domid_map ) - goto free; - - /* - * If Caching mode is set, then invalid translations are tagged - * with domain id 0. Hence reserve bit/slot 0. - */ - if ( cap_caching_mode(iommu->cap) ) - { - iommu->domid_map[0] =3D DOMID_INVALID; - __set_bit(0, iommu->domid_bitmap); - } - } - else - { - /* Don't leave dangling NULL pointers. */ - iommu->domid_bitmap =3D ZERO_BLOCK_PTR; - iommu->domid_map =3D ZERO_BLOCK_PTR; - - /* - * If Caching mode is set, then invalid translations are tagged - * with domain id 0. 
Hence reserve the ID taking up bit/slot 0. - */ - reserved_domid =3D convert_domid(iommu, 0) ?: DOMID_INVALID; - } + /* Allocate domain id (bit) maps. */ + iommu->domid_bitmap =3D xzalloc_array(unsigned long, + BITS_TO_LONGS(nr_dom)); =20 - iommu->pseudo_domid_map =3D iommu_init_domid(reserved_domid); - rc =3D -ENOMEM; - if ( !iommu->pseudo_domid_map ) - goto free; + /* + * If Caching mode is set, then invalid translations are tagged + * with domain id 0. Hence reserve bit/slot 0. + */ + if ( cap_caching_mode(iommu->cap) ) + __set_bit(0, iommu->domid_bitmap); =20 return 0; =20 @@ -1411,8 +1227,6 @@ void __init iommu_free(struct acpi_drhd_unit *drhd) iounmap(iommu->reg); =20 xfree(iommu->domid_bitmap); - xfree(iommu->domid_map); - xfree(iommu->pseudo_domid_map); =20 if ( iommu->msi.irq >=3D 0 ) destroy_irq(iommu->msi.irq); @@ -1426,19 +1240,39 @@ void __init iommu_free(struct acpi_drhd_unit *drhd) agaw =3D 64; \ agaw; }) =20 -static int cf_check intel_iommu_domain_init(struct domain *d) +static int cf_check intel_iommu_context_init(struct domain *d, struct iomm= u_context *ctx) { - struct domain_iommu *hd =3D dom_iommu(d); - struct iommu_context *ctx =3D iommu_default_context(d); + struct acpi_drhd_unit *drhd; =20 - ctx->arch.vtd.iommu_bitmap =3D xzalloc_array(unsigned long, - BITS_TO_LONGS(nr_iommus)); - if ( !ctx->arch.vtd.iommu_bitmap ) + ctx->arch.vtd.didmap =3D xzalloc_array(domid_t, nr_iommus); + if ( !ctx->arch.vtd.didmap ) return -ENOMEM; =20 + ctx->arch.vtd.iommu_dev_cnt =3D xzalloc_array(unsigned long, nr_iommus= ); + if ( !ctx->arch.vtd.iommu_dev_cnt ) + { + xfree(ctx->arch.vtd.didmap); + return -ENOMEM; + } + + // TODO: Allocate IOMMU domid only when attaching devices ? + /* Populate context DID map using pseudo DIDs */ + for_each_drhd_unit(drhd) + { + ctx->arch.vtd.didmap[drhd->iommu->index] =3D + iommu_alloc_domid(drhd->iommu->domid_bitmap); + } + + return arch_iommu_context_init(d, ctx, 0); +} + +static int cf_check intel_iommu_domain_init(struct domain *d) +{ + struct domain_iommu *hd =3D dom_iommu(d); + hd->arch.vtd.agaw =3D width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH); =20 - return 0; + return intel_iommu_context_init(d, iommu_default_context(d)); } =20 static void __hwdom_init cf_check intel_iommu_hwdom_init(struct domain *d) @@ -1482,11 +1316,11 @@ int domain_context_mapping_one( struct context_entry *context, *context_entries, lctxt; __uint128_t res, old; uint64_t maddr; - uint16_t seg =3D iommu->drhd->segment, prev_did =3D 0; - struct domain *prev_dom =3D NULL; + uint16_t seg =3D iommu->drhd->segment, prev_did =3D 0, did; int rc, ret; - bool flush_dev_iotlb; + bool flush_dev_iotlb, overwrite_entry =3D false; =20 + struct iommu_context *prev_ctx =3D pdev->domain ? 
iommu_default_contex= t(pdev->domain) : NULL; =20 ASSERT(pcidevs_locked()); spin_lock(&iommu->lock); @@ -1500,23 +1334,12 @@ int domain_context_mapping_one( context =3D &context_entries[devfn]; old =3D (lctxt =3D *context).full; =20 + did =3D ctx->arch.vtd.didmap[iommu->index]; + if ( context_present(lctxt) ) { - domid_t domid; - prev_did =3D context_domain_id(lctxt); - domid =3D did_to_domain_id(iommu, prev_did); - if ( domid < DOMID_FIRST_RESERVED ) - prev_dom =3D rcu_lock_domain_by_id(domid); - if ( !prev_dom ) - { - spin_unlock(&iommu->lock); - unmap_vtd_domain_page(context_entries); - dprintk(XENLOG_DEBUG VTDPREFIX, - "no domain for did %u (nr_dom %u)\n", - prev_did, cap_ndoms(iommu->cap)); - return -ESRCH; - } + overwrite_entry =3D true; } =20 if ( iommu_hwdom_passthrough && is_hardware_domain(domain) ) @@ -1532,11 +1355,7 @@ int domain_context_mapping_one( root =3D domain_pgd_maddr(domain, ctx, pgd_maddr, iommu->nr_pt_lev= els); if ( !root ) { - spin_unlock(&ctx->arch.mapping_lock); - spin_unlock(&iommu->lock); unmap_vtd_domain_page(context_entries); - if ( prev_dom ) - rcu_unlock_domain(prev_dom); return -ENOMEM; } =20 @@ -1549,35 +1368,13 @@ int domain_context_mapping_one( spin_unlock(&ctx->arch.mapping_lock); } =20 - rc =3D context_set_domain_id(&lctxt, domid, iommu); + rc =3D context_set_domain_id(&lctxt, did, iommu); if ( rc ) - { - unlock: - spin_unlock(&iommu->lock); - unmap_vtd_domain_page(context_entries); - if ( prev_dom ) - rcu_unlock_domain(prev_dom); - return rc; - } - - if ( !prev_dom ) - { - context_set_address_width(lctxt, level_to_agaw(iommu->nr_pt_levels= )); - context_set_fault_enable(lctxt); - context_set_present(lctxt); - } - else if ( prev_dom =3D=3D domain ) - { - ASSERT(lctxt.full =3D=3D context->full); - rc =3D !!pdev; goto unlock; - } - else - { - ASSERT(context_address_width(lctxt) =3D=3D - level_to_agaw(iommu->nr_pt_levels)); - ASSERT(!context_fault_disable(lctxt)); - } + + context_set_address_width(lctxt, level_to_agaw(iommu->nr_pt_levels)); + context_set_fault_enable(lctxt); + context_set_present(lctxt); =20 res =3D cmpxchg16b(context, &old, &lctxt.full); =20 @@ -1587,8 +1384,6 @@ int domain_context_mapping_one( */ if ( res !=3D old ) { - if ( pdev ) - check_cleanup_domid_map(domain, pdev, iommu); printk(XENLOG_ERR "%pp: unexpected context entry %016lx_%016lx (expected %01= 6lx_%016lx)\n", &PCI_SBDF(seg, bus, devfn), @@ -1602,9 +1397,9 @@ int domain_context_mapping_one( spin_unlock(&iommu->lock); =20 rc =3D iommu_flush_context_device(iommu, prev_did, PCI_BDF(bus, devfn), - DMA_CCMD_MASK_NOBIT, !prev_dom); + DMA_CCMD_MASK_NOBIT, !overwrite_entry); flush_dev_iotlb =3D !!find_ats_dev_drhd(iommu); - ret =3D iommu_flush_iotlb_dsi(iommu, prev_did, !prev_dom, flush_dev_io= tlb); + ret =3D iommu_flush_iotlb_dsi(iommu, prev_did, !overwrite_entry, flush= _dev_iotlb); =20 /* * The current logic for returns: @@ -1620,18 +1415,27 @@ int domain_context_mapping_one( if ( rc > 0 ) rc =3D 0; =20 - set_bit(iommu->index, ctx->arch.vtd.iommu_bitmap); + if ( prev_ctx ) + { + /* Don't underflow the counter. 
*/ + BUG_ON(!prev_ctx->arch.vtd.iommu_dev_cnt[iommu->index]); + prev_ctx->arch.vtd.iommu_dev_cnt[iommu->index]--; + } + + ctx->arch.vtd.iommu_dev_cnt[iommu->index]++; =20 unmap_vtd_domain_page(context_entries); + spin_unlock(&iommu->lock); =20 if ( !seg && !rc ) rc =3D me_wifi_quirk(domain, bus, devfn, domid, pgd_maddr, mode); =20 + return rc; =20 - if ( prev_dom ) - rcu_unlock_domain(prev_dom); - - return rc ?: pdev && prev_dom; + unlock: + unmap_vtd_domain_page(context_entries); + spin_unlock(&iommu->lock); + return rc; } =20 static const struct acpi_drhd_unit *domain_context_unmap( @@ -1643,7 +1447,7 @@ static int domain_context_mapping(struct domain *doma= in, struct iommu_context *c const struct acpi_drhd_unit *drhd =3D acpi_find_matched_drhd_unit(pdev= ); const struct acpi_rmrr_unit *rmrr; paddr_t pgd_maddr =3D ctx->arch.vtd.pgd_maddr; - domid_t did =3D domain->domain_id; + domid_t did =3D ctx->arch.vtd.didmap[drhd->iommu->index]; int ret =3D 0; unsigned int i, mode =3D 0; uint16_t seg =3D pdev->seg, bdf; @@ -1972,9 +1776,10 @@ static void cf_check iommu_domain_teardown(struct do= main *d) ASSERT(!ctx->arch.vtd.pgd_maddr); =20 for_each_drhd_unit ( drhd ) - cleanup_domid_map(d->domain_id, drhd->iommu); + iommu_free_domid(d->domain_id, drhd->iommu->domid_bitmap); =20 - XFREE(ctx->arch.vtd.iommu_bitmap); + XFREE(ctx->arch.vtd.iommu_dev_cnt); + XFREE(ctx->arch.vtd.didmap); } =20 static void quarantine_teardown(struct pci_dev *pdev, diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/= vtd/iommu.h index 29d350b23d..77edfa3587 100644 --- a/xen/drivers/passthrough/vtd/iommu.h +++ b/xen/drivers/passthrough/vtd/iommu.h @@ -506,9 +506,7 @@ struct vtd_iommu { } flush; =20 struct list_head ats_devices; - unsigned long *pseudo_domid_map; /* "pseudo" domain id bitmap */ unsigned long *domid_bitmap; /* domain id bitmap */ - domid_t *domid_map; /* domain id mapping array */ uint32_t version; }; =20 diff --git a/xen/drivers/passthrough/vtd/qinval.c b/xen/drivers/passthrough= /vtd/qinval.c index 036f3e8505..3f25b6a2e0 100644 --- a/xen/drivers/passthrough/vtd/qinval.c +++ b/xen/drivers/passthrough/vtd/qinval.c @@ -229,7 +229,7 @@ static int __must_check dev_invalidate_sync(struct vtd_= iommu *iommu, rc =3D queue_invalidate_wait(iommu, 0, 1, 1, 1); if ( rc =3D=3D -ETIMEDOUT && !pdev->broken ) { - struct domain *d =3D rcu_lock_domain_by_id(did_to_domain_id(iommu,= did)); + struct domain *d =3D rcu_lock_domain(pdev->domain); =20 /* * In case the domain has been freed or the IOMMU domid bitmap is diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/= x86/iommu.c index 75c8752022..98cca92dc3 100644 --- a/xen/drivers/passthrough/x86/iommu.c +++ b/xen/drivers/passthrough/x86/iommu.c @@ -534,9 +534,6 @@ unsigned long *__init iommu_init_domid(domid_t reserve) { unsigned long *map; =20 - if ( !iommu_quarantine ) - return ZERO_BLOCK_PTR; - BUILD_BUG_ON(DOMID_MASK * 2U >=3D UINT16_MAX); =20 map =3D xzalloc_array(unsigned long, BITS_TO_LONGS(UINT16_MAX - DOMID_= MASK)); @@ -551,36 +548,24 @@ unsigned long *__init iommu_init_domid(domid_t reserv= e) =20 domid_t iommu_alloc_domid(unsigned long *map) { - /* - * This is used uniformly across all IOMMUs, such that on typical - * systems we wouldn't re-use the same ID very quickly (perhaps never). - */ - static unsigned int start; - unsigned int idx =3D find_next_zero_bit(map, UINT16_MAX - DOMID_MASK, = start); + /* TODO: Consider nr_doms ? 
+     */
+    unsigned int idx = find_next_zero_bit(map, UINT16_MAX, 0);
 
-    ASSERT(pcidevs_locked());
-
-    if ( idx >= UINT16_MAX - DOMID_MASK )
-        idx = find_first_zero_bit(map, UINT16_MAX - DOMID_MASK);
-    if ( idx >= UINT16_MAX - DOMID_MASK )
-        return DOMID_INVALID;
+    if ( idx >= UINT16_MAX )
+        return UINT16_MAX;
 
     __set_bit(idx, map);
 
-    start = idx + 1;
-
-    return idx | (DOMID_MASK + 1);
+    return idx;
 }
 
 void iommu_free_domid(domid_t domid, unsigned long *map)
 {
     ASSERT(pcidevs_locked());
 
-    if ( domid == DOMID_INVALID )
+    if ( domid == UINT16_MAX )
         return;
 
-    ASSERT(domid > DOMID_MASK);
-
     if ( !__test_and_clear_bit(domid & DOMID_MASK, map) )
         BUG();
 }
-- 
2.51.2

-- 
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech

From nobody Sun Feb 8 01:17:05 2026
[198.2.128.17]) by se1-gles-sth1.inumbo.com (Halon) with ESMTPS id 75a199dc-c601-11f0-9d18-b5c5bf9af7f9; Thu, 20 Nov 2025 12:10:02 +0100 (CET) Received: from pmta08.mandrill.prod.atl01.rsglab.com (localhost [127.0.0.1]) by mail128-17.atl41.mandrillapp.com (Mailchimp) with ESMTP id 4dBwc94n00zCf9RNh for ; Thu, 20 Nov 2025 11:10:01 +0000 (GMT) Received: from [37.26.189.201] by mandrillapp.com id 9649909ca9c644af83ec3718ce3f354f; Thu, 20 Nov 2025 11:10:01 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 75a199dc-c601-11f0-9d18-b5c5bf9af7f9 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandrillapp.com; s=mte1; t=1763637001; x=1763907001; bh=7PY47YiBEAqJUtWCu8V5zgWutXDzUBRUXX0Mm4bYILU=; h=From:Subject:To:Cc:Message-Id:In-Reply-To:References:Feedback-ID: Date:MIME-Version:Content-Type:Content-Transfer-Encoding:CC:Date: Subject:From; b=LikyBE+XofQWyK4B/7P5SkOJdSc3CL5NR4qU+CApmLZQzOFIibAPe3yBI9ZZ7YIOg tjar0VBWq8F2HvlQ6zk4rzt9bsluG+RMPcQFcPUM+/L74anmLxLy9ytPXfX8x++R39 7XtyfeUwoqYhyjnmVVB6v1xpNEbZuhlyk+CrS1femTugg6SaLGsf6fQi/ugPvePXY4 tc72y78TrD6aFWziIAQ//ZVbJnA/UxUQD0X6o5cUXIei+WryozZuqXVMxQRadCJ5er SfyaWkjdZGSvIf/pl+g0gK4ZoKZnm1YgWf2D8QDepXADF2gPuWKsvZMVjX2WUjpXaf D7CZ8E4sdhY2g== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vates.tech; s=mte1; t=1763637001; x=1763897501; i=teddy.astie@vates.tech; bh=7PY47YiBEAqJUtWCu8V5zgWutXDzUBRUXX0Mm4bYILU=; h=From:Subject:To:Cc:Message-Id:In-Reply-To:References:Feedback-ID: Date:MIME-Version:Content-Type:Content-Transfer-Encoding:CC:Date: Subject:From; b=b/98XeGv2B1XQhW0+gWfHwXOQlZDQkgCZBRW+N+AZ6LHTjZqu+wMRf4VsbRPIediD BDTxAwHXbxkI+dYix4psA2AIRNYMN5zEpQyOoOG8j3ojTfkWRkc+LBxB4kFnPD0PGh 6jAyL/UdVx662uD1u5IgnHZg8hEVCGXEujsq/rHpmYj4EPUk9VXCZxwOLEAQE31hbG 0HNVI+kJu5w36STZ6WnjDYIddK9G5/c2+71mN/97/pO8EqWhlHtU4Nh5Rb5KY/tA2b WsHyLbsrgS7mDgfCMO9CyWnKRe9UPCtxZClKFZ/xq7PMBdvAc+OUtf8vHCMotmPDj9 w5Nf1gghXpx0Q== From: "Teddy Astie" Subject: =?utf-8?Q?[RFC=20PATCH=20v7=2008/14]=20iommu:=20Introduce=20redesigned=20IOMMU=20subsystem?= X-Mailer: git-send-email 2.51.2 X-Bm-Disclaimer: Yes X-Bm-Milter-Handled: 4ffbd6c1-ee69-4e1b-aabd-f977039bd3e2 X-Bm-Transport-Timestamp: 1763636998872 To: xen-devel@lists.xenproject.org Cc: "Teddy Astie" , "Jan Beulich" , "Andrew Cooper" , "=?utf-8?Q?Roger=20Pau=20Monn=C3=A9?=" , "Anthony PERARD" , "Michal Orzel" , "Julien Grall" , "Stefano Stabellini" , "Jason Andryuk" Message-Id: In-Reply-To: References: X-Native-Encoded: 1 X-Report-Abuse: =?UTF-8?Q?Please=20forward=20a=20copy=20of=20this=20message,=20including=20all=20headers,=20to=20abuse@mandrill.com.=20You=20can=20also=20report=20abuse=20here:=20https://mandrillapp.com/contact/abuse=3Fid=3D30504962.9649909ca9c644af83ec3718ce3f354f?= X-Mandrill-User: md_30504962 Feedback-ID: 30504962:30504962.20251120:md Date: Thu, 20 Nov 2025 11:10:01 +0000 MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ZohoMail-DKIM: pass (identity teddy.astie@vates.tech) (identity @mandrillapp.com) X-ZM-MESSAGEID: 1763637035841018900 Content-Type: text/plain; charset="utf-8" Introduce the changes proposed in docs/designs/iommu-context.md. Signed-off-by Teddy Astie --- This patch is still quite large but I am not sure how to split it further. 
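
For review purposes, here is a minimal sketch of the calling convention the
new context API is meant to have. It is illustrative only and not part of
the patch: error handling is simplified and example_remap_page() is a
hypothetical caller, but the iommu_context_alloc(), iommu_attach_context(),
iommu_map(), iommu_iotlb_flush(), iommu_reattach_context() and
iommu_context_free() signatures match the ones introduced below.

    /* Hypothetical caller of the new per-context API (illustration only). */
    static int example_remap_page(struct domain *d, struct pci_dev *pdev,
                                  dfn_t dfn, mfn_t mfn)
    {
        uint16_t ctx_id;
        unsigned int flush_flags = 0;
        int rc = iommu_context_alloc(d, &ctx_id, 0);

        if ( rc )
            return rc;

        /* Move the device from its current context onto the new one. */
        rc = iommu_attach_context(d, pci_to_dev(pdev), ctx_id);
        if ( rc )
            goto free_ctx;

        /* Map a single 4k page into this context only, then flush it. */
        rc = iommu_map(d, dfn, mfn, 1, IOMMUF_readable | IOMMUF_writable,
                       &flush_flags, ctx_id);
        if ( !rc )
            rc = iommu_iotlb_flush(d, dfn, 1, flush_flags, ctx_id);

        /* Put the device back onto the default context before freeing. */
        iommu_reattach_context(d, d, pci_to_dev(pdev), 0);

     free_ctx:
        iommu_context_free(d, ctx_id, IOMMU_TEARDOWN_REATTACH_DEFAULT);

        return rc;
    }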
--- xen/arch/x86/include/asm/iommu.h | 8 +- xen/arch/x86/mm/p2m-ept.c | 2 +- xen/arch/x86/pv/dom0_build.c | 6 +- xen/common/memory.c | 4 +- xen/drivers/passthrough/amd/iommu.h | 13 +- xen/drivers/passthrough/amd/iommu_cmd.c | 20 +- xen/drivers/passthrough/amd/iommu_init.c | 2 +- xen/drivers/passthrough/amd/iommu_map.c | 52 +- xen/drivers/passthrough/amd/pci_amd_iommu.c | 383 ++++--- xen/drivers/passthrough/iommu.c | 635 +++++++++++- xen/drivers/passthrough/pci.c | 387 +++---- xen/drivers/passthrough/vtd/extern.h | 17 +- xen/drivers/passthrough/vtd/iommu.c | 1030 +++++++------------ xen/drivers/passthrough/vtd/quirks.c | 22 +- xen/drivers/passthrough/x86/iommu.c | 153 ++- xen/include/xen/iommu.h | 99 +- xen/include/xen/pci.h | 3 + 17 files changed, 1530 insertions(+), 1306 deletions(-) diff --git a/xen/arch/x86/include/asm/iommu.h b/xen/arch/x86/include/asm/io= mmu.h index d20c3cda59..654a07b9b2 100644 --- a/xen/arch/x86/include/asm/iommu.h +++ b/xen/arch/x86/include/asm/iommu.h @@ -2,10 +2,12 @@ #ifndef __ARCH_X86_IOMMU_H__ #define __ARCH_X86_IOMMU_H__ =20 +#include #include #include #include #include +#include #include #include #include @@ -39,18 +41,16 @@ struct arch_iommu_context struct list_head identity_maps; =20 =20 - spinlock_t mapping_lock; /* io page table lock */ - union { /* Intel VT-d */ struct { uint64_t pgd_maddr; /* io page directory machine address */ domid_t *didmap; /* per-iommu DID (valid only if related iommu= _dev_cnt > 0) */ unsigned long *iommu_dev_cnt; /* counter of devices per iommu = */ + uint32_t superpage_progress; /* superpage progress during tear= down */ } vtd; /* AMD IOMMU */ struct { - unsigned int paging_mode; struct page_info *root_table; domid_t *didmap; /* per-iommu DID (valid only if related iommu= _dev_cnt > 0) */ unsigned long *iommu_dev_cnt; /* counter of devices per iommu = */ @@ -72,7 +72,7 @@ struct arch_iommu struct { unsigned int paging_mode; struct guest_iommu *g_iommu; - }; + } amd; }; }; =20 diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c index b854a08b4c..6cc97ec139 100644 --- a/xen/arch/x86/mm/p2m-ept.c +++ b/xen/arch/x86/mm/p2m-ept.c @@ -978,7 +978,7 @@ out: rc =3D iommu_iotlb_flush(d, _dfn(gfn), 1ul << order, (iommu_flags ? IOMMU_FLUSHF_added : 0) | (vtd_pte_present ? IOMMU_FLUSHF_modified - : 0)); + : 0), 0); else if ( need_iommu_pt_sync(d) ) rc =3D iommu_flags ? iommu_legacy_map(d, _dfn(gfn), mfn, 1ul << order, iommu_fl= ags) : diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c index 21158ce181..1bae5f2ba9 100644 --- a/xen/arch/x86/pv/dom0_build.c +++ b/xen/arch/x86/pv/dom0_build.c @@ -77,7 +77,7 @@ static __init void mark_pv_pt_pages_rdonly(struct domain = *d, * iommu_memory_setup() ended up mapping them. */ if ( need_iommu_pt_sync(d) && - iommu_unmap(d, _dfn(mfn_x(page_to_mfn(page))), 1, 0, flush_fl= ags) ) + iommu_unmap(d, _dfn(mfn_x(page_to_mfn(page))), 1, 0, flush_fl= ags, 0) ) BUG(); =20 /* Read-only mapping + PGC_allocated + page-table page. */ @@ -128,7 +128,7 @@ static void __init iommu_memory_setup(struct domain *d,= const char *what, =20 while ( (rc =3D iommu_map(d, _dfn(mfn_x(mfn)), mfn, nr, IOMMUF_readable | IOMMUF_writable | IOMMUF_pre= empt, - flush_flags)) > 0 ) + flush_flags, 0)) > 0 ) { mfn =3D mfn_add(mfn, rc); nr -=3D rc; @@ -962,7 +962,7 @@ static int __init dom0_construct(const struct boot_doma= in *bd) } =20 /* Use while() to avoid compiler warning. 
*/ - while ( iommu_iotlb_flush_all(d, flush_flags) ) + while ( iommu_iotlb_flush_all(d, 0, flush_flags) ) break; =20 if ( initrd_len !=3D 0 ) diff --git a/xen/common/memory.c b/xen/common/memory.c index 3688e6dd50..0c0526a160 100644 --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -928,7 +928,7 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_= add_to_physmap *xatp, this_cpu(iommu_dont_flush_iotlb) =3D 0; =20 ret =3D iommu_iotlb_flush(d, _dfn(xatp->idx - done), done, - IOMMU_FLUSHF_modified); + IOMMU_FLUSHF_modified, 0); if ( unlikely(ret) && rc >=3D 0 ) rc =3D ret; =20 @@ -942,7 +942,7 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_= add_to_physmap *xatp, put_page(pages[i]); =20 ret =3D iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), done, - IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified= ); + IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified= , 0); if ( unlikely(ret) && rc >=3D 0 ) rc =3D ret; } diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/= amd/iommu.h index db6d7ace02..0bd0f15a72 100644 --- a/xen/drivers/passthrough/amd/iommu.h +++ b/xen/drivers/passthrough/amd/iommu.h @@ -198,11 +198,10 @@ void amd_iommu_quarantine_teardown(struct pci_dev *pd= ev); /* mapping functions */ int __must_check cf_check amd_iommu_map_page( struct domain *d, dfn_t dfn, mfn_t mfn, unsigned int flags, - unsigned int *flush_flags); + unsigned int *flush_flags, struct iommu_context *ctx); int __must_check cf_check amd_iommu_unmap_page( struct domain *d, dfn_t dfn, unsigned int order, - unsigned int *flush_flags); -int __must_check amd_iommu_alloc_root(struct domain *d); + unsigned int *flush_flags, struct iommu_context *ctx); int amd_iommu_reserve_domain_unity_map(struct domain *d, struct iommu_cont= ext *ctx, const struct ivrs_unity_map *map, unsigned int flag); @@ -211,7 +210,7 @@ int amd_iommu_reserve_domain_unity_unmap(struct domain = *d, struct iommu_context int cf_check amd_iommu_get_reserved_device_memory( iommu_grdm_t *func, void *ctxt); int __must_check cf_check amd_iommu_flush_iotlb_pages( - struct domain *d, dfn_t dfn, unsigned long page_count, + struct domain *d, struct iommu_context *ctx, dfn_t dfn, unsigned long = page_count, unsigned int flush_flags); void amd_iommu_print_entries(const struct amd_iommu *iommu, unsigned int d= ev_id, dfn_t dfn); @@ -233,9 +232,9 @@ void iommu_dte_add_device_entry(struct amd_iommu_dte *d= te, const struct ivrs_mappings *ivrs_dev); =20 /* send cmd to iommu */ -void amd_iommu_flush_all_pages(struct domain *d); -void amd_iommu_flush_pages(struct domain *d, unsigned long dfn, - unsigned int order); +void amd_iommu_flush_all_pages(struct domain *d, struct iommu_context *ctx= ); +void amd_iommu_flush_pages(struct domain *d, struct iommu_context *ctx, + unsigned long dfn, unsigned int order); void amd_iommu_flush_iotlb(u8 devfn, const struct pci_dev *pdev, daddr_t daddr, unsigned int order); void amd_iommu_flush_device(struct amd_iommu *iommu, uint16_t bdf, diff --git a/xen/drivers/passthrough/amd/iommu_cmd.c b/xen/drivers/passthro= ugh/amd/iommu_cmd.c index 0c4dcf4ece..13c353b038 100644 --- a/xen/drivers/passthrough/amd/iommu_cmd.c +++ b/xen/drivers/passthrough/amd/iommu_cmd.c @@ -327,19 +327,21 @@ static void amd_iommu_flush_all_iotlbs(const struct d= omain *d, daddr_t daddr, } =20 /* Flush iommu cache after p2m changes. 
*/ -static void _amd_iommu_flush_pages(struct domain *d, +static void _amd_iommu_flush_pages(struct domain *d, struct iommu_context = *ctx, daddr_t daddr, unsigned int order) { struct amd_iommu *iommu; - struct iommu_context *ctx =3D iommu_default_context(d); =20 /* send INVALIDATE_IOMMU_PAGES command */ for_each_amd_iommu ( iommu ) { - domid_t dom_id =3D ctx->arch.amd.didmap[iommu->index]; + if ( ctx->arch.amd.iommu_dev_cnt[iommu->index] ) + { + domid_t dom_id =3D ctx->arch.amd.didmap[iommu->index]; =20 - invalidate_iommu_pages(iommu, daddr, dom_id, order); - flush_command_buffer(iommu, 0); + invalidate_iommu_pages(iommu, daddr, dom_id, order); + flush_command_buffer(iommu, 0); + } } =20 if ( ats_enabled ) @@ -355,15 +357,15 @@ static void _amd_iommu_flush_pages(struct domain *d, } } =20 -void amd_iommu_flush_all_pages(struct domain *d) +void amd_iommu_flush_all_pages(struct domain *d, struct iommu_context *ctx) { - _amd_iommu_flush_pages(d, INV_IOMMU_ALL_PAGES_ADDRESS, 0); + _amd_iommu_flush_pages(d, ctx, INV_IOMMU_ALL_PAGES_ADDRESS, 0); } =20 -void amd_iommu_flush_pages(struct domain *d, +void amd_iommu_flush_pages(struct domain *d, struct iommu_context *ctx, unsigned long dfn, unsigned int order) { - _amd_iommu_flush_pages(d, __dfn_to_daddr(dfn), order); + _amd_iommu_flush_pages(d, ctx, __dfn_to_daddr(dfn), order); } =20 void amd_iommu_flush_device(struct amd_iommu *iommu, uint16_t bdf, diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthr= ough/amd/iommu_init.c index 5cbb3fdb05..bf32b6c718 100644 --- a/xen/drivers/passthrough/amd/iommu_init.c +++ b/xen/drivers/passthrough/amd/iommu_init.c @@ -1538,7 +1538,7 @@ static void invalidate_all_domain_pages(void) =20 for_each_domain( d ) if ( is_iommu_enabled(d) ) - amd_iommu_flush_all_pages(d); + amd_iommu_flush_all_pages(d, iommu_default_context(d)); } =20 static int cf_check _invalidate_all_devices( diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthro= ugh/amd/iommu_map.c index 42827c7dc7..01b36fdf4f 100644 --- a/xen/drivers/passthrough/amd/iommu_map.c +++ b/xen/drivers/passthrough/amd/iommu_map.c @@ -276,7 +276,7 @@ static int iommu_pde_from_dfn(struct domain *d, struct = iommu_context *ctx, struct domain_iommu *hd =3D dom_iommu(d); =20 table =3D ctx->arch.amd.root_table; - level =3D ctx->arch.amd.paging_mode; + level =3D hd->arch.amd.paging_mode; =20 if ( !table || target < 1 || level < target || level > 6 ) { @@ -400,21 +400,17 @@ static void queue_free_pt(struct domain *d, struct io= mmu_context *ctx, mfn_t mfn =20 int cf_check amd_iommu_map_page( struct domain *d, dfn_t dfn, mfn_t mfn, unsigned int flags, - unsigned int *flush_flags) + unsigned int *flush_flags, struct iommu_context *ctx) { struct domain_iommu *hd =3D dom_iommu(d); - struct iommu_context *ctx =3D iommu_default_context(d); unsigned int level =3D (IOMMUF_order(flags) / PTE_PER_TABLE_SHIFT) + 1; bool contig; - int rc; unsigned long pt_mfn =3D 0; union amd_iommu_pte old; =20 ASSERT((hd->platform_ops->page_sizes >> IOMMUF_order(flags)) & PAGE_SIZE_4K); =20 - spin_lock(&ctx->arch.mapping_lock); - /* * IOMMU mapping request can be safely ignored when the domain is dyin= g. * @@ -422,25 +418,11 @@ int cf_check amd_iommu_map_page( * before any page tables are freed (see iommu_free_pgtables()). 
*/ if ( d->is_dying ) - { - spin_unlock(&ctx->arch.mapping_lock); return 0; - } - - rc =3D amd_iommu_alloc_root(d); - if ( rc ) - { - spin_unlock(&ctx->arch.mapping_lock); - AMD_IOMMU_ERROR("root table alloc failed, dfn =3D %"PRI_dfn"\n", - dfn_x(dfn)); - domain_crash(d); - return rc; - } =20 if ( iommu_pde_from_dfn(d, ctx, dfn_x(dfn), level, &pt_mfn, flush_flag= s, true) || !pt_mfn ) { - spin_unlock(&ctx->arch.mapping_lock); AMD_IOMMU_ERROR("invalid IO pagetable entry dfn =3D %"PRI_dfn"\n", dfn_x(dfn)); domain_crash(d); @@ -452,7 +434,7 @@ int cf_check amd_iommu_map_page( flags & IOMMUF_writable, flags & IOMMUF_readable, &contig); =20 - while ( unlikely(contig) && ++level < ctx->arch.amd.paging_mode ) + while ( unlikely(contig) && ++level < hd->arch.amd.paging_mode ) { struct page_info *pg =3D mfn_to_page(_mfn(pt_mfn)); unsigned long next_mfn; @@ -471,8 +453,6 @@ int cf_check amd_iommu_map_page( perfc_incr(iommu_pt_coalesces); } =20 - spin_unlock(&ctx->arch.mapping_lock); - *flush_flags |=3D IOMMU_FLUSHF_added; if ( old.pr ) { @@ -486,11 +466,11 @@ int cf_check amd_iommu_map_page( } =20 int cf_check amd_iommu_unmap_page( - struct domain *d, dfn_t dfn, unsigned int order, unsigned int *flush_f= lags) + struct domain *d, dfn_t dfn, unsigned int order, unsigned int *flush_f= lags, + struct iommu_context *ctx) { unsigned long pt_mfn =3D 0; struct domain_iommu *hd =3D dom_iommu(d); - struct iommu_context *ctx =3D iommu_default_context(d); unsigned int level =3D (order / PTE_PER_TABLE_SHIFT) + 1; union amd_iommu_pte old =3D {}; =20 @@ -500,17 +480,11 @@ int cf_check amd_iommu_unmap_page( */ ASSERT((hd->platform_ops->page_sizes >> order) & PAGE_SIZE_4K); =20 - spin_lock(&ctx->arch.mapping_lock); - if ( !ctx->arch.amd.root_table ) - { - spin_unlock(&ctx->arch.mapping_lock); return 0; - } =20 if ( iommu_pde_from_dfn(d, ctx, dfn_x(dfn), level, &pt_mfn, flush_flag= s, false) ) { - spin_unlock(&ctx->arch.mapping_lock); AMD_IOMMU_ERROR("invalid IO pagetable entry dfn =3D %"PRI_dfn"\n", dfn_x(dfn)); domain_crash(d); @@ -524,7 +498,7 @@ int cf_check amd_iommu_unmap_page( /* Mark PTE as 'page not present'. */ old =3D clear_iommu_pte_present(pt_mfn, dfn_x(dfn), level, &free); =20 - while ( unlikely(free) && ++level < ctx->arch.amd.paging_mode ) + while ( unlikely(free) && ++level < hd->arch.amd.paging_mode ) { struct page_info *pg =3D mfn_to_page(_mfn(pt_mfn)); =20 @@ -540,8 +514,6 @@ int cf_check amd_iommu_unmap_page( } } =20 - spin_unlock(&ctx->arch.mapping_lock); - if ( old.pr ) { *flush_flags |=3D IOMMU_FLUSHF_modified; @@ -608,7 +580,7 @@ static unsigned long flush_count(unsigned long dfn, uns= igned long page_count, } =20 int cf_check amd_iommu_flush_iotlb_pages( - struct domain *d, dfn_t dfn, unsigned long page_count, + struct domain *d, struct iommu_context *ctx, dfn_t dfn, unsigned long = page_count, unsigned int flush_flags) { unsigned long dfn_l =3D dfn_x(dfn); @@ -626,7 +598,7 @@ int cf_check amd_iommu_flush_iotlb_pages( /* If so requested or if the range wraps then just flush everything. */ if ( (flush_flags & IOMMU_FLUSHF_all) || dfn_l + page_count < dfn_l ) { - amd_iommu_flush_all_pages(d); + amd_iommu_flush_all_pages(d, ctx); return 0; } =20 @@ -639,13 +611,13 @@ int cf_check amd_iommu_flush_iotlb_pages( * flush code. 
*/ if ( page_count =3D=3D 1 ) /* order 0 flush count */ - amd_iommu_flush_pages(d, dfn_l, 0); + amd_iommu_flush_pages(d, ctx, dfn_l, 0); else if ( flush_count(dfn_l, page_count, 9) =3D=3D 1 ) - amd_iommu_flush_pages(d, dfn_l, 9); + amd_iommu_flush_pages(d, ctx, dfn_l, 9); else if ( flush_count(dfn_l, page_count, 18) =3D=3D 1 ) - amd_iommu_flush_pages(d, dfn_l, 18); + amd_iommu_flush_pages(d, ctx, dfn_l, 18); else - amd_iommu_flush_all_pages(d); + amd_iommu_flush_all_pages(d, ctx); =20 return 0; } diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/pass= through/amd/pci_amd_iommu.c index c871660661..3c17d78caf 100644 --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c @@ -20,8 +20,11 @@ #include #include #include +#include +#include =20 #include +#include =20 #include "iommu.h" #include "../ats.h" @@ -85,18 +88,6 @@ int get_dma_requestor_id(uint16_t seg, uint16_t bdf) return req_id; } =20 -static int __must_check allocate_domain_resources(struct domain *d) -{ - struct iommu_context *ctx =3D iommu_default_context(d); - int rc; - - spin_lock(&ctx->arch.mapping_lock); - rc =3D amd_iommu_alloc_root(d); - spin_unlock(&ctx->arch.mapping_lock); - - return rc; -} - static bool any_pdev_behind_iommu(const struct domain *d, const struct pci_dev *exclude, const struct amd_iommu *iommu) @@ -127,8 +118,9 @@ static bool use_ats( =20 static int __must_check amd_iommu_setup_domain_device( struct domain *domain, struct iommu_context *ctx, struct amd_iommu *io= mmu, - uint8_t devfn, struct pci_dev *pdev) + uint8_t devfn, struct pci_dev *pdev, struct iommu_context *prev_ctx) { + struct domain_iommu *hd =3D dom_iommu(domain); struct amd_iommu_dte *table, *dte; unsigned long flags; unsigned int req_id, sr_flags; @@ -138,11 +130,7 @@ static int __must_check amd_iommu_setup_domain_device( const struct page_info *root_pg; domid_t domid; =20 - BUG_ON(!ctx->arch.amd.paging_mode || !iommu->dev_table.buffer); - - rc =3D allocate_domain_resources(domain); - if ( rc ) - return rc; + BUG_ON(!hd->arch.amd.paging_mode || !iommu->dev_table.buffer); =20 req_id =3D get_dma_requestor_id(iommu->sbdf.seg, pdev->sbdf.bdf); ivrs_dev =3D &get_ivrs_mappings(iommu->sbdf.seg)[req_id]; @@ -157,7 +145,7 @@ static int __must_check amd_iommu_setup_domain_device( ivrs_dev =3D &get_ivrs_mappings(iommu->sbdf.seg)[req_id]; =20 root_pg =3D ctx->arch.amd.root_table; - domid =3D domain->domain_id; + domid =3D ctx->arch.amd.didmap[iommu->index]; =20 spin_lock_irqsave(&iommu->lock, flags); =20 @@ -166,7 +154,7 @@ static int __must_check amd_iommu_setup_domain_device( /* bind DTE to domain page-tables */ rc =3D amd_iommu_set_root_page_table( dte, page_to_maddr(root_pg), domid, - ctx->arch.amd.paging_mode, sr_flags); + hd->arch.amd.paging_mode, sr_flags); if ( rc ) { ASSERT(rc < 0); @@ -208,7 +196,7 @@ static int __must_check amd_iommu_setup_domain_device( else rc =3D amd_iommu_set_root_page_table( dte, page_to_maddr(root_pg), domid, - ctx->arch.amd.paging_mode, sr_flags); + hd->arch.amd.paging_mode, sr_flags); if ( rc < 0 ) { spin_unlock_irqrestore(&iommu->lock, flags); @@ -251,6 +239,7 @@ static int __must_check amd_iommu_setup_domain_device( spin_unlock_irqrestore(&iommu->lock, flags); =20 amd_iommu_flush_device(iommu, req_id, prev_domid); + amd_iommu_flush_device(iommu, req_id, domid); } else spin_unlock_irqrestore(&iommu->lock, flags); @@ -259,7 +248,7 @@ static int __must_check amd_iommu_setup_domain_device( "root table =3D %#"PRIx64", " "domain =3D %d, paging mode =3D %d\n", 
req_id, pdev->type, page_to_maddr(root_pg), - domid, ctx->arch.amd.paging_mode); + domid, hd->arch.amd.paging_mode); =20 ASSERT(pcidevs_locked()); =20 @@ -272,6 +261,15 @@ static int __must_check amd_iommu_setup_domain_device( amd_iommu_flush_iotlb(devfn, pdev, INV_IOMMU_ALL_PAGES_ADDRESS, 0); } =20 + if ( prev_ctx ) + { + /* Don't underflow the counter. */ + BUG_ON(!prev_ctx->arch.amd.iommu_dev_cnt[iommu->index]); + prev_ctx->arch.amd.iommu_dev_cnt[iommu->index]--; + } + + ctx->arch.amd.iommu_dev_cnt[iommu->index]++; + return 0; } =20 @@ -338,27 +336,12 @@ static int cf_check iov_enable_xt(void) return 0; } =20 -int amd_iommu_alloc_root(struct domain *d) -{ - struct domain_iommu *hd =3D dom_iommu(d); - struct iommu_context *ctx =3D iommu_default_context(d); - - if ( unlikely(!ctx->arch.amd.root_table) && d !=3D dom_io ) - { - ctx->arch.amd.root_table =3D iommu_alloc_pgtable(hd, ctx, 0); - if ( !ctx->arch.amd.root_table ) - return -ENOMEM; - } - - return 0; -} - unsigned int __read_mostly amd_iommu_max_paging_mode =3D IOMMU_MAX_PT_LEVE= LS; int __read_mostly amd_iommu_min_paging_mode =3D 1; =20 static int cf_check amd_iommu_domain_init(struct domain *d) { - struct iommu_context *ctx =3D iommu_default_context(d); + struct domain_iommu *hd =3D dom_iommu(d); int pglvl =3D amd_iommu_get_paging_mode( 1UL << (domain_max_paddr_bits(d) - PAGE_SHIFT)); =20 @@ -369,20 +352,20 @@ static int cf_check amd_iommu_domain_init(struct doma= in *d) * Choose the number of levels for the IOMMU page tables, taking into * account unity maps. */ - ctx->arch.amd.paging_mode =3D max(pglvl, amd_iommu_min_paging_mode); + hd->arch.amd.paging_mode =3D max(pglvl, amd_iommu_min_paging_mode); =20 return 0; } =20 -static int cf_check amd_iommu_add_device(u8 devfn, struct pci_dev *pdev); +static int __hwdom_init cf_check setup_hwdom_device(u8 devfn, struct pci_d= ev *pdev) +{ + return iommu_attach_context(hardware_domain, pdev, 0); +} =20 static void __hwdom_init cf_check amd_iommu_hwdom_init(struct domain *d) { const struct amd_iommu *iommu; =20 - if ( allocate_domain_resources(d) ) - BUG(); - for_each_amd_iommu ( iommu ) if ( iomem_deny_access(d, PFN_DOWN(iommu->mmio_base_phys), PFN_DOWN(iommu->mmio_base_phys + @@ -391,11 +374,12 @@ static void __hwdom_init cf_check amd_iommu_hwdom_ini= t(struct domain *d) =20 /* Make sure workarounds are applied (if needed) before adding devices= . 
*/ arch_iommu_hwdom_init(d); - setup_hwdom_pci_devices(d, amd_iommu_add_device); + setup_hwdom_pci_devices(d, setup_hwdom_device); } =20 static void amd_iommu_disable_domain_device(const struct domain *domain, struct amd_iommu *iommu, + struct iommu_context *prev_ctx, uint8_t devfn, struct pci_dev = *pdev) { struct amd_iommu_dte *table, *dte; @@ -442,155 +426,82 @@ static void amd_iommu_disable_domain_device(const st= ruct domain *domain, AMD_IOMMU_DEBUG("Disable: device id =3D %#x, " "domain =3D %d, paging mode =3D %d\n", req_id, dte->domain_id, - iommu_default_context(domain)->arch.amd.paging_mod= e); + dom_iommu(domain)->arch.amd.paging_mode); } else spin_unlock_irqrestore(&iommu->lock, flags); + + BUG_ON(!prev_ctx->arch.amd.iommu_dev_cnt[iommu->index]); + prev_ctx->arch.amd.iommu_dev_cnt[iommu->index]--; } =20 -static int cf_check reassign_device( - struct domain *source, struct domain *target, u8 devfn, - struct pci_dev *pdev) +static int cf_check amd_iommu_context_init(struct domain *d, struct iommu_= context *ctx, + u32 flags) { struct amd_iommu *iommu; - struct iommu_context *target_ctx =3D iommu_default_context(target); - struct iommu_context *source_ctx =3D iommu_default_context(source); - int rc; + struct domain_iommu *hd =3D dom_iommu(d); =20 - iommu =3D find_iommu_for_device(pdev->sbdf); - if ( !iommu ) + ctx->arch.amd.didmap =3D xzalloc_array(domid_t, nr_amd_iommus); + if ( !ctx->arch.amd.didmap ) + return -ENOMEM; + + ctx->arch.amd.iommu_dev_cnt =3D xzalloc_array(unsigned long, nr_amd_io= mmus); + if ( !ctx->arch.amd.iommu_dev_cnt ) { - AMD_IOMMU_WARN("failed to find IOMMU: %pp cannot be assigned to %p= d\n", - &PCI_SBDF(pdev->seg, pdev->bus, devfn), target); - return -ENODEV; + xfree(ctx->arch.amd.didmap); + return -ENOMEM; } =20 - rc =3D amd_iommu_setup_domain_device(target, target_ctx, iommu, devfn,= pdev); - if ( rc ) - return rc; - - if ( devfn =3D=3D pdev->devfn && pdev->domain !=3D target ) + // TODO: Allocate IOMMU domid only when attaching devices ? + /* Populate context DID map using pseudo DIDs */ + for_each_amd_iommu(iommu) { - write_lock(&source->pci_lock); - list_del(&pdev->domain_list); - write_unlock(&source->pci_lock); - - pdev->domain =3D target; - - write_lock(&target->pci_lock); - list_add(&pdev->domain_list, &target->pdev_list); - write_unlock(&target->pci_lock); + ctx->arch.amd.didmap[iommu->index] =3D + iommu_alloc_domid(iommu->domid_map); } =20 - /* - * If the device belongs to the hardware domain, and it has a unity ma= pping, - * don't remove it from the hardware domain, because BIOS may referenc= e that - * mapping. 
- */ - if ( !is_hardware_domain(source) ) + if ( !ctx->opaque ) { - const struct ivrs_mappings *ivrs_mappings =3D get_ivrs_mappings(pd= ev->seg); - unsigned int bdf =3D PCI_BDF(pdev->bus, devfn); - - rc =3D amd_iommu_reserve_domain_unity_unmap( - source, source_ctx, - ivrs_mappings[get_dma_requestor_id(pdev->seg, bdf)].unity= _map); - if ( rc ) - return rc; + /* Create initial context page */ + ctx->arch.amd.root_table =3D iommu_alloc_pgtable(hd, ctx, 0); } =20 - AMD_IOMMU_DEBUG("Re-assign %pp from %pd to %pd\n", - &PCI_SBDF(pdev->seg, pdev->bus, devfn), source, target= ); + return arch_iommu_context_init(d, ctx, flags); =20 - return 0; } =20 -static int cf_check amd_iommu_assign_device( - struct domain *d, u8 devfn, struct pci_dev *pdev, u32 flag) +static int cf_check amd_iommu_context_teardown(struct domain *d, + struct iommu_context *ctx, u32 fla= gs) { - struct ivrs_mappings *ivrs_mappings =3D get_ivrs_mappings(pdev->seg); - unsigned int bdf =3D PCI_BDF(pdev->bus, devfn); - int req_id =3D get_dma_requestor_id(pdev->seg, bdf); - int rc =3D amd_iommu_reserve_domain_unity_map( - d, iommu_default_context(d), - ivrs_mappings[req_id].unity_map, flag); + struct amd_iommu *iommu; + pcidevs_lock(); =20 - if ( !rc ) - rc =3D reassign_device(pdev->domain, d, devfn, pdev); + // TODO: Cleanup mappings + ASSERT(ctx->arch.amd.didmap); =20 - if ( rc && !is_hardware_domain(d) ) + for_each_amd_iommu(iommu) { - int ret =3D amd_iommu_reserve_domain_unity_unmap( - d, iommu_default_context(d), - ivrs_mappings[req_id].unity_map); - - if ( ret ) - { - printk(XENLOG_ERR "AMD-Vi: " - "unity-unmap for %pd/%04x:%02x:%02x.%u failed (%d)\n", - d, pdev->seg, pdev->bus, - PCI_SLOT(devfn), PCI_FUNC(devfn), ret); - domain_crash(d); - } + iommu_free_domid(ctx->arch.amd.didmap[iommu->index], iommu->domid_= map); } =20 - return rc; -} - -static void cf_check amd_iommu_clear_root_pgtable(struct domain *d) -{ - struct iommu_context *ctx =3D iommu_default_context(d); - - spin_lock(&ctx->arch.mapping_lock); - ctx->arch.amd.root_table =3D NULL; - spin_unlock(&ctx->arch.mapping_lock); -} - -static void cf_check amd_iommu_domain_destroy(struct domain *d) -{ - struct iommu_context *ctx =3D iommu_default_context(d); + xfree(ctx->arch.amd.didmap); =20 - iommu_identity_map_teardown(d, ctx); - ASSERT(!ctx->arch.amd.root_table); + pcidevs_unlock(); + return arch_iommu_context_teardown(d, ctx, flags); } =20 -static int cf_check amd_iommu_add_device(u8 devfn, struct pci_dev *pdev) +static int cf_check amd_iommu_attach( + struct domain *d, struct pci_dev *pdev, struct iommu_context *ctx) { - struct amd_iommu *iommu; - struct iommu_context *ctx; - u16 bdf; - struct ivrs_mappings *ivrs_mappings; - - if ( !pdev->domain ) - return -EINVAL; - - ctx =3D iommu_default_context(pdev->domain); - - for_each_amd_iommu(iommu) - if ( pdev->sbdf.sbdf =3D=3D iommu->sbdf.sbdf ) - return is_hardware_domain(pdev->domain) ? 0 : -ENODEV; - - iommu =3D find_iommu_for_device(pdev->sbdf); - if ( unlikely(!iommu) ) - { - /* Filter bridge devices. 
*/ - if ( pdev->type =3D=3D DEV_TYPE_PCI_HOST_BRIDGE && - is_hardware_domain(pdev->domain) ) - { - AMD_IOMMU_DEBUG("Skipping host bridge %pp\n", &pdev->sbdf); - return 0; - } - - AMD_IOMMU_WARN("no IOMMU for %pp; cannot be handed to %pd\n", - &PCI_SBDF(pdev->seg, pdev->bus, devfn), pdev->doma= in); - return -ENODEV; - } + int ret; + struct ivrs_mappings *ivrs_mappings =3D get_ivrs_mappings(pdev->seg); + int req_id =3D get_dma_requestor_id(pdev->seg, pdev->sbdf.bdf); + struct ivrs_unity_map *map =3D ivrs_mappings[req_id].unity_map; + struct amd_iommu *iommu =3D find_iommu_for_device(pdev->sbdf); + uint16_t bdf =3D pdev->sbdf.bdf; =20 - ivrs_mappings =3D get_ivrs_mappings(pdev->seg); - bdf =3D PCI_BDF(pdev->bus, devfn); - if ( !ivrs_mappings || - !ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].valid ) - return -EPERM; + if ( !iommu ) + return 0; =20 if ( iommu_intremap && ivrs_mappings[bdf].dte_requestor_id =3D=3D bdf && @@ -621,55 +532,98 @@ static int cf_check amd_iommu_add_device(u8 devfn, st= ruct pci_dev *pdev) amd_iommu_flush_device(iommu, bdf, DOMID_INVALID); } =20 - if ( amd_iommu_reserve_domain_unity_map( - pdev->domain, ctx, - ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].unity_map, - 0) ) - AMD_IOMMU_WARN("%pd: unity mapping failed for %pp\n", - pdev->domain, &PCI_SBDF(pdev->seg, bdf)); + ret =3D amd_iommu_reserve_domain_unity_map(d, ctx, map, 0); + if ( ret ) + return ret; =20 - return amd_iommu_setup_domain_device(pdev->domain, ctx, iommu, devfn, = pdev); + return amd_iommu_setup_domain_device(d, ctx, iommu, pdev->devfn, pdev,= NULL); } =20 -static int cf_check amd_iommu_remove_device(u8 devfn, struct pci_dev *pdev) +static int cf_check amd_iommu_detach(struct domain *d, struct pci_dev *pde= v, + struct iommu_context *prev_ctx) { - struct amd_iommu *iommu; - struct iommu_context *ctx; - u16 bdf; - struct ivrs_mappings *ivrs_mappings; + struct ivrs_mappings *ivrs_mappings =3D get_ivrs_mappings(pdev->seg); + int req_id =3D get_dma_requestor_id(pdev->seg, pdev->sbdf.bdf); + struct amd_iommu *iommu =3D find_iommu_for_device(pdev->sbdf); =20 - if ( !pdev->domain ) - return -EINVAL; + if ( !iommu ) + return 0; =20 - ctx =3D iommu_default_context(pdev->domain); + amd_iommu_disable_domain_device(d, iommu, prev_ctx, pdev->devfn, pdev); + + return amd_iommu_reserve_domain_unity_unmap(d, prev_ctx, ivrs_mappings= [req_id].unity_map); +} + +static int cf_check amd_iommu_add_devfn(struct domain *d, struct pci_dev *= pdev, + u16 devfn, struct iommu_context *c= tx) +{ + struct amd_iommu *iommu =3D find_iommu_for_device(pdev->sbdf); =20 - iommu =3D find_iommu_for_device(pdev->sbdf); if ( !iommu ) + return 0; + + return amd_iommu_setup_domain_device(d, ctx, iommu, pdev->devfn, pdev,= NULL); +} + +static int cf_check amd_iommu_remove_devfn(struct domain *d, struct pci_de= v *pdev, + u16 devfn, struct iommu_context= *prev_ctx) +{ + struct amd_iommu *iommu =3D find_iommu_for_device(pdev->sbdf); + + if ( !iommu ) + return 0; + + amd_iommu_disable_domain_device(d, iommu, prev_ctx, pdev->devfn, pdev); + + return 0; +} + +static int cf_check amd_iommu_reattach(struct domain *d, + struct pci_dev *pdev, + struct iommu_context *prev_ctx, + struct iommu_context *ctx) +{ + int ret, rc, req_id =3D get_dma_requestor_id(pdev->seg, pdev->sbdf.bdf= ); + struct ivrs_mappings *ivrs_mapping =3D &get_ivrs_mappings(pdev->seg)[r= eq_id]; + struct ivrs_unity_map *map =3D ivrs_mapping ? 
ivrs_mapping->unity_map = : NULL; + struct amd_iommu *iommu =3D find_iommu_for_device(pdev->sbdf); + + if ( !iommu ) + return 0; + + ret =3D amd_iommu_reserve_domain_unity_map(d, ctx, map, 0); + if ( ret ) + return ret; + + ret =3D amd_iommu_setup_domain_device(d, ctx, ivrs_mapping->iommu, pde= v->devfn, + pdev, prev_ctx); + if ( ret ) { - AMD_IOMMU_WARN("failed to find IOMMU: %pp cannot be removed from %= pd\n", - &PCI_SBDF(pdev->seg, pdev->bus, devfn), pdev->doma= in); - return -ENODEV; - } + if ( (rc =3D amd_iommu_reserve_domain_unity_unmap(d, ctx, map)) ) + AMD_IOMMU_DEBUG(" Unable to unmap RMRR from d%dc%d for %pp (%d= )\n", + d->domain_id, prev_ctx->id, &pdev->sbdf, rc); =20 - amd_iommu_disable_domain_device(pdev->domain, iommu, devfn, pdev); + return ret; + } =20 - ivrs_mappings =3D get_ivrs_mappings(pdev->seg); - bdf =3D PCI_BDF(pdev->bus, devfn); + if ( (rc =3D amd_iommu_reserve_domain_unity_unmap(d, prev_ctx, map)) ) + AMD_IOMMU_DEBUG(" Unable to unmap previous RMRR for %pp (%d)\n", + &pdev->sbdf, rc); =20 - if ( amd_iommu_reserve_domain_unity_unmap( - pdev->domain, ctx, - ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].unity_map)= ) - AMD_IOMMU_WARN("%pd: unity unmapping failed for %pp\n", - pdev->domain, &PCI_SBDF(pdev->seg, bdf)); + return ret; +} =20 - amd_iommu_quarantine_teardown(pdev); +static void cf_check amd_iommu_clear_root_pgtable(struct domain *d, struct= iommu_context *ctx) +{ + ctx->arch.amd.root_table =3D NULL; +} =20 - if ( amd_iommu_perdev_intremap && - ivrs_mappings[bdf].dte_requestor_id =3D=3D bdf && - ivrs_mappings[bdf].intremap_table ) - amd_iommu_free_intremap_table(iommu, &ivrs_mappings[bdf], bdf); +static void cf_check amd_iommu_domain_destroy(struct domain *d) +{ + struct iommu_context *ctx =3D iommu_default_context(d); =20 - return 0; + iommu_identity_map_teardown(d, ctx); + ASSERT(!ctx->arch.amd.root_table); } =20 static int cf_check amd_iommu_group_id(u16 seg, u8 bus, u8 devfn) @@ -729,30 +683,45 @@ static void amd_dump_page_table_level(struct page_inf= o *pg, int level, =20 static void cf_check amd_dump_page_tables(struct domain *d) { - struct iommu_context *ctx =3D iommu_default_context(d); + unsigned int i; + struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx; =20 - if ( !ctx->arch.amd.root_table ) - return; + if (d =3D=3D dom_io) + printk("d[IO] page tables\n"); + else + printk("d%hu page tables\n", d->domain_id); + + for (i =3D 0; i < (1 + hd->other_contexts.count); ++i) + { + if ( (ctx =3D iommu_get_context(d, i)) ) + { + printk(" Context %d (%"PRI_mfn")\n", i, + mfn_x(page_to_mfn(ctx->arch.amd.root_table))); =20 - printk("AMD IOMMU %pd table has %u levels\n", d, ctx->arch.amd.paging_= mode); - amd_dump_page_table_level(ctx->arch.amd.root_table, - ctx->arch.amd.paging_mode, 0, 0); + amd_dump_page_table_level(ctx->arch.amd.root_table, + hd->arch.amd.paging_mode, 0, 0); + iommu_put_context(ctx); + } + } } =20 static const struct iommu_ops __initconst_cf_clobber _iommu_ops =3D { .page_sizes =3D PAGE_SIZE_4K | PAGE_SIZE_2M | PAGE_SIZE_1G, .init =3D amd_iommu_domain_init, .hwdom_init =3D amd_iommu_hwdom_init, - .quarantine_init =3D amd_iommu_quarantine_init, - .add_device =3D amd_iommu_add_device, - .remove_device =3D amd_iommu_remove_device, - .assign_device =3D amd_iommu_assign_device, + .context_init =3D amd_iommu_context_init, + .context_teardown =3D amd_iommu_context_teardown, + .attach =3D amd_iommu_attach, + .detach =3D amd_iommu_detach, + .reattach =3D amd_iommu_reattach, + .add_devfn =3D amd_iommu_add_devfn, + 
.remove_devfn =3D amd_iommu_remove_devfn, .teardown =3D amd_iommu_domain_destroy, .clear_root_pgtable =3D amd_iommu_clear_root_pgtable, .map_page =3D amd_iommu_map_page, .unmap_page =3D amd_iommu_unmap_page, .iotlb_flush =3D amd_iommu_flush_iotlb_pages, - .reassign_device =3D reassign_device, .get_device_group_id =3D amd_iommu_group_id, .enable_x2apic =3D iov_enable_xt, .update_ire_from_apic =3D amd_iommu_ioapic_update_ire, diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iomm= u.c index 32c5011820..feda2e390b 100644 --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -208,13 +208,15 @@ int iommu_domain_init(struct domain *d, unsigned int = opts) hd->node =3D NUMA_NO_NODE; #endif =20 + rspin_lock_init(&hd->default_ctx.lock); + ret =3D arch_iommu_domain_init(d); if ( ret ) return ret; =20 hd->platform_ops =3D iommu_get_ops(); ret =3D iommu_call(hd->platform_ops, init, d); - if ( ret || is_system_domain(d) ) + if ( ret || (is_system_domain(d) && d !=3D dom_io) ) return ret; =20 /* @@ -236,7 +238,17 @@ int iommu_domain_init(struct domain *d, unsigned int o= pts) =20 ASSERT(!(hd->need_sync && hd->hap_pt_share)); =20 - return 0; + rspin_lock(&hd->default_ctx.lock); + ret =3D iommu_context_init(d, &hd->default_ctx, 0, IOMMU_CONTEXT_INIT_= default); + rspin_unlock(&hd->default_ctx.lock); + + rwlock_init(&hd->other_contexts.lock); + hd->other_contexts.initialized =3D (atomic_t)ATOMIC_INIT(0); + hd->other_contexts.count =3D 0; + hd->other_contexts.bitmap =3D NULL; + hd->other_contexts.map =3D NULL; + + return ret; } =20 static void cf_check iommu_dump_page_tables(unsigned char key) @@ -249,14 +261,11 @@ static void cf_check iommu_dump_page_tables(unsigned = char key) =20 for_each_domain(d) { - if ( is_hardware_domain(d) || !is_iommu_enabled(d) ) + if ( !is_iommu_enabled(d) ) continue; =20 if ( iommu_use_hap_pt(d) ) - { printk("%pd sharing page tables\n", d); - continue; - } =20 iommu_vcall(dom_iommu(d)->platform_ops, dump_page_tables, d); } @@ -274,9 +283,13 @@ void __hwdom_init iommu_hwdom_init(struct domain *d) iommu_vcall(hd->platform_ops, hwdom_init, d); } =20 -static void iommu_teardown(struct domain *d) +void cf_check iommu_domain_destroy(struct domain *d) { struct domain_iommu *hd =3D dom_iommu(d); + struct pci_dev *pdev; + + if ( !is_iommu_enabled(d) ) + return; =20 /* * During early domain creation failure, we may reach here with the @@ -285,17 +298,68 @@ static void iommu_teardown(struct domain *d) if ( !hd->platform_ops ) return; =20 + /* Move all devices back to quarantine */ + /* TODO: Is it needed ? */ + for_each_pdev(d, pdev) + { + int rc =3D iommu_reattach_context(d, dom_io, pdev, 0); + + if ( rc ) + { + printk(XENLOG_WARNING "Unable to quarantine device %pp (%d)\n"= , &pdev->sbdf, rc); + pdev->broken =3D true; + } + else + pdev->domain =3D dom_io; + } + iommu_vcall(hd->platform_ops, teardown, d); + + arch_iommu_domain_destroy(d); } =20 -void iommu_domain_destroy(struct domain *d) -{ - if ( !is_iommu_enabled(d) ) - return; +bool cf_check iommu_check_context(struct domain *d, uint16_t ctx_id) { + struct domain_iommu *hd =3D dom_iommu(d); =20 - iommu_teardown(d); + if ( ctx_id =3D=3D 0 ) + return true; /* Default context always exist. 
*/ =20 - arch_iommu_domain_destroy(d); + if ( (ctx_id - 1) >=3D hd->other_contexts.count ) + return false; /* out of bounds */ + + if ( ctx_id =3D=3D IOMMU_INVALID_CONTEXT_ID ) + return false; /* Invalid ID */ + + return test_bit(ctx_id - 1, hd->other_contexts.bitmap); +} + +struct iommu_context * cf_check iommu_get_context(struct domain *d, uint16= _t ctx_id) { + struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx; + + if ( !iommu_check_context(d, ctx_id) ) + return NULL; + + if (ctx_id =3D=3D 0) + ctx =3D &hd->default_ctx; + else + ctx =3D &hd->other_contexts.map[ctx_id - 1]; + + rspin_lock(&ctx->lock); + /* Check if the context is still valid at this point */ + if ( unlikely(!iommu_check_context(d, ctx_id)) ) + { + /* Context has been destroyed in between */ + rspin_unlock(&ctx->lock); + return NULL; + } + + return ctx; +} + +void cf_check iommu_put_context(struct iommu_context *ctx) +{ + rspin_unlock(&ctx->lock); } =20 static unsigned int mapping_order(const struct domain_iommu *hd, @@ -323,11 +387,11 @@ static unsigned int mapping_order(const struct domain= _iommu *hd, return order; } =20 -long iommu_map(struct domain *d, dfn_t dfn0, mfn_t mfn0, - unsigned long page_count, unsigned int flags, - unsigned int *flush_flags) +static long _iommu_map(struct domain *d, dfn_t dfn0, mfn_t mfn0, + unsigned long page_count, unsigned int flags, + unsigned int *flush_flags, struct iommu_context *ct= x) { - const struct domain_iommu *hd =3D dom_iommu(d); + struct domain_iommu *hd =3D dom_iommu(d); unsigned long i; unsigned int order, j =3D 0; int rc =3D 0; @@ -350,7 +414,7 @@ long iommu_map(struct domain *d, dfn_t dfn0, mfn_t mfn0, return i; =20 rc =3D iommu_call(hd->platform_ops, map_page, d, dfn, mfn, - flags | IOMMUF_order(order), flush_flags); + flags | IOMMUF_order(order), flush_flags, ctx); =20 if ( likely(!rc) ) continue; @@ -361,10 +425,10 @@ long iommu_map(struct domain *d, dfn_t dfn0, mfn_t mf= n0, d->domain_id, dfn_x(dfn), mfn_x(mfn), rc); =20 /* while statement to satisfy __must_check */ - while ( iommu_unmap(d, dfn0, i, 0, flush_flags) ) + while ( iommu_unmap(d, dfn0, i, 0, flush_flags, ctx->id) ) break; =20 - if ( !is_hardware_domain(d) ) + if ( !ctx->id && !is_hardware_domain(d) ) domain_crash(d); =20 break; @@ -375,43 +439,67 @@ long iommu_map(struct domain *d, dfn_t dfn0, mfn_t mf= n0, * page, flush everything and clear flush flags. 
*/ if ( page_count > 1 && unlikely(rc) && - !iommu_iotlb_flush_all(d, *flush_flags) ) + !iommu_iotlb_flush_all(d, ctx->id, *flush_flags) ) *flush_flags =3D 0; =20 return rc; } =20 +long iommu_map(struct domain *d, dfn_t dfn0, mfn_t mfn0, + unsigned long page_count, unsigned int flags, + unsigned int *flush_flags, uint16_t ctx_id) +{ + struct iommu_context *ctx; + long ret; + + if ( !(ctx =3D iommu_get_context(d, ctx_id)) ) + return -ENOENT; + + ret =3D _iommu_map(d, dfn0, mfn0, page_count, flags, flush_flags, ctx); + + iommu_put_context(ctx); + + return ret; +} + int iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn, unsigned long page_count, unsigned int flags) { + struct iommu_context *ctx; unsigned int flush_flags =3D 0; - int rc; + int rc =3D 0; =20 ASSERT(!(flags & IOMMUF_preempt)); - rc =3D iommu_map(d, dfn, mfn, page_count, flags, &flush_flags); =20 - if ( !this_cpu(iommu_dont_flush_iotlb) && !rc ) - rc =3D iommu_iotlb_flush(d, dfn, page_count, flush_flags); + ctx =3D iommu_get_context(d, 0); + + if ( !ctx->opaque ) + { + rc =3D iommu_map(d, dfn, mfn, page_count, flags, &flush_flags, 0); + + if ( !this_cpu(iommu_dont_flush_iotlb) && !rc ) + rc =3D iommu_iotlb_flush(d, dfn, page_count, flush_flags, 0); + } + + iommu_put_context(ctx); =20 return rc; } =20 -long iommu_unmap(struct domain *d, dfn_t dfn0, unsigned long page_count, - unsigned int flags, unsigned int *flush_flags) +static long _iommu_unmap(struct domain *d, dfn_t dfn0, unsigned long page_= count, + unsigned int flags, unsigned int *flush_flags, + struct iommu_context *ctx) { - const struct domain_iommu *hd =3D dom_iommu(d); + struct domain_iommu *hd =3D dom_iommu(d); unsigned long i; unsigned int order, j =3D 0; int rc =3D 0; - struct iommu_context *ctx; =20 if ( !is_iommu_enabled(d) ) return 0; =20 ASSERT(!(flags & ~IOMMUF_preempt)); =20 - ctx =3D iommu_default_context(d); - for ( i =3D 0; i < page_count; i +=3D 1UL << order ) { dfn_t dfn =3D dfn_add(dfn0, i); @@ -425,7 +513,8 @@ long iommu_unmap(struct domain *d, dfn_t dfn0, unsigned= long page_count, return i; =20 err =3D iommu_call(hd->platform_ops, unmap_page, d, dfn, - flags | IOMMUF_order(order), flush_flags); + flags | IOMMUF_order(order), flush_flags, + ctx); =20 if ( likely(!err) ) continue; @@ -438,7 +527,7 @@ long iommu_unmap(struct domain *d, dfn_t dfn0, unsigned= long page_count, if ( !rc ) rc =3D err; =20 - if ( !is_hardware_domain(d) ) + if ( !ctx->id && !is_hardware_domain(d) ) { domain_crash(d); break; @@ -450,41 +539,74 @@ long iommu_unmap(struct domain *d, dfn_t dfn0, unsign= ed long page_count, * page, flush everything and clear flush flags. 
*/ if ( page_count > 1 && unlikely(rc) && - !iommu_iotlb_flush_all(d, *flush_flags) ) + !iommu_iotlb_flush_all(d, ctx->id, *flush_flags) ) *flush_flags =3D 0; =20 return rc; } =20 +long iommu_unmap(struct domain *d, dfn_t dfn0, unsigned long page_count, + unsigned int flags, unsigned int *flush_flags, + uint16_t ctx_id) +{ + struct iommu_context *ctx; + long ret; + + if ( !(ctx =3D iommu_get_context(d, ctx_id)) ) + return -ENOENT; + + ret =3D _iommu_unmap(d, dfn0, page_count, flags, flush_flags, ctx); + + iommu_put_context(ctx); + + return ret; +} + int iommu_legacy_unmap(struct domain *d, dfn_t dfn, unsigned long page_cou= nt) { unsigned int flush_flags =3D 0; - int rc =3D iommu_unmap(d, dfn, page_count, 0, &flush_flags); + struct iommu_context *ctx; + int rc =3D 0; =20 - if ( !this_cpu(iommu_dont_flush_iotlb) && !rc ) - rc =3D iommu_iotlb_flush(d, dfn, page_count, flush_flags); + ctx =3D iommu_get_context(d, 0); + + if ( !ctx->opaque ) + { + rc =3D iommu_unmap(d, dfn, page_count, 0, &flush_flags, 0); + + if ( !this_cpu(iommu_dont_flush_iotlb) && !rc ) + rc =3D iommu_iotlb_flush(d, dfn, page_count, flush_flags, 0); + } + + iommu_put_context(ctx); =20 return rc; } =20 int iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn, - unsigned int *flags) + unsigned int *flags, uint16_t ctx_id) { - const struct domain_iommu *hd =3D dom_iommu(d); + struct domain_iommu *hd =3D dom_iommu(d); struct iommu_context *ctx; + int ret; =20 if ( !is_iommu_enabled(d) || !hd->platform_ops->lookup_page ) return -EOPNOTSUPP; =20 - ctx =3D iommu_default_context(d); + if ( !(ctx =3D iommu_get_context(d, ctx_id)) ) + return -ENOENT; =20 - return iommu_call(hd->platform_ops, lookup_page, d, dfn, mfn, flags); + ret =3D iommu_call(hd->platform_ops, lookup_page, d, dfn, mfn, flags, = ctx); + + iommu_put_context(ctx); + return ret; } =20 int iommu_iotlb_flush(struct domain *d, dfn_t dfn, unsigned long page_coun= t, - unsigned int flush_flags) + unsigned int flush_flags, uint16_t ctx_id) { - const struct domain_iommu *hd =3D dom_iommu(d); + struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx; int rc; =20 if ( !is_iommu_enabled(d) || !hd->platform_ops->iotlb_flush || @@ -494,7 +616,10 @@ int iommu_iotlb_flush(struct domain *d, dfn_t dfn, uns= igned long page_count, if ( dfn_eq(dfn, INVALID_DFN) ) return -EINVAL; =20 - rc =3D iommu_call(hd->platform_ops, iotlb_flush, d, dfn, page_count, + if ( !(ctx =3D iommu_get_context(d, ctx_id)) ) + return -ENOENT; + + rc =3D iommu_call(hd->platform_ops, iotlb_flush, d, ctx, dfn, page_cou= nt, flush_flags); if ( unlikely(rc) ) { @@ -503,23 +628,29 @@ int iommu_iotlb_flush(struct domain *d, dfn_t dfn, un= signed long page_count, "d%d: IOMMU IOTLB flush failed: %d, dfn %"PRI_dfn", pag= e count %lu flags %x\n", d->domain_id, rc, dfn_x(dfn), page_count, flush_flags); =20 - if ( !is_hardware_domain(d) ) + if ( !ctx->id && !is_hardware_domain(d) ) domain_crash(d); } =20 + iommu_put_context(ctx); + return rc; } =20 -int iommu_iotlb_flush_all(struct domain *d, unsigned int flush_flags) +int iommu_iotlb_flush_all(struct domain *d, uint16_t ctx_id, unsigned int = flush_flags) { - const struct domain_iommu *hd =3D dom_iommu(d); + struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx; int rc; =20 if ( !is_iommu_enabled(d) || !hd->platform_ops->iotlb_flush || !flush_flags ) return 0; =20 - rc =3D iommu_call(hd->platform_ops, iotlb_flush, d, INVALID_DFN, 0, + if ( !(ctx =3D iommu_get_context(d, ctx_id)) ) + return -ENOENT; + + rc =3D 
iommu_call(hd->platform_ops, iotlb_flush, d, ctx, _dfn(0), 0, flush_flags | IOMMU_FLUSHF_all); if ( unlikely(rc) ) { @@ -532,21 +663,419 @@ int iommu_iotlb_flush_all(struct domain *d, unsigned= int flush_flags) domain_crash(d); } =20 + iommu_put_context(ctx); return rc; } =20 +int cf_check iommu_context_init(struct domain *d, struct iommu_context *ct= x, + uint16_t ctx_id, unsigned int flags) +{ + if ( !dom_iommu(d)->platform_ops->context_init ) + return -ENOSYS; + + INIT_LIST_HEAD(&ctx->devices); + ctx->id =3D ctx_id; + ctx->dying =3D false; + ctx->opaque =3D false; /* assume non-opaque by default */ + + return iommu_call(dom_iommu(d)->platform_ops, context_init, d, ctx, fl= ags); +} + +int iommu_context_alloc(struct domain *d, uint16_t *ctx_id, unsigned int f= lags) +{ + unsigned int i; + int ret; + struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx; + + do { + i =3D find_first_zero_bit(hd->other_contexts.bitmap, hd->other_con= texts.count); + + if ( i >=3D hd->other_contexts.count ) + return -ENOSPC; + + ctx =3D &hd->other_contexts.map[i]; + + /* Try to lock the mutex, can fail on concurrent accesses */ + if ( !rspin_trylock(&ctx->lock) ) + continue; + + /* We can now set it as used, we keep the lock for initialization.= */ + set_bit(i, hd->other_contexts.bitmap); + } while (0); + + *ctx_id =3D i + 1; + + ret =3D iommu_context_init(d, ctx, *ctx_id, flags); + + if ( ret ) + clear_bit(*ctx_id, hd->other_contexts.bitmap); + + iommu_put_context(ctx); + return ret; +} + +/** + * Attach dev phantom functions to ctx, override any existing + * mapped context. + */ +static int cf_check iommu_reattach_phantom(struct domain *d, device_t *dev, + struct iommu_context *ctx) +{ + int ret =3D 0; + uint8_t devfn =3D dev->devfn; + struct domain_iommu *hd =3D dom_iommu(d); + + while ( dev->phantom_stride ) + { + devfn +=3D dev->phantom_stride; + + if ( PCI_SLOT(devfn) !=3D PCI_SLOT(dev->devfn) ) + break; + + ret =3D iommu_call(hd->platform_ops, add_devfn, d, dev, devfn, ctx= ); + + if ( ret ) + break; + } + + return ret; +} + +/** + * Detach all device phantom functions. 
+ */ +static int cf_check iommu_detach_phantom(struct domain *d, device_t *dev, + struct iommu_context *prev_ctx) +{ + int ret =3D 0; + uint8_t devfn =3D dev->devfn; + struct domain_iommu *hd =3D dom_iommu(d); + + while ( dev->phantom_stride ) + { + devfn +=3D dev->phantom_stride; + + if ( PCI_SLOT(devfn) !=3D PCI_SLOT(dev->devfn) ) + break; + + ret =3D iommu_call(hd->platform_ops, remove_devfn, d, dev, devfn, = prev_ctx); + + if ( ret ) + break; + } + + return ret; +} + +int cf_check iommu_attach_context(struct domain *d, device_t *dev, uint16_= t ctx_id) +{ + struct iommu_context *ctx =3D NULL; + int ret =3D 0, rc; + + if ( dev->context =3D=3D ctx_id ) + return 0; + + if ( !(ctx =3D iommu_get_context(d, ctx_id)) ) + { + ret =3D -ENOENT; + goto unlock; + } + + pcidevs_lock(); + + if ( ctx->dying ) + { + ret =3D -EINVAL; + goto unlock; + } + + /* ignore attach operations on PCIe bridges */ + if ( dev->type !=3D DEV_TYPE_PCIe_BRIDGE ) + ret =3D iommu_call(dom_iommu(d)->platform_ops, attach, d, dev, ctx= ); + + if ( ret ) + goto unlock; + + /* See iommu_reattach_context() */ + rc =3D iommu_reattach_phantom(d, dev, ctx); + + if ( rc ) + { + printk(XENLOG_ERR "IOMMU: Unable to attach %pp phantom functions\n= ", + &dev->sbdf); + + if( iommu_call(dom_iommu(d)->platform_ops, detach, d, dev, ctx) + || iommu_detach_phantom(d, dev, ctx) ) + { + printk(XENLOG_ERR "IOMMU: Improperly detached %pp\n", &dev->sb= df); + WARN(); + } + + ret =3D -EIO; + goto unlock; + } + + dev->context =3D ctx_id; + list_add(&dev->context_list, &ctx->devices); + +unlock: + pcidevs_unlock(); + + if ( ctx ) + iommu_put_context(ctx); + + return ret; +} + +int cf_check iommu_detach_context(struct domain *d, device_t *dev) +{ + struct iommu_context *ctx; + int ret =3D 0, rc; + + if ( !dev->domain || dev->context =3D=3D IOMMU_INVALID_CONTEXT_ID ) + { + printk(XENLOG_WARNING "IOMMU: Trying to detach a non-attached devi= ce\n"); + WARN(); + return 0; + } + + /* Make sure device is actually in the domain. */ + ASSERT(d =3D=3D dev->domain); + + pcidevs_lock(); + + ctx =3D iommu_get_context(d, dev->context); + ASSERT(ctx); /* device is using an invalid context ? + dev->context invalid ? */ + + /* ignore detach operations on PCIe bridges */ + if ( dev->type !=3D DEV_TYPE_PCIe_BRIDGE ) + ret =3D iommu_call(dom_iommu(d)->platform_ops, detach, d, dev, ctx= ); + + if ( ret ) + goto unlock; + + rc =3D iommu_detach_phantom(d, dev, ctx); + + if ( rc ) + printk(XENLOG_WARNING "IOMMU: " + "Improperly detached device functions (%d)\n", rc); + + list_del(&dev->context_list); + +unlock: + pcidevs_unlock(); + iommu_put_context(ctx); + return ret; +} + +int cf_check iommu_reattach_context(struct domain *prev_dom, struct domain= *next_dom, + device_t *dev, uint16_t ctx_id) +{ + uint16_t prev_ctx_id; + device_t *ctx_dev; + struct domain_iommu *prev_hd, *next_hd; + struct iommu_context *prev_ctx =3D NULL, *next_ctx =3D NULL; + int ret =3D 0, rc; + bool same_domain; + + /* Make sure we actually are doing something meaningful */ + BUG_ON(!prev_dom && !next_dom); + + /* Device domain must be coherent with prev_dom. */ + ASSERT(!prev_dom || dev->domain =3D=3D prev_dom); + + /// TODO: Do such cases exists ? 
+ // /* Platform ops must match */ + // if (dom_iommu(prev_dom)->platform_ops !=3D dom_iommu(next_dom)->pla= tform_ops) + // return -EINVAL; + + if ( !prev_dom ) + return iommu_attach_context(next_dom, dev, ctx_id); + + if ( !next_dom ) + return iommu_detach_context(prev_dom, dev); + + prev_hd =3D dom_iommu(prev_dom); + next_hd =3D dom_iommu(next_dom); + + pcidevs_lock(); + + same_domain =3D prev_dom =3D=3D next_dom; + + prev_ctx_id =3D dev->context; + + if ( same_domain && (ctx_id =3D=3D prev_ctx_id) ) + { + printk(XENLOG_DEBUG + "IOMMU: Reattaching %pp to same IOMMU context c%hu\n", + &dev->sbdf, ctx_id); + ret =3D 0; + goto unlock; + } + + if ( !(prev_ctx =3D iommu_get_context(prev_dom, prev_ctx_id)) ) + { + ret =3D -ENOENT; + goto unlock; + } + + if ( !(next_ctx =3D iommu_get_context(next_dom, ctx_id)) ) + { + ret =3D -ENOENT; + goto unlock; + } + + if ( next_ctx->dying ) + { + ret =3D -EINVAL; + goto unlock; + } + + /* ignore reattach operations on PCIe bridges */ + if ( dev->type !=3D DEV_TYPE_PCIe_BRIDGE ) + ret =3D iommu_call(prev_hd->platform_ops, reattach, next_dom, dev, + prev_ctx, next_ctx); + + if ( ret ) + goto unlock; + + /* + * We need to do special handling for phantom devices as they + * also use some other PCI functions behind the scenes. + */ + rc =3D iommu_reattach_phantom(next_dom, dev, next_ctx); + + if ( rc ) + { + /** + * Device is being partially reattached (we have primary function = and + * maybe some phantom functions attached to next_ctx, some others = to prev_ctx), + * some functions of the device will be attached to next_ctx. + */ + printk(XENLOG_WARNING "IOMMU: " + "Device %pp improperly reattached due to phantom function" + " reattach failure between %dd%dc and %dd%dc (%d)\n", dev, + prev_dom->domain_id, prev_ctx->id, next_dom->domain_id, + next_dom->domain_id, rc); + + /* Try reattaching to previous context, reverting into a consisten= t state. */ + if ( iommu_call(prev_hd->platform_ops, reattach, prev_dom, dev, ne= xt_ctx, + prev_ctx) || iommu_reattach_phantom(prev_dom, dev,= prev_ctx) ) + { + printk(XENLOG_ERR "Unable to reattach %pp back to %dd%dc\n", + &dev->sbdf, prev_dom->domain_id, prev_ctx->id); + + if ( !is_hardware_domain(prev_dom) ) + domain_crash(prev_dom); + + if ( prev_dom !=3D next_dom && !is_hardware_domain(next_dom) ) + domain_crash(next_dom); + + rc =3D -EIO; + } + + ret =3D rc; + goto unlock; + } + + /* Remove device from previous context, and add it to new one. 
*/ + list_for_each_entry(ctx_dev, &prev_ctx->devices, context_list) + { + if ( ctx_dev =3D=3D dev ) + { + list_del(&ctx_dev->context_list); + list_add(&ctx_dev->context_list, &next_ctx->devices); + break; + } + } + + if (!ret) + dev->context =3D ctx_id; /* update device context*/ + +unlock: + pcidevs_unlock(); + + if ( prev_ctx ) + iommu_put_context(prev_ctx); + + if ( next_ctx ) + iommu_put_context(next_ctx); + + return ret; +} + +int cf_check iommu_context_teardown(struct domain *d, struct iommu_context= *ctx, u32 flags) +{ + struct domain_iommu *hd =3D dom_iommu(d); + + if ( !hd->platform_ops->context_teardown ) + return -ENOSYS; + + ctx->dying =3D true; + + /* first reattach devices back to default context if needed */ + if ( flags & IOMMU_TEARDOWN_REATTACH_DEFAULT ) + { + struct pci_dev *device; + list_for_each_entry(device, &ctx->devices, context_list) + iommu_reattach_context(d, d, device, 0); + } + else if (!list_empty(&ctx->devices)) + return -EBUSY; /* there is a device in context */ + + return iommu_call(hd->platform_ops, context_teardown, d, ctx, flags); +} + +int cf_check iommu_context_free(struct domain *d, uint16_t ctx_id, u32 fla= gs) +{ + int ret; + struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx; + + if ( ctx_id =3D=3D 0 ) + return -EINVAL; + + if ( !(ctx =3D iommu_get_context(d, ctx_id)) ) + return -ENOENT; + + ret =3D iommu_context_teardown(d, ctx, flags); + + if ( !ret ) + clear_bit(ctx_id - 1, hd->other_contexts.bitmap); + + iommu_put_context(ctx); + return ret; +} + int iommu_quarantine_dev_init(device_t *dev) { - const struct domain_iommu *hd =3D dom_iommu(dom_io); + int ret; + uint16_t ctx_id; =20 - if ( !iommu_quarantine || !hd->platform_ops->quarantine_init ) + if ( !iommu_quarantine ) return 0; =20 - return iommu_call(hd->platform_ops, quarantine_init, - dev, iommu_quarantine =3D=3D IOMMU_quarantine_scratc= h_page); + ret =3D iommu_context_alloc(dom_io, &ctx_id, IOMMU_CONTEXT_INIT_quaran= tine); + + if ( ret ) + return ret; + + /** TODO: Setup scratch page, mappings... */ + + ret =3D iommu_reattach_context(dev->domain, dom_io, dev, ctx_id); + + if ( ret ) + { + ASSERT(!iommu_context_free(dom_io, ctx_id, 0)); + return ret; + } + + return ret; } =20 -static int __init iommu_quarantine_init(void) +int __init iommu_quarantine_init(void) { dom_io->options |=3D XEN_DOMCTL_CDF_iommu; =20 diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index ee73d55740..2d9db79787 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -330,6 +330,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg,= u8 bus, u8 devfn) *((u8*) &pdev->bus) =3D bus; *((u8*) &pdev->devfn) =3D devfn; pdev->domain =3D NULL; + pdev->context =3D IOMMU_INVALID_CONTEXT_ID; =20 INIT_LIST_HEAD(&pdev->vf_list); =20 @@ -604,7 +605,6 @@ static void pci_enable_acs(struct pci_dev *pdev) } =20 static int iommu_add_device(struct pci_dev *pdev); -static int iommu_enable_device(struct pci_dev *pdev); static int iommu_remove_device(struct pci_dev *pdev); =20 unsigned int pci_size_mem_bar(pci_sbdf_t sbdf, unsigned int pos, @@ -654,6 +654,101 @@ unsigned int pci_size_mem_bar(pci_sbdf_t sbdf, unsign= ed int pos, return is64bits ? 2 : 1; } =20 +static int device_assigned(struct pci_dev *pdev) +{ + int rc =3D 0; + + /* + * If the device exists and it is not owned by either the hardware + * domain or dom_io then it must be assigned to a guest, or be + * hidden (owned by dom_xen). 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index ee73d55740..2d9db79787 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -330,6 +330,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
     *((u8*) &pdev->bus) = bus;
     *((u8*) &pdev->devfn) = devfn;
     pdev->domain = NULL;
+    pdev->context = IOMMU_INVALID_CONTEXT_ID;
 
     INIT_LIST_HEAD(&pdev->vf_list);
 
@@ -604,7 +605,6 @@ static void pci_enable_acs(struct pci_dev *pdev)
 }
 
 static int iommu_add_device(struct pci_dev *pdev);
-static int iommu_enable_device(struct pci_dev *pdev);
 static int iommu_remove_device(struct pci_dev *pdev);
 
 unsigned int pci_size_mem_bar(pci_sbdf_t sbdf, unsigned int pos,
@@ -654,6 +654,101 @@ unsigned int pci_size_mem_bar(pci_sbdf_t sbdf, unsigned int pos,
     return is64bits ? 2 : 1;
 }
 
+static int device_assigned(struct pci_dev *pdev)
+{
+    int rc = 0;
+
+    /*
+     * If the device exists and it is not owned by either the hardware
+     * domain or dom_io then it must be assigned to a guest, or be
+     * hidden (owned by dom_xen).
+     */
+    if ( pdev->domain != hardware_domain && pdev->domain != dom_io )
+        rc = -EBUSY;
+
+    return rc;
+}
+
+/* Caller should hold the pcidevs_lock */
+static int pci_reassign_device(struct domain *prev_dom, struct domain *next_dom,
+                               struct pci_dev *pdev, u32 flag)
+{
+    int rc = 0;
+
+    ASSERT(prev_dom || next_dom);
+
+    if ( !is_iommu_enabled(next_dom) )
+        return -EINVAL;
+
+    if ( !arch_iommu_use_permitted(next_dom) )
+        return -EXDEV;
+
+    /* Do not allow broken devices to be assigned to guests. */
+    if ( pdev->broken && next_dom != hardware_domain && next_dom != dom_io )
+        return -EBADF;
+
+    if ( prev_dom )
+    {
+        write_lock(&prev_dom->pci_lock);
+        vpci_deassign_device(pdev);
+        write_unlock(&prev_dom->pci_lock);
+    }
+
+    rc = pdev_msix_assign(next_dom, pdev);
+    if ( rc )
+        goto done;
+
+    pdev->fault.count = 0;
+
+    if ( prev_dom && next_dom )
+    {
+        printk(XENLOG_INFO "PCI: Reassigning PCI device from %dd to %dd\n",
+               prev_dom->domain_id, next_dom->domain_id);
+    }
+    else if ( prev_dom )
+    {
+        printk(XENLOG_INFO "PCI: Removing PCI device from %dd\n",
+               prev_dom->domain_id);
+    }
+    else if ( next_dom )
+    {
+        printk(XENLOG_INFO "PCI: Assigning PCI device to %dd\n",
+               next_dom->domain_id);
+    }
+    else
+    {
+        ASSERT_UNREACHABLE();
+    }
+
+    rc = iommu_reattach_context(prev_dom, next_dom, pci_to_dev(pdev), 0);
+
+    if ( rc )
+        goto done;
+
+    if ( prev_dom )
+    {
+        write_lock(&prev_dom->pci_lock);
+        list_del(&pdev->domain_list);
+        write_unlock(&prev_dom->pci_lock);
+    }
+
+    pdev->domain = next_dom;
+
+    if ( next_dom )
+    {
+        write_lock(&next_dom->pci_lock);
+        list_add(&pdev->domain_list, &next_dom->pdev_list);
+
+        rc = vpci_assign_device(pdev);
+        write_unlock(&next_dom->pci_lock);
+    }
+
+ done:
+
+    /* The device is assigned to dom_io so mark it as quarantined */
+    if ( !rc && next_dom == dom_io )
+        pdev->quarantine = true;
+
+    return rc;
+}
+
 int pci_add_device(u16 seg, u8 bus, u8 devfn,
                    const struct pci_dev_info *info, nodeid_t node)
 {
@@ -798,7 +893,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
         }
     }
     else
-        iommu_enable_device(pdev);
+        ret = iommu_add_device(pdev);
 
     pci_enable_acs(pdev);
 
@@ -878,74 +973,6 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
     return ret;
 }
 
-/* Caller should hold the pcidevs_lock */
-static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
-                           uint8_t devfn)
-{
-    const struct domain_iommu *hd = dom_iommu(d);
-    struct pci_dev *pdev;
-    struct domain *target;
-    int ret = 0;
-
-    if ( !is_iommu_enabled(d) )
-        return -EINVAL;
-
-    ASSERT(pcidevs_locked());
-    pdev = pci_get_pdev(d, PCI_SBDF(seg, bus, devfn));
-    if ( !pdev )
-        return -ENODEV;
-
-    /* De-assignment from dom_io should de-quarantine the device */
-    if ( (pdev->quarantine || iommu_quarantine) && pdev->domain != dom_io )
-    {
-        ret = iommu_quarantine_dev_init(pci_to_dev(pdev));
-        if ( ret )
-            return ret;
-
-        target = dom_io;
-    }
-    else
-        target = hardware_domain;
-
-    while ( pdev->phantom_stride )
-    {
-        devfn += pdev->phantom_stride;
-        if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
-            break;
-        ret = iommu_call(hd->platform_ops, reassign_device, d, target, devfn,
-                         pci_to_dev(pdev));
-        if ( ret )
-            goto out;
-    }
-
-    write_lock(&d->pci_lock);
-    vpci_deassign_device(pdev);
-    write_unlock(&d->pci_lock);
-
-    devfn = pdev->devfn;
-    ret = iommu_call(hd->platform_ops, reassign_device, d, target, devfn,
-                     pci_to_dev(pdev));
-    if ( ret )
-        goto out;
-
-    if ( pdev->domain == hardware_domain )
-        pdev->quarantine = false;
-
-    pdev->fault.count = 0;
-
-    write_lock(&target->pci_lock);
-    /* Re-assign back to hardware_domain */
-    ret = vpci_assign_device(pdev);
-    write_unlock(&target->pci_lock);
-
- out:
-    if ( ret )
-        printk(XENLOG_G_ERR "%pd: deassign (%pp) failed (%d)\n",
-               d, &PCI_SBDF(seg, bus, devfn), ret);
-
-    return ret;
-}
-
 int pci_release_devices(struct domain *d)
 {
     int combined_ret;
@@ -967,13 +994,10 @@ int pci_release_devices(struct domain *d)
         struct pci_dev *pdev = list_first_entry(&d->pdev_list,
                                                 struct pci_dev,
                                                 domain_list);
-        uint16_t seg = pdev->seg;
-        uint8_t bus = pdev->bus;
-        uint8_t devfn = pdev->devfn;
         int ret;
 
         write_unlock(&d->pci_lock);
-        ret = deassign_device(d, seg, bus, devfn);
+        ret = pci_reassign_device(d, dom_io, pdev, 0);
         write_lock(&d->pci_lock);
         if ( ret )
         {
@@ -1181,25 +1205,18 @@ struct setup_hwdom {
 static void __hwdom_init setup_one_hwdom_device(const struct setup_hwdom *ctxt,
                                                 struct pci_dev *pdev)
 {
-    u8 devfn = pdev->devfn;
     int err;
 
-    do {
-        err = ctxt->handler(devfn, pdev);
-        if ( err )
-        {
-            printk(XENLOG_ERR "setup %pp for d%d failed (%d)\n",
-                   &pdev->sbdf, ctxt->d->domain_id, err);
-            if ( devfn == pdev->devfn )
-                return;
-        }
-        devfn += pdev->phantom_stride;
-    } while ( devfn != pdev->devfn &&
-              PCI_SLOT(devfn) == PCI_SLOT(pdev->devfn) );
+    err = ctxt->handler(pdev->devfn, pdev);
+
+    if ( err )
+        goto done;
 
     write_lock(&ctxt->d->pci_lock);
     err = vpci_assign_device(pdev);
     write_unlock(&ctxt->d->pci_lock);
+
+done:
     if ( err )
         printk(XENLOG_ERR "setup of vPCI for d%d failed: %d\n",
                ctxt->d->domain_id, err);
@@ -1398,8 +1415,6 @@ __initcall(setup_dump_pcidevs);
 static int iommu_add_device(struct pci_dev *pdev)
 {
     const struct domain_iommu *hd;
-    int rc;
-    unsigned int devfn = pdev->devfn;
 
     if ( !pdev->domain )
         return -EINVAL;
@@ -1410,180 +1425,18 @@ static int iommu_add_device(struct pci_dev *pdev)
     if ( !is_iommu_enabled(pdev->domain) )
         return 0;
 
-    rc = iommu_call(hd->platform_ops, add_device, devfn, pci_to_dev(pdev));
-    if ( rc || !pdev->phantom_stride )
-        return rc;
-
-    for ( ; ; )
-    {
-        devfn += pdev->phantom_stride;
-        if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
-            return 0;
-        rc = iommu_call(hd->platform_ops, add_device, devfn, pci_to_dev(pdev));
-        if ( rc )
-            printk(XENLOG_WARNING "IOMMU: add %pp failed (%d)\n",
-                   &PCI_SBDF(pdev->seg, pdev->bus, devfn), rc);
-    }
-}
-
-static int iommu_enable_device(struct pci_dev *pdev)
-{
-    const struct domain_iommu *hd;
-
-    if ( !pdev->domain )
-        return -EINVAL;
-
-    ASSERT(pcidevs_locked());
-
-    hd = dom_iommu(pdev->domain);
-    if ( !is_iommu_enabled(pdev->domain) ||
-         !hd->platform_ops->enable_device )
-        return 0;
-
-    return iommu_call(hd->platform_ops, enable_device, pci_to_dev(pdev));
+    return iommu_attach_context(pdev->domain, pci_to_dev(pdev), 0);
 }
 
 static int iommu_remove_device(struct pci_dev *pdev)
 {
-    const struct domain_iommu *hd;
-    u8 devfn;
-
     if ( !pdev->domain )
         return -EINVAL;
 
-    hd = dom_iommu(pdev->domain);
     if ( !is_iommu_enabled(pdev->domain) )
         return 0;
 
-    for ( devfn = pdev->devfn ; pdev->phantom_stride; )
-    {
-        int rc;
-
-        devfn += pdev->phantom_stride;
-        if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
-            break;
-        rc = iommu_call(hd->platform_ops, remove_device, devfn,
-                        pci_to_dev(pdev));
-        if ( !rc )
-            continue;
-
-        printk(XENLOG_ERR "IOMMU: remove %pp failed (%d)\n",
-               &PCI_SBDF(pdev->seg, pdev->bus, devfn), rc);
-        return rc;
-    }
-
-    devfn = pdev->devfn;
-
-    return iommu_call(hd->platform_ops, remove_device, devfn, pci_to_dev(pdev));
-}
-
-static int device_assigned(u16 seg, u8 bus, u8 devfn)
-{
-    struct pci_dev *pdev;
-    int rc = 0;
-
-    ASSERT(pcidevs_locked());
-    pdev = pci_get_pdev(NULL, PCI_SBDF(seg, bus, devfn));
-
-    if ( !pdev )
-        rc = -ENODEV;
-    /*
-     * If the device exists and it is not owned by either the hardware
-     * domain or dom_io then it must be assigned to a guest, or be
-     * hidden (owned by dom_xen).
-     */
-    else if ( pdev->domain != hardware_domain &&
-              pdev->domain != dom_io )
-        rc = -EBUSY;
-
-    return rc;
-}
-
-/* Caller should hold the pcidevs_lock */
-static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
-{
-    const struct domain_iommu *hd = dom_iommu(d);
-    struct pci_dev *pdev;
-    int rc = 0;
-
-    if ( !is_iommu_enabled(d) )
-        return 0;
-
-    if ( !arch_iommu_use_permitted(d) )
-        return -EXDEV;
-
-    /* device_assigned() should already have cleared the device for assignment */
-    ASSERT(pcidevs_locked());
-    pdev = pci_get_pdev(NULL, PCI_SBDF(seg, bus, devfn));
-    ASSERT(pdev && (pdev->domain == hardware_domain ||
-                    pdev->domain == dom_io));
-
-    /* Do not allow broken devices to be assigned to guests. */
-    rc = -EBADF;
-    if ( pdev->broken && d != hardware_domain && d != dom_io )
-        goto done;
-
-    write_lock(&pdev->domain->pci_lock);
-    vpci_deassign_device(pdev);
-    write_unlock(&pdev->domain->pci_lock);
-
-    rc = pdev_msix_assign(d, pdev);
-    if ( rc )
-        goto done;
-
-    if ( pdev->domain != dom_io )
-    {
-        rc = iommu_quarantine_dev_init(pci_to_dev(pdev));
-        if ( rc )
-            goto done;
-    }
-
-    pdev->fault.count = 0;
-
-    rc = iommu_call(hd->platform_ops, assign_device, d, devfn, pci_to_dev(pdev),
-                    flag);
-
-    while ( pdev->phantom_stride && !rc )
-    {
-        devfn += pdev->phantom_stride;
-        if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
-            break;
-        rc = iommu_call(hd->platform_ops, assign_device, d, devfn,
-                        pci_to_dev(pdev), flag);
-    }
-
-    if ( rc )
-        goto done;
-
-    write_lock(&d->pci_lock);
-    rc = vpci_assign_device(pdev);
-    write_unlock(&d->pci_lock);
-
- done:
-    if ( rc )
-    {
-        printk(XENLOG_G_WARNING "%pd: assign %s(%pp) failed (%d)\n",
-               d, devfn != pdev->devfn ? "phantom function " : "",
-               &PCI_SBDF(seg, bus, devfn), rc);
-
-        if ( devfn != pdev->devfn && deassign_device(d, seg, bus, pdev->devfn) )
-        {
-            /*
-             * Device with phantom functions that failed to both assign and
-             * rollback.  Mark the device as broken and crash the target domain,
-             * as the state of the functions at this point is unknown and Xen
-             * has no way to assert consistent context assignment among them.
-             */
-            pdev->broken = true;
-            if ( !is_hardware_domain(d) && d != dom_io )
-                domain_crash(d);
-        }
-    }
-    /* The device is assigned to dom_io so mark it as quarantined */
-    else if ( d == dom_io )
-        pdev->quarantine = true;
-
-    return rc;
+    return iommu_detach_context(pdev->domain, pdev);
 }
 
 static int iommu_get_device_group(
@@ -1673,6 +1526,7 @@ int iommu_do_pci_domctl(
     u8 bus, devfn;
     int ret = 0;
     uint32_t machine_sbdf;
+    struct pci_dev *pdev;
 
     switch ( domctl->cmd )
     {
@@ -1742,7 +1596,17 @@ int iommu_do_pci_domctl(
         devfn = PCI_DEVFN(machine_sbdf);
 
         pcidevs_lock();
-        ret = device_assigned(seg, bus, devfn);
+        pdev = pci_get_pdev(NULL, PCI_SBDF(seg, bus, devfn));
+
+        if ( !pdev )
+        {
+            printk(XENLOG_G_INFO "%pp doesn't exist\n", &PCI_SBDF(seg, bus, devfn));
+            ret = -ENODEV;
+            pcidevs_unlock();
+            break;
+        }
+
+        ret = device_assigned(pdev);
         if ( domctl->cmd == XEN_DOMCTL_test_assign_device )
         {
             if ( ret )
@@ -1753,7 +1615,7 @@ int iommu_do_pci_domctl(
             }
         }
         else if ( !ret )
-            ret = assign_device(d, seg, bus, devfn, flags);
+            ret = pci_reassign_device(pdev->domain, d, pdev, flags);
         pcidevs_unlock();
         if ( ret == -ERESTART )
             ret = hypercall_create_continuation(__HYPERVISOR_domctl,
@@ -1787,7 +1649,20 @@ int iommu_do_pci_domctl(
         devfn = PCI_DEVFN(machine_sbdf);
 
         pcidevs_lock();
-        ret = deassign_device(d, seg, bus, devfn);
+        pdev = pci_get_pdev(d, PCI_SBDF(seg, bus, devfn));
+
+        if ( pdev )
+        {
+            struct domain *target = hardware_domain;
+
+            if ( (pdev->quarantine || iommu_quarantine) && pdev->domain != dom_io )
+                target = dom_io;
+
+            ret = pci_reassign_device(d, target, pdev, 0);
+        }
+        else
+            ret = -ENODEV;
+
         pcidevs_unlock();
         break;
 
diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
index 82db8f9435..a980be3646 100644
--- a/xen/drivers/passthrough/vtd/extern.h
+++ b/xen/drivers/passthrough/vtd/extern.h
@@ -78,12 +78,12 @@ uint64_t alloc_pgtable_maddr(unsigned long npages, nodeid_t node);
 void free_pgtable_maddr(u64 maddr);
 void *map_vtd_domain_page(u64 maddr);
 void unmap_vtd_domain_page(const void *va);
-int domain_context_mapping_one(struct domain *domain, struct iommu_context *ctx,
-                               struct vtd_iommu *iommu, uint8_t bus, uint8_t devfn,
-                               const struct pci_dev *pdev, domid_t domid,
-                               paddr_t pgd_maddr, unsigned int mode);
-int domain_context_unmap_one(struct domain *domain, struct vtd_iommu *iommu,
-                             uint8_t bus, uint8_t devfn);
+int apply_context_single(struct domain *domain, struct iommu_context *ctx,
+                         struct vtd_iommu *iommu, uint8_t bus, uint8_t devfn,
+                         struct iommu_context *prev_ctx);
+int unapply_context_single(struct domain *domain, struct vtd_iommu *iommu,
+                           struct iommu_context *prev_ctx, uint8_t bus,
+                           uint8_t devfn);
 int cf_check intel_iommu_get_reserved_device_memory(
     iommu_grdm_t *func, void *ctxt);
 
@@ -104,8 +104,9 @@ void platform_quirks_init(void);
 void vtd_ops_preamble_quirk(struct vtd_iommu *iommu);
 void vtd_ops_postamble_quirk(struct vtd_iommu *iommu);
 int __must_check me_wifi_quirk(struct domain *domain, uint8_t bus,
-                               uint8_t devfn, domid_t domid, paddr_t pgd_maddr,
-                               unsigned int mode);
+                               uint8_t devfn, domid_t domid,
+                               unsigned int mode, struct iommu_context *ctx,
+                               struct iommu_context *prev_ctx);
 void pci_vtd_quirk(const struct pci_dev *);
 void quirk_iommu_caps(struct vtd_iommu *iommu);
 
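For illustration only, not part of the patch: a minimal sketch of the calling convention the new extern.h declarations imply, using only functions visible in this series (the wrapper name is hypothetical):

    /* Illustrative only: move one (bus, devfn) from prev_ctx to ctx. */
    static int example_switch_context(struct domain *d, struct pci_dev *pdev,
                                      struct iommu_context *prev_ctx,
                                      struct iommu_context *ctx)
    {
        const struct acpi_drhd_unit *drhd = acpi_find_matched_drhd_unit(pdev);

        if ( !drhd )
            return -ENODEV;

        /* A non-NULL prev_ctx lets the helper drop the old context's device count. */
        return apply_context_single(d, ctx, drhd->iommu, pdev->bus, pdev->devfn,
                                    prev_ctx);
    }
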
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 3668185ebc..3319903297 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -61,7 +62,6 @@ static unsigned int __ro_after_init min_pt_levels = UINT_MAX;
 static struct tasklet vtd_fault_tasklet;
 
 static int cf_check setup_hwdom_device(u8 devfn, struct pci_dev *);
-static void setup_hwdom_rmrr(struct domain *d);
 
 #define DID_FIELD_WIDTH 16
 #define DID_HIGH_OFFSET 8
@@ -165,7 +165,7 @@ static uint64_t addr_to_dma_page_maddr(struct domain *domain,
     u64 pte_maddr = 0;
 
     addr &= (((u64)1) << addr_width) - 1;
-    ASSERT(spin_is_locked(&ctx->arch.mapping_lock));
+    ASSERT(rspin_is_locked(&ctx->lock));
     ASSERT(target || !alloc);
 
     if ( !ctx->arch.vtd.pgd_maddr )
@@ -270,36 +270,23 @@ static uint64_t addr_to_dma_page_maddr(struct domain *domain,
     return pte_maddr;
 }
 
-static paddr_t domain_pgd_maddr(struct domain *d, struct iommu_context *ctx,
-                                paddr_t pgd_maddr, unsigned int nr_pt_levels)
+static paddr_t get_context_pgd(struct domain *d, struct iommu_context *ctx,
+                               unsigned int nr_pt_levels)
 {
     unsigned int agaw;
+    paddr_t pgd_maddr = ctx->arch.vtd.pgd_maddr;
 
-    ASSERT(spin_is_locked(&ctx->arch.mapping_lock));
-
-    if ( pgd_maddr )
-        /* nothing */;
-    else if ( iommu_use_hap_pt(d) )
-    {
-        pagetable_t pgt = p2m_get_pagetable(p2m_get_hostp2m(d));
-
-        pgd_maddr = pagetable_get_paddr(pgt);
-    }
-    else
-    {
-        if ( !ctx->arch.vtd.pgd_maddr )
-        {
-            /*
-             * Ensure we have pagetables allocated down to the smallest
-             * level the loop below may need to run to.
-             */
-            addr_to_dma_page_maddr(d, ctx, 0, min_pt_levels, NULL, true);
-
-            if ( !ctx->arch.vtd.pgd_maddr )
-                return 0;
-        }
-
-        pgd_maddr = ctx->arch.vtd.pgd_maddr;
+    if ( !pgd_maddr )
+    {
+        /*
+         * Ensure we have pagetables allocated down to the smallest
+         * level the loop below may need to run to.
+         */
+        addr_to_dma_page_maddr(d, ctx, 0, min_pt_levels, NULL, true);
+
+        pgd_maddr = ctx->arch.vtd.pgd_maddr;
+        if ( !pgd_maddr )
+            return 0;
     }
 
     /* Skip top level(s) of page tables for less-than-maximum level DRHDs. */
@@ -569,17 +555,20 @@ static int __must_check iommu_flush_all(void)
     return rc;
 }
 
-static int __must_check cf_check iommu_flush_iotlb(struct domain *d, dfn_t dfn,
+static int __must_check cf_check iommu_flush_iotlb(struct domain *d,
+                                                   struct iommu_context *ctx,
+                                                   dfn_t dfn,
                                                    unsigned long page_count,
                                                    unsigned int flush_flags)
 {
-    struct iommu_context *ctx = iommu_default_context(d);
     struct acpi_drhd_unit *drhd;
     struct vtd_iommu *iommu;
     bool flush_dev_iotlb;
     int iommu_domid;
     int ret = 0;
 
+    ASSERT(ctx);
+
     if ( flush_flags & IOMMU_FLUSHF_all )
     {
         dfn = INVALID_DFN;
@@ -1240,7 +1229,8 @@ void __init iommu_free(struct acpi_drhd_unit *drhd)
         agaw = 64;                              \
         agaw; })
 
-static int cf_check intel_iommu_context_init(struct domain *d, struct iommu_context *ctx)
+static int cf_check intel_iommu_context_init(struct domain *d, struct iommu_context *ctx,
+                                             u32 flags)
 {
     struct acpi_drhd_unit *drhd;
 
@@ -1255,6 +1245,27 @@ static int cf_check intel_iommu_context_init(struct domain *d, struct iommu_cont
             return -ENOMEM;
     }
 
+    ctx->arch.vtd.superpage_progress = 0;
+
+    if ( flags & IOMMU_CONTEXT_INIT_default )
+    {
+        ctx->arch.vtd.pgd_maddr = 0;
+
+        /*
+         * The context is considered "opaque" (non-managed) in these cases:
+         *  - HAP is enabled: the pagetable is not managed by the IOMMU code,
+         *    thus opaque;
+         *  - the IOMMU is in passthrough mode, in which case there is no
+         *    actual pagetable.
+         */
+        if ( iommu_use_hap_pt(d) )
+        {
+            pagetable_t pgt = p2m_get_pagetable(p2m_get_hostp2m(d));
+
+            ctx->arch.vtd.pgd_maddr = pagetable_get_paddr(pgt);
+
+            ctx->opaque = true;
+        }
+    }
+
+    /* TODO: Allocate IOMMU domid only when attaching devices? */
     /* Populate context DID map using pseudo DIDs */
     for_each_drhd_unit(drhd)
     {
         ctx->arch.vtd.didmap[drhd->iommu->index] =
             iommu_alloc_domid(drhd->iommu->domid_bitmap);
     }
 
-    return arch_iommu_context_init(d, ctx, 0);
+    if ( !ctx->opaque )
+        /* Create initial context page */
+        addr_to_dma_page_maddr(d, ctx, 0, min_pt_levels, NULL, true);
+
+    return arch_iommu_context_init(d, ctx, flags);
 }
 
 static int cf_check intel_iommu_domain_init(struct domain *d)
@@ -1272,7 +1287,7 @@ static int cf_check intel_iommu_domain_init(struct domain *d)
 
     hd->arch.vtd.agaw = width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH);
 
-    return intel_iommu_context_init(d, iommu_default_context(d));
+    return 0;
 }
 
 static void __hwdom_init cf_check intel_iommu_hwdom_init(struct domain *d)
@@ -1280,7 +1295,7 @@ static void __hwdom_init cf_check intel_iommu_hwdom_init(struct domain *d)
     struct acpi_drhd_unit *drhd;
 
     setup_hwdom_pci_devices(d, setup_hwdom_device);
-    setup_hwdom_rmrr(d);
+
     /* Make sure workarounds are applied before enabling the IOMMU(s). */
     arch_iommu_hwdom_init(d);
 
@@ -1297,21 +1312,17 @@ static void __hwdom_init cf_check intel_iommu_hwdom_init(struct domain *d)
     }
 }
 
-/*
- * This function returns
- * - a negative errno value upon error,
- * - zero upon success when previously the entry was non-present, or this isn't
- *   the "main" request for a device (pdev == NULL), or for no-op quarantining
- *   assignments,
- * - positive (one) upon success when previously the entry was present and this
- *   is the "main" request for a device (pdev != NULL).
+/**
+ * Apply a context on a device.
+ * @param domain Domain of the context
+ * @param ctx    IOMMU context to apply
+ * @param iommu  IOMMU hardware to use (must match the device's IOMMU)
+ * @param bus    PCI device bus
+ * @param devfn  PCI device function
  */
-int domain_context_mapping_one(
-    struct domain *domain,
-    struct iommu_context *ctx,
-    struct vtd_iommu *iommu,
-    uint8_t bus, uint8_t devfn, const struct pci_dev *pdev,
-    domid_t domid, paddr_t pgd_maddr, unsigned int mode)
+int apply_context_single(struct domain *domain, struct iommu_context *ctx,
+                         struct vtd_iommu *iommu, uint8_t bus, uint8_t devfn,
+                         struct iommu_context *prev_ctx)
 {
     struct context_entry *context, *context_entries, lctxt;
     __uint128_t res, old;
@@ -1320,8 +1331,6 @@ int domain_context_mapping_one(
     int rc, ret;
     bool flush_dev_iotlb, overwrite_entry = false;
 
-    struct iommu_context *prev_ctx = pdev->domain ? iommu_default_context(pdev->domain) : NULL;
-
     ASSERT(pcidevs_locked());
     spin_lock(&iommu->lock);
     maddr = bus_to_context_maddr(iommu, bus);
@@ -1342,7 +1351,7 @@ int domain_context_mapping_one(
         overwrite_entry = true;
     }
 
-    if ( iommu_hwdom_passthrough && is_hardware_domain(domain) )
+    if ( iommu_hwdom_passthrough && is_hardware_domain(domain) && !ctx->id )
     {
         context_set_translation_type(lctxt, CONTEXT_TT_PASS_THRU);
     }
@@ -1350,9 +1359,7 @@ int domain_context_mapping_one(
     {
         paddr_t root;
 
-        spin_lock(&ctx->arch.mapping_lock);
-
-        root = domain_pgd_maddr(domain, ctx, pgd_maddr, iommu->nr_pt_levels);
+        root = get_context_pgd(domain, ctx, iommu->nr_pt_levels);
         if ( !root )
         {
             unmap_vtd_domain_page(context_entries);
@@ -1364,8 +1371,6 @@ int domain_context_mapping_one(
             context_set_translation_type(lctxt, CONTEXT_TT_DEV_IOTLB);
         else
             context_set_translation_type(lctxt, CONTEXT_TT_MULTI_LEVEL);
-
-        spin_unlock(&ctx->arch.mapping_lock);
     }
 
     rc = context_set_domain_id(&lctxt, did, iommu);
@@ -1394,7 +1399,6 @@ int domain_context_mapping_one(
     }
 
     iommu_sync_cache(context, sizeof(struct context_entry));
-    spin_unlock(&iommu->lock);
 
     rc = iommu_flush_context_device(iommu, prev_did, PCI_BDF(bus, devfn),
                                     DMA_CCMD_MASK_NOBIT, !overwrite_entry);
@@ -1428,7 +1432,7 @@ int domain_context_mapping_one(
     spin_unlock(&iommu->lock);
 
     if ( !seg && !rc )
-        rc = me_wifi_quirk(domain, bus, devfn, domid, pgd_maddr, mode);
+        WARN_ON(me_wifi_quirk(domain, bus, devfn, did, 0, ctx, prev_ctx));
 
     return rc;
 
@@ -1438,152 +1442,25 @@ int domain_context_mapping_one(
     return rc;
 }
 
-static const struct acpi_drhd_unit *domain_context_unmap(
-    struct domain *d, uint8_t devfn, struct pci_dev *pdev);
-
-static int domain_context_mapping(struct domain *domain, struct iommu_context *ctx,
-                                  u8 devfn, struct pci_dev *pdev)
+int apply_context(struct domain *d, struct iommu_context *ctx,
+                  struct pci_dev *pdev, u8 devfn,
+                  struct iommu_context *prev_ctx)
 {
-    const struct acpi_drhd_unit *drhd = acpi_find_matched_drhd_unit(pdev);
-    const struct acpi_rmrr_unit *rmrr;
-    paddr_t pgd_maddr = ctx->arch.vtd.pgd_maddr;
-    domid_t did = ctx->arch.vtd.didmap[drhd->iommu->index];
+    struct acpi_drhd_unit *drhd = acpi_find_matched_drhd_unit(pdev);
+    struct vtd_iommu *iommu;
     int ret = 0;
-    unsigned int i, mode = 0;
-    uint16_t seg = pdev->seg, bdf;
-    uint8_t bus = pdev->bus, secbus;
 
-    /*
-     * Generally we assume only devices from one node to get assigned to a
-     * given guest.  But even if not, by replacing the prior value here we
-     * guarantee that at least some basic allocations for the device being
-     * added will get done against its node.  Any further allocations for
-     * this or other devices may be penalized then, but some would also be
-     * if we left other than NUMA_NO_NODE untouched here.
-     */
-    if ( drhd && drhd->iommu->node != NUMA_NO_NODE )
-        dom_iommu(domain)->node = drhd->iommu->node;
+    if ( !drhd )
+        return -EINVAL;
+
+    iommu = drhd->iommu;
 
     ASSERT(pcidevs_locked());
 
-    for_each_rmrr_device( rmrr, bdf, i )
-    {
-        if ( rmrr->segment != pdev->seg || bdf != pdev->sbdf.bdf )
-            continue;
-
-        mode |= MAP_WITH_RMRR;
-        break;
-    }
-
-    if ( domain != pdev->domain && pdev->domain != dom_io &&
-         pdev->domain->is_dying )
-        mode |= MAP_OWNER_DYING;
-
-    switch ( pdev->type )
-    {
-        bool prev_present;
-
-    case DEV_TYPE_PCI_HOST_BRIDGE:
-        if ( iommu_debug )
-            printk(VTDPREFIX "%pd:Hostbridge: skip %pp map\n",
-                   domain, &PCI_SBDF(seg, bus, devfn));
-        if ( !is_hardware_domain(domain) )
-            return -EPERM;
-        break;
-
-    case DEV_TYPE_PCIe_BRIDGE:
-    case DEV_TYPE_PCIe2PCI_BRIDGE:
-    case DEV_TYPE_LEGACY_PCI_BRIDGE:
-        break;
-
-    case DEV_TYPE_PCIe_ENDPOINT:
-        if ( !drhd )
-            return -ENODEV;
+    ret = apply_context_single(d, ctx, iommu, pdev->bus, pdev->devfn, prev_ctx);
 
-        if ( iommu_debug )
-            printk(VTDPREFIX "%pd:PCIe: map %pp\n",
-                   domain, &PCI_SBDF(seg, bus, devfn));
-        ret = domain_context_mapping_one(domain, ctx, drhd->iommu, bus, devfn, pdev,
-                                         did, pgd_maddr, mode);
-        if ( ret > 0 )
-            ret = 0;
-        if ( !ret && devfn == pdev->devfn && ats_device(pdev, drhd) > 0 )
-            enable_ats_device(pdev, &drhd->iommu->ats_devices);
-
-        break;
-
-    case DEV_TYPE_PCI:
-        if ( !drhd )
-            return -ENODEV;
-
-        if ( iommu_debug )
-            printk(VTDPREFIX "%pd:PCI: map %pp\n",
-                   domain, &PCI_SBDF(seg, bus, devfn));
-
-        ret = domain_context_mapping_one(domain, ctx, drhd->iommu, bus, devfn,
-                                         pdev, did, pgd_maddr, mode);
-        if ( ret < 0 )
-            break;
-        prev_present = ret;
-
-        if ( (ret = find_upstream_bridge(seg, &bus, &devfn, &secbus)) < 1 )
-        {
-            if ( !ret )
-                break;
-            ret = -ENXIO;
-        }
-        /*
-         * Strictly speaking if the device is the only one behind this bridge
-         * and the only one with this (secbus,0,0) tuple, it could be allowed
-         * to be re-assigned regardless of RMRR presence.  But let's deal with
-         * that case only if it is actually found in the wild.  Note that
-         * dealing with this just here would still not render the operation
-         * secure.
-         */
-        else if ( prev_present && (mode & MAP_WITH_RMRR) &&
-                  domain != pdev->domain )
-            ret = -EOPNOTSUPP;
-
-        /*
-         * Mapping a bridge should, if anything, pass the struct pci_dev of
-         * that bridge. Since bridges don't normally get assigned to guests,
-         * their owner would be the wrong one. Pass NULL instead.
-         */
-        if ( ret >= 0 )
-            ret = domain_context_mapping_one(domain, ctx, drhd->iommu, bus, devfn,
-                                             NULL, did, pgd_maddr, mode);
-
-        /*
-         * Devices behind PCIe-to-PCI/PCIx bridge may generate different
-         * requester-id. It may originate from devfn=0 on the secondary bus
-         * behind the bridge. Map that id as well if we didn't already.
-         *
-         * Somewhat similar as for bridges, we don't want to pass a struct
-         * pci_dev here - there may not even exist one for this (secbus,0,0)
-         * tuple. If there is one, without properly working device groups it
-         * may again not have the correct owner.
-         */
-        if ( !ret && pdev_type(seg, bus, devfn) == DEV_TYPE_PCIe2PCI_BRIDGE &&
-             (secbus != pdev->bus || pdev->devfn != 0) )
-            ret = domain_context_mapping_one(domain, ctx, drhd->iommu, secbus, 0,
-                                             NULL, did, pgd_maddr, mode);
-
-        if ( ret )
-        {
-            if ( !prev_present )
-                domain_context_unmap(domain, devfn, pdev);
-            else if ( pdev->domain != domain ) /* Avoid infinite recursion. */
-                domain_context_mapping(pdev->domain, ctx, devfn, pdev);
-        }
-
-        break;
-
-    default:
-        dprintk(XENLOG_ERR VTDPREFIX, "%pd:unknown(%u): %pp\n",
-                domain, pdev->type, &PCI_SBDF(seg, bus, devfn));
-        ret = -EINVAL;
-        break;
-    }
+    if ( !ret && ats_device(pdev, drhd) > 0 )
+        enable_ats_device(pdev, &iommu->ats_devices);
 
     if ( !ret && devfn == pdev->devfn )
         pci_vtd_quirk(pdev);
@@ -1591,10 +1466,8 @@ static int domain_context_mapping(struct domain *domain, struct iommu_context *c
     return ret;
 }
 
-int domain_context_unmap_one(
-    struct domain *domain,
-    struct vtd_iommu *iommu,
-    uint8_t bus, uint8_t devfn)
+int unapply_context_single(struct domain *domain, struct vtd_iommu *iommu,
+                           struct iommu_context *prev_ctx, uint8_t bus, uint8_t devfn)
 {
     struct context_entry *context, *context_entries;
     u64 maddr;
@@ -1648,12 +1521,18 @@ int domain_context_unmap_one(
     if ( rc > 0 )
         rc = 0;
 
+    if ( !rc )
+    {
+        BUG_ON(!prev_ctx->arch.vtd.iommu_dev_cnt[iommu->index]);
+        prev_ctx->arch.vtd.iommu_dev_cnt[iommu->index]--;
+    }
+
     spin_unlock(&iommu->lock);
     unmap_vtd_domain_page(context_entries);
 
     if ( !iommu->drhd->segment && !rc )
-        rc = me_wifi_quirk(domain, bus, devfn, DOMID_INVALID, 0,
-                           UNMAP_ME_PHANTOM_FUNC);
+        WARN_ON(me_wifi_quirk(domain, bus, devfn, DOMID_INVALID, UNMAP_ME_PHANTOM_FUNC,
+                              NULL, prev_ctx));
 
     if ( rc && !is_hardware_domain(domain) && domain != dom_io )
     {
@@ -1671,128 +1550,27 @@ int domain_context_unmap_one(
     return rc;
 }
 
-static const struct acpi_drhd_unit *domain_context_unmap(
-    struct domain *domain,
-    uint8_t devfn,
-    struct pci_dev *pdev)
-{
-    const struct acpi_drhd_unit *drhd = acpi_find_matched_drhd_unit(pdev);
-    struct vtd_iommu *iommu = drhd ? drhd->iommu : NULL;
-    int ret;
-    uint16_t seg = pdev->seg;
-    uint8_t bus = pdev->bus, tmp_bus, tmp_devfn, secbus;
-
-    switch ( pdev->type )
-    {
-    case DEV_TYPE_PCI_HOST_BRIDGE:
-        if ( iommu_debug )
-            printk(VTDPREFIX "%pd:Hostbridge: skip %pp unmap\n",
-                   domain, &PCI_SBDF(seg, bus, devfn));
-        return ERR_PTR(is_hardware_domain(domain) ? 0 : -EPERM);
-
-    case DEV_TYPE_PCIe_BRIDGE:
-    case DEV_TYPE_PCIe2PCI_BRIDGE:
-    case DEV_TYPE_LEGACY_PCI_BRIDGE:
-        return ERR_PTR(0);
-
-    case DEV_TYPE_PCIe_ENDPOINT:
-        if ( !iommu )
-            return ERR_PTR(-ENODEV);
-
-        if ( iommu_debug )
-            printk(VTDPREFIX "%pd:PCIe: unmap %pp\n",
-                   domain, &PCI_SBDF(seg, bus, devfn));
-        ret = domain_context_unmap_one(domain, iommu, bus, devfn);
-        if ( !ret && devfn == pdev->devfn && ats_device(pdev, drhd) > 0 )
-            disable_ats_device(pdev);
-
-        break;
-
-    case DEV_TYPE_PCI:
-        if ( !iommu )
-            return ERR_PTR(-ENODEV);
-
-        if ( iommu_debug )
-            printk(VTDPREFIX "%pd:PCI: unmap %pp\n",
-                   domain, &PCI_SBDF(seg, bus, devfn));
-        ret = domain_context_unmap_one(domain, iommu, bus, devfn);
-        if ( ret )
-            break;
-
-        tmp_bus = bus;
-        tmp_devfn = devfn;
-        if ( (ret = find_upstream_bridge(seg, &tmp_bus, &tmp_devfn,
-                                         &secbus)) < 1 )
-        {
-            if ( ret )
-            {
-                ret = -ENXIO;
-                if ( !domain->is_dying &&
-                     !is_hardware_domain(domain) && domain != dom_io )
-                {
-                    domain_crash(domain);
-                    /* Make upper layers continue in a best effort manner. */
-                    ret = 0;
-                }
-            }
-            break;
-        }
-
-        ret = domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn);
-        /* PCIe to PCI/PCIx bridge */
-        if ( !ret && pdev_type(seg, tmp_bus, tmp_devfn) == DEV_TYPE_PCIe2PCI_BRIDGE )
-            ret = domain_context_unmap_one(domain, iommu, secbus, 0);
-
-        break;
-
-    default:
-        dprintk(XENLOG_ERR VTDPREFIX, "%pd:unknown(%u): %pp\n",
-                domain, pdev->type, &PCI_SBDF(seg, bus, devfn));
-        return ERR_PTR(-EINVAL);
-    }
-
-    return drhd;
-}
-
-static void cf_check iommu_clear_root_pgtable(struct domain *d)
+static void cf_check iommu_clear_root_pgtable(struct domain *d,
+                                              struct iommu_context *ctx)
 {
-    struct iommu_context *ctx = iommu_default_context(d);
-
-    spin_lock(&ctx->arch.mapping_lock);
     ctx->arch.vtd.pgd_maddr = 0;
-    spin_unlock(&ctx->arch.mapping_lock);
 }
 
 static void cf_check iommu_domain_teardown(struct domain *d)
 {
     struct iommu_context *ctx = iommu_default_context(d);
-    const struct acpi_drhd_unit *drhd;
 
     if ( list_empty(&acpi_drhd_units) )
         return;
 
-    iommu_identity_map_teardown(d, ctx);
-    ASSERT(!ctx->arch.vtd.pgd_maddr);
-
-    for_each_drhd_unit ( drhd )
-        iommu_free_domid(d->domain_id, drhd->iommu->domid_bitmap);
-
-    XFREE(ctx->arch.vtd.iommu_dev_cnt);
-    XFREE(ctx->arch.vtd.didmap);
-}
-
-static void quarantine_teardown(struct pci_dev *pdev,
-                                const struct acpi_drhd_unit *drhd)
-{
 }
 
 static int __must_check cf_check intel_iommu_map_page(
     struct domain *d, dfn_t dfn, mfn_t mfn, unsigned int flags,
-    unsigned int *flush_flags)
+    unsigned int *flush_flags, struct iommu_context *ctx)
 {
     struct domain_iommu *hd = dom_iommu(d);
-    struct iommu_context *ctx = iommu_default_context(d);
     struct dma_pte *page, *pte, old, new = {};
     u64 pg_maddr;
     unsigned int level = (IOMMUF_order(flags) / LEVEL_STRIDE) + 1;
@@ -1801,35 +1579,22 @@ static int __must_check cf_check intel_iommu_map_page(
     ASSERT((hd->platform_ops->page_sizes >> IOMMUF_order(flags)) &
            PAGE_SIZE_4K);
 
-    /* Do nothing if VT-d shares EPT page table */
-    if ( iommu_use_hap_pt(d) )
+    if ( ctx->opaque )
         return 0;
 
-    /* Do nothing if hardware domain and iommu supports pass thru. */
-    if ( iommu_hwdom_passthrough && is_hardware_domain(d) )
-        return 0;
-
-    spin_lock(&ctx->arch.mapping_lock);
-
     /*
      * IOMMU mapping request can be safely ignored when the domain is dying.
      *
-     * hd->arch.mapping_lock guarantees that d->is_dying will be observed
+     * hd->lock guarantees that d->is_dying will be observed
      * before any page tables are freed (see iommu_free_pgtables())
      */
     if ( d->is_dying )
-    {
-        spin_unlock(&ctx->arch.mapping_lock);
         return 0;
-    }
 
     pg_maddr = addr_to_dma_page_maddr(d, ctx, dfn_to_daddr(dfn), level,
                                       flush_flags, true);
     if ( pg_maddr < PAGE_SIZE )
-    {
-        spin_unlock(&ctx->arch.mapping_lock);
         return -ENOMEM;
-    }
 
     page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
     pte = &page[address_level_offset(dfn_to_daddr(dfn), level)];
@@ -1848,7 +1613,6 @@ static int __must_check cf_check intel_iommu_map_page(
 
     if ( !((old.val ^ new.val) & ~DMA_PTE_CONTIG_MASK) )
     {
-        spin_unlock(&ctx->arch.mapping_lock);
         unmap_vtd_domain_page(page);
         return 0;
     }
@@ -1891,7 +1655,6 @@ static int __must_check cf_check intel_iommu_map_page(
         perfc_incr(iommu_pt_coalesces);
     }
 
-    spin_unlock(&ctx->arch.mapping_lock);
     unmap_vtd_domain_page(page);
 
     *flush_flags |= IOMMU_FLUSHF_added;
@@ -1908,10 +1671,10 @@ static int __must_check cf_check intel_iommu_map_page(
 }
 
 static int __must_check cf_check intel_iommu_unmap_page(
-    struct domain *d, dfn_t dfn, unsigned int order, unsigned int *flush_flags)
+    struct domain *d, dfn_t dfn, unsigned int order, unsigned int *flush_flags,
+    struct iommu_context *ctx)
 {
     struct domain_iommu *hd = dom_iommu(d);
-    struct iommu_context *ctx = iommu_default_context(d);
     daddr_t addr = dfn_to_daddr(dfn);
     struct dma_pte *page = NULL, *pte = NULL, old;
     uint64_t pg_maddr;
@@ -1923,20 +1686,13 @@ static int __must_check cf_check intel_iommu_unmap_page(
      */
     ASSERT((hd->platform_ops->page_sizes >> order) & PAGE_SIZE_4K);
 
-    /* Do nothing if VT-d shares EPT page table */
-    if ( iommu_use_hap_pt(d) )
-        return 0;
-
-    /* Do nothing if hardware domain and iommu supports pass thru. */
-    if ( iommu_hwdom_passthrough && is_hardware_domain(d) )
+    if ( ctx->opaque )
         return 0;
 
-    spin_lock(&ctx->arch.mapping_lock);
     /* get target level pte */
     pg_maddr = addr_to_dma_page_maddr(d, ctx, addr, level, flush_flags, false);
     if ( pg_maddr < PAGE_SIZE )
     {
-        spin_unlock(&ctx->arch.mapping_lock);
         return pg_maddr ? -ENOMEM : 0;
     }
 
@@ -1945,7 +1701,6 @@ static int __must_check cf_check intel_iommu_unmap_page(
 
     if ( !dma_pte_present(*pte) )
     {
-        spin_unlock(&ctx->arch.mapping_lock);
         unmap_vtd_domain_page(page);
         return 0;
     }
@@ -1976,8 +1731,6 @@ static int __must_check cf_check intel_iommu_unmap_page(
         perfc_incr(iommu_pt_coalesces);
     }
 
-    spin_unlock(&ctx->arch.mapping_lock);
-
     unmap_vtd_domain_page(page);
 
     *flush_flags |= IOMMU_FLUSHF_modified;
@@ -1990,25 +1743,16 @@ static int __must_check cf_check intel_iommu_unmap_page(
 }
 
 static int cf_check intel_iommu_lookup_page(
-    struct domain *d, dfn_t dfn, mfn_t *mfn, unsigned int *flags)
+    struct domain *d, dfn_t dfn, mfn_t *mfn, unsigned int *flags,
+    struct iommu_context *ctx)
 {
-    struct iommu_context *ctx = iommu_default_context(d);
     uint64_t val;
 
-    /*
-     * If VT-d shares EPT page table or if the domain is the hardware
-     * domain and iommu_passthrough is set then pass back the dfn.
-     */
-    if ( iommu_use_hap_pt(d) ||
-         (iommu_hwdom_passthrough && is_hardware_domain(d)) )
+    if ( ctx->opaque )
        return -EOPNOTSUPP;
 
-    spin_lock(&ctx->arch.mapping_lock);
-
     val = addr_to_dma_page_maddr(d, ctx, dfn_to_daddr(dfn), 0, NULL, false);
 
-    spin_unlock(&ctx->arch.mapping_lock);
-
     if ( val < PAGE_SIZE )
         return -ENOENT;
 
@@ -2037,103 +1781,10 @@ static bool __init vtd_ept_page_compatible(const struct vtd_iommu *iommu)
            (cap_sps_1gb(vtd_cap) && iommu_superpages);
 }
 
-static int cf_check intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
-{
-    struct acpi_rmrr_unit *rmrr;
-    struct iommu_context *ctx;
-    u16 bdf;
-    int ret, i;
-
-    ASSERT(pcidevs_locked());
-
-    if ( !pdev->domain )
-        return -EINVAL;
-
-    ctx = iommu_default_context(pdev->domain);
-
-    for_each_rmrr_device ( rmrr, bdf, i )
-    {
-        if ( rmrr->segment == pdev->seg && bdf == PCI_BDF(pdev->bus, devfn) )
-        {
-            /*
-             * iommu_add_device() is only called for the hardware
-             * domain (see xen/drivers/passthrough/pci.c:pci_add_device()).
-             * Since RMRRs are always reserved in the e820 map for the hardware
-             * domain, there shouldn't be a conflict.
-             */
-            ret = iommu_identity_mapping(pdev->domain, ctx, p2m_access_rw,
-                                         rmrr->base_address, rmrr->end_address,
-                                         0);
-            if ( ret )
-                dprintk(XENLOG_ERR VTDPREFIX, "%pd: RMRR mapping failed\n",
-                        pdev->domain);
-        }
-    }
-
-    ret = domain_context_mapping(pdev->domain, ctx, devfn, pdev);
-    if ( ret )
-        dprintk(XENLOG_ERR VTDPREFIX, "%pd: context mapping failed\n",
-                pdev->domain);
-
-    return ret;
-}
-
-static int cf_check intel_iommu_enable_device(struct pci_dev *pdev)
-{
-    struct acpi_drhd_unit *drhd = acpi_find_matched_drhd_unit(pdev);
-    int ret = drhd ? ats_device(pdev, drhd) : -ENODEV;
-
-    pci_vtd_quirk(pdev);
-
-    if ( ret <= 0 )
-        return ret;
-
-    ret = enable_ats_device(pdev, &drhd->iommu->ats_devices);
-
-    return ret >= 0 ? 0 : ret;
-}
-
-static int cf_check intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
-{
-    const struct acpi_drhd_unit *drhd;
-    struct acpi_rmrr_unit *rmrr;
-    u16 bdf;
-    unsigned int i;
-    struct iommu_context *ctx;
-
-    if ( !pdev->domain )
-        return -EINVAL;
-
-    ctx = iommu_default_context(pdev->domain);
-
-    drhd = domain_context_unmap(pdev->domain, devfn, pdev);
-    if ( IS_ERR(drhd) )
-        return PTR_ERR(drhd);
-
-    for_each_rmrr_device ( rmrr, bdf, i )
-    {
-        if ( rmrr->segment != pdev->seg || bdf != PCI_BDF(pdev->bus, devfn) )
-            continue;
-
-        /*
-         * Any flag is nothing to clear these mappings but here
-         * its always safe and strict to set 0.
-         */
-        iommu_identity_mapping(pdev->domain, ctx, p2m_access_x, rmrr->base_address,
-                               rmrr->end_address, 0);
-    }
-
-    quarantine_teardown(pdev, drhd);
-
-    return 0;
-}
-
 static int __hwdom_init cf_check setup_hwdom_device(
     u8 devfn, struct pci_dev *pdev)
 {
-    struct iommu_context *ctx = iommu_default_context(pdev->domain);
-
-    return domain_context_mapping(pdev->domain, ctx, devfn, pdev);
+    return iommu_attach_context(hardware_domain, pdev, 0);
 }
 
 void clear_fault_bits(struct vtd_iommu *iommu)
@@ -2303,30 +1954,6 @@ static int __must_check init_vtd_hw(bool resume)
     return iommu_flush_all();
 }
 
-static void __hwdom_init setup_hwdom_rmrr(struct domain *d)
-{
-    struct iommu_context *ctx = iommu_default_context(d);
-    struct acpi_rmrr_unit *rmrr;
-    u16 bdf;
-    int ret, i;
-
-    pcidevs_lock();
-    for_each_rmrr_device ( rmrr, bdf, i )
-    {
-        /*
-         * Here means we're add a device to the hardware domain.
-         * Since RMRRs are always reserved in the e820 map for the hardware
-         * domain, there shouldn't be a conflict. So its always safe and
-         * strict to set 0.
-         */
-        ret = iommu_identity_mapping(d, ctx, p2m_access_rw, rmrr->base_address,
-                                     rmrr->end_address, 0);
-        if ( ret )
-            dprintk(XENLOG_ERR VTDPREFIX,
-                    "IOMMU: mapping reserved region failed\n");
-    }
-    pcidevs_unlock();
-}
 
 static struct iommu_state {
     uint32_t fectl;
@@ -2475,173 +2102,6 @@ static int __init cf_check vtd_setup(void)
     return ret;
 }
 
-static int cf_check reassign_device_ownership(
-    struct domain *source,
-    struct domain *target,
-    u8 devfn, struct pci_dev *pdev)
-{
-    int ret;
-
-    if ( !has_arch_pdevs(target) )
-        vmx_pi_hooks_assign(target);
-
-#ifdef CONFIG_PV
-    /*
-     * Devices assigned to untrusted domains (here assumed to be any domU)
-     * can attempt to send arbitrary LAPIC/MSI messages. We are unprotected
-     * by the root complex unless interrupt remapping is enabled.
-     */
-    if ( !iommu_intremap && !is_hardware_domain(target) &&
-         !is_system_domain(target) )
-        untrusted_msi = true;
-#endif
-
-    ret = domain_context_mapping(target, iommu_default_context(target), devfn, pdev);
-
-    if ( ret )
-    {
-        if ( !has_arch_pdevs(target) )
-            vmx_pi_hooks_deassign(target);
-        return ret;
-    }
-
-    if ( devfn == pdev->devfn && pdev->domain != target )
-    {
-        write_lock(&source->pci_lock);
-        list_del(&pdev->domain_list);
-        write_unlock(&source->pci_lock);
-
-        pdev->domain = target;
-
-        write_lock(&target->pci_lock);
-        list_add(&pdev->domain_list, &target->pdev_list);
-        write_unlock(&target->pci_lock);
-    }
-
-    if ( !has_arch_pdevs(source) )
-        vmx_pi_hooks_deassign(source);
-
-    /*
-     * If the device belongs to the hardware domain, and it has RMRR, don't
-     * remove it from the hardware domain, because BIOS may use RMRR at
-     * booting time.
-     */
-    if ( !is_hardware_domain(source) )
-    {
-        const struct acpi_rmrr_unit *rmrr;
-        struct iommu_context *ctx = iommu_default_context(source);
-        u16 bdf;
-        unsigned int i;
-
-        for_each_rmrr_device( rmrr, bdf, i )
-            if ( rmrr->segment == pdev->seg &&
-                 bdf == PCI_BDF(pdev->bus, devfn) )
-            {
-                /*
-                 * Any RMRR flag is always ignored when remove a device,
-                 * but its always safe and strict to set 0.
-                 */
-                ret = iommu_identity_mapping(source, ctx, p2m_access_x,
-                                             rmrr->base_address,
-                                             rmrr->end_address, 0);
-                if ( ret && ret != -ENOENT )
-                    return ret;
-            }
-    }
-
-    return 0;
-}
-
-static int cf_check intel_iommu_assign_device(
-    struct domain *d, u8 devfn, struct pci_dev *pdev, u32 flag)
-{
-    struct domain *s = pdev->domain;
-    struct iommu_context *ctx = iommu_default_context(d);
-    struct acpi_rmrr_unit *rmrr;
-    int ret = 0, i;
-    u16 bdf, seg;
-    u8 bus;
-
-    if ( list_empty(&acpi_drhd_units) )
-        return -ENODEV;
-
-    seg = pdev->seg;
-    bus = pdev->bus;
-    /*
-     * In rare cases one given rmrr is shared by multiple devices but
-     * obviously this would put the security of a system at risk. So
-     * we would prevent from this sort of device assignment. But this
-     * can be permitted if user set
-     *      "pci = [ 'sbdf, rdm_policy=relaxed' ]"
-     *
-     * TODO: in the future we can introduce group device assignment
-     * interface to make sure devices sharing RMRR are assigned to the
-     * same domain together.
-     */
-    for_each_rmrr_device( rmrr, bdf, i )
-    {
-        if ( rmrr->segment == seg && bdf == PCI_BDF(bus, devfn) &&
-             rmrr->scope.devices_cnt > 1 )
-        {
-            bool relaxed = flag & XEN_DOMCTL_DEV_RDM_RELAXED;
-
-            printk(XENLOG_GUEST "%s" VTDPREFIX
-                   " It's %s to assign %pp"
-                   " with shared RMRR at %"PRIx64" for %pd.\n",
-                   relaxed ? XENLOG_WARNING : XENLOG_ERR,
-                   relaxed ? "risky" : "disallowed",
-                   &PCI_SBDF(seg, bus, devfn), rmrr->base_address, d);
-            if ( !relaxed )
-                return -EPERM;
-        }
-    }
-
-    /* Setup rmrr identity mapping */
-    for_each_rmrr_device( rmrr, bdf, i )
-    {
-        if ( rmrr->segment == seg && bdf == PCI_BDF(bus, devfn) )
-        {
-            ret = iommu_identity_mapping(d, ctx, p2m_access_rw, rmrr->base_address,
-                                         rmrr->end_address, flag);
-            if ( ret )
-            {
-                printk(XENLOG_G_ERR VTDPREFIX
-                       "%pd: cannot map reserved region [%"PRIx64",%"PRIx64"]: %d\n",
-                       d, rmrr->base_address, rmrr->end_address, ret);
-                break;
-            }
-        }
-    }
-
-    if ( !ret )
-        ret = reassign_device_ownership(s, d, devfn, pdev);
-
-    /* See reassign_device_ownership() for the hwdom aspect. */
-    if ( !ret || is_hardware_domain(d) )
-        return ret;
-
-    for_each_rmrr_device( rmrr, bdf, i )
-    {
-        if ( rmrr->segment == seg && bdf == PCI_BDF(bus, devfn) )
-        {
-            int rc = iommu_identity_mapping(d, ctx, p2m_access_x,
-                                            rmrr->base_address,
-                                            rmrr->end_address, 0);
-
-            if ( rc && rc != -ENOENT )
-            {
-                printk(XENLOG_ERR VTDPREFIX
-                       "%pd: cannot unmap reserved region [%"PRIx64",%"PRIx64"]: %d\n",
-                       d, rmrr->base_address, rmrr->end_address, rc);
-                domain_crash(d);
-                break;
-            }
-        }
-    }
-
-    return ret;
-}
-
 static int cf_check intel_iommu_group_id(u16 seg, u8 bus, u8 devfn)
 {
     u8 secbus;
@@ -2766,6 +2226,11 @@ static void vtd_dump_page_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
     if ( level < 1 )
         return;
 
+    if ( pt_maddr == 0 )
+    {
+        printk(" (empty)\n");
+        return;
+    }
+
     pt_vaddr = map_vtd_domain_page(pt_maddr);
 
     next_level = level - 1;
@@ -2797,20 +2262,296 @@ static void vtd_dump_page_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
 static void cf_check vtd_dump_page_tables(struct domain *d)
 {
     const struct domain_iommu *hd = dom_iommu(d);
-    struct iommu_context *ctx = iommu_default_context(d);
+    unsigned int i;
 
-    printk(VTDPREFIX" %pd table has %d levels\n", d,
+    printk(VTDPREFIX " %pd table has %d levels\n", d,
            agaw_to_level(hd->arch.vtd.agaw));
-    vtd_dump_page_table_level(ctx->arch.vtd.pgd_maddr,
-                              agaw_to_level(hd->arch.vtd.agaw), 0, 0);
+
+    for ( i = 1; i < (1 + hd->other_contexts.count); ++i )
+    {
+        struct iommu_context *ctx = iommu_get_context(d, i);
+
+        printk(VTDPREFIX " %pd context %d: %s\n", d, i,
+               ctx ? "allocated" : "non-allocated");
+
+        if ( ctx )
+        {
+            vtd_dump_page_table_level(ctx->arch.vtd.pgd_maddr,
+                                      agaw_to_level(hd->arch.vtd.agaw), 0, 0);
+            iommu_put_context(ctx);
+        }
+    }
+}
+
+static int intel_iommu_cleanup_pte(uint64_t pte_maddr, bool preempt)
+{
+    size_t i;
+    struct dma_pte *pte = map_vtd_domain_page(pte_maddr);
+
+    for ( i = 0; i < (1 << PAGETABLE_ORDER); ++i )
+        if ( dma_pte_present(pte[i]) )
+        {
+            /* Remove the reference of the target mapping (if needed) */
+            mfn_t mfn = maddr_to_mfn(dma_pte_addr(pte[i]));
+
+            if ( mfn_valid(mfn) )
+                put_page(mfn_to_page(mfn));
+
+            if ( preempt )
+                dma_clear_pte(pte[i]);
+        }
+
+    unmap_vtd_domain_page(pte);
+
+    return 0;
+}
+
+/**
+ * Cleanup logic:
+ * Walk through the entire page table, progressively removing mappings if preempt.
+ *
+ * Return values:
+ *  - Report preemption with -ERESTART.
+ *  - Report empty pte/pgd with 0.
+ *
+ * When preempted during superpage operation, store state in vtd.superpage_progress.
+ */
+
+static int intel_iommu_cleanup_superpage(struct iommu_context *ctx,
+                                         unsigned int page_order, uint64_t pte_maddr,
+                                         bool preempt)
+{
+    size_t i = 0, page_count = 1 << page_order;
+    struct page_info *page = maddr_to_page(pte_maddr);
+
+    if ( preempt )
+        i = ctx->arch.vtd.superpage_progress;
+
+    for ( ; i < page_count; i++, page++ )
+    {
+        put_page(page);
+
+        if ( preempt && !(i & 0xff) && general_preempt_check() )
+        {
+            ctx->arch.vtd.superpage_progress = i + 1;
+            return -ERESTART;
+        }
+    }
+
+    if ( preempt )
+        ctx->arch.vtd.superpage_progress = 0;
+
+    return 0;
+}
+
+static int intel_iommu_cleanup_mappings(struct iommu_context *ctx,
+                                        unsigned int nr_pt_levels, uint64_t pgd_maddr,
+                                        bool preempt)
+{
+    size_t i;
+    int rc;
+    struct dma_pte *pgd;
+
+    if ( ctx->opaque )
+        /* don't touch opaque contexts */
+        return 0;
+
+    pgd = map_vtd_domain_page(pgd_maddr);
+
+    for ( i = 0; i < (1 << PAGETABLE_ORDER); ++i )
+    {
+        if ( dma_pte_present(pgd[i]) )
+        {
+            uint64_t pte_maddr = dma_pte_addr(pgd[i]);
+
+            if ( dma_pte_superpage(pgd[i]) )
+                rc = intel_iommu_cleanup_superpage(ctx, nr_pt_levels * SUPERPAGE_ORDER,
+                                                   pte_maddr, preempt);
+            else if ( nr_pt_levels > 2 )
+                /* Next level is not PTE */
+                rc = intel_iommu_cleanup_mappings(ctx, nr_pt_levels - 1,
+                                                  pte_maddr, preempt);
+            else
+                rc = intel_iommu_cleanup_pte(pte_maddr, preempt);
+
+            if ( preempt && !rc )
+                /* Fold pgd (no more mappings in it) */
+                dma_clear_pte(pgd[i]);
+            else if ( preempt && (rc == -ERESTART || general_preempt_check()) )
+            {
+                unmap_vtd_domain_page(pgd);
+                return -ERESTART;
+            }
+        }
+    }
+
+    unmap_vtd_domain_page(pgd);
+
+    return 0;
+}
+
+static int cf_check intel_iommu_context_teardown(struct domain *d,
+                                                 struct iommu_context *ctx, u32 flags)
+{
+    struct acpi_drhd_unit *drhd;
+
+    pcidevs_lock();
+
+    /* Cleanup mappings */
+    if ( intel_iommu_cleanup_mappings(ctx, agaw_to_level(d->iommu.arch.vtd.agaw),
+                                      ctx->arch.vtd.pgd_maddr,
+                                      flags & IOMMUF_preempt) < 0 )
+    {
+        pcidevs_unlock();
+        return -ERESTART;
+    }
+
+    ASSERT(ctx->arch.vtd.didmap);
+
+    for_each_drhd_unit(drhd)
+    {
+        unsigned long index = drhd->iommu->index;
+
+        iommu_free_domid(ctx->arch.vtd.didmap[index], drhd->iommu->domid_bitmap);
+    }
+
+    xfree(ctx->arch.vtd.didmap);
+
+    pcidevs_unlock();
+    return arch_iommu_context_teardown(d, ctx, flags);
+}
+
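For illustration only, not part of the patch: the teardown above is written to be restartable, so a caller is expected to retry on -ERESTART. A minimal, hypothetical retry loop, assuming IOMMUF_preempt is the intended flag (it is what the VT-d side checks) and using the softirq-processing idiom in place of a full hypercall continuation:

    /* Illustrative only: retry a preemptible context teardown to completion. */
    static int example_teardown(struct domain *d, struct iommu_context *ctx)
    {
        int rc;

        do {
            rc = iommu_context_teardown(d, ctx, IOMMUF_preempt);
            if ( rc == -ERESTART )
                process_pending_softirqs();
        } while ( rc == -ERESTART );

        return rc;
    }
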
-static int cf_check intel_iommu_quarantine_init(struct pci_dev *pdev,
-                                                bool scratch_page)
+static int intel_iommu_dev_rmrr(struct domain *d, struct pci_dev *pdev,
+                                struct iommu_context *ctx, bool unmap)
 {
+    struct acpi_rmrr_unit *rmrr;
+    u16 bdf;
+    int ret, i;
+
+    for_each_rmrr_device(rmrr, bdf, i)
+    {
+        if ( PCI_SBDF(rmrr->segment, bdf).sbdf == pdev->sbdf.sbdf )
+        {
+            ret = iommu_identity_mapping(d, ctx,
+                                         unmap ? p2m_access_x : p2m_access_rw,
+                                         rmrr->base_address, rmrr->end_address,
+                                         0);
+
+            if ( ret < 0 )
+                return ret;
+        }
+    }
+
     return 0;
 }
 
+static int cf_check intel_iommu_attach(struct domain *d, struct pci_dev *pdev,
+                                       struct iommu_context *ctx)
+{
+    int ret;
+    const struct acpi_drhd_unit *drhd = acpi_find_matched_drhd_unit(pdev);
+
+    if ( !pdev || !drhd )
+        return -EINVAL;
+
+    ret = intel_iommu_dev_rmrr(d, pdev, ctx, false);
+
+    if ( ret )
+        return ret;
+
+    ret = apply_context(d, ctx, pdev, pdev->devfn, NULL);
+
+    if ( ret )
+        return ret;
+
+    pci_vtd_quirk(pdev);
+
+    return ret;
+}
+
+static int cf_check intel_iommu_detach(struct domain *d, struct pci_dev *pdev,
+                                       struct iommu_context *prev_ctx)
+{
+    int ret, rc;
+    const struct acpi_drhd_unit *drhd = acpi_find_matched_drhd_unit(pdev);
+
+    if ( !pdev || !drhd )
+        return -EINVAL;
+
+    ret = unapply_context_single(d, drhd->iommu, prev_ctx, pdev->bus, pdev->devfn);
+
+    if ( ret )
+        return ret;
+
+    if ( (rc = intel_iommu_dev_rmrr(d, pdev, prev_ctx, true)) )
+        printk(XENLOG_WARNING VTDPREFIX
+               " Unable to unmap RMRR from d%dc%d for %pp (%d)\n",
+               d->domain_id, prev_ctx->id, &pdev->sbdf, rc);
+
+    return ret;
+}
+
+static int cf_check intel_iommu_reattach(struct domain *d,
+                                         struct pci_dev *pdev,
+                                         struct iommu_context *prev_ctx,
+                                         struct iommu_context *ctx)
+{
+    int ret, rc;
+    const struct acpi_drhd_unit *drhd = acpi_find_matched_drhd_unit(pdev);
+
+    if ( !pdev || !drhd )
+        return -EINVAL;
+
+    ret = intel_iommu_dev_rmrr(d, pdev, ctx, false);
+
+    if ( ret )
+        return ret;
+
+    ret = apply_context(d, ctx, pdev, pdev->devfn, prev_ctx);
+
+    if ( ret )
+    {
+        /* Remove the RMRR mappings we just added */
+        if ( (rc = intel_iommu_dev_rmrr(d, pdev, ctx, true)) )
+            printk(XENLOG_WARNING VTDPREFIX
+                   " Unable to unmap RMRR from d%dc%d for %pp (%d)\n",
+                   d->domain_id, ctx->id, &pdev->sbdf, rc);
+
+        return ret;
+    }
+
+    /* Remove the previous context's RMRR mappings */
+    if ( (rc = intel_iommu_dev_rmrr(d, pdev, prev_ctx, true)) )
+        printk(XENLOG_WARNING VTDPREFIX
+               " Unable to unmap previous RMRR for %pp (%d)\n", &pdev->sbdf, rc);
+
+    pci_vtd_quirk(pdev);
+
+    return ret;
+}
+
+static int cf_check intel_iommu_add_devfn(struct domain *d,
+                                          struct pci_dev *pdev, u16 devfn,
+                                          struct iommu_context *ctx)
+{
+    const struct acpi_drhd_unit *drhd = acpi_find_matched_drhd_unit(pdev);
+
+    if ( !pdev || !drhd )
+        return -EINVAL;
+
+    return apply_context(d, ctx, pdev, devfn, NULL);
+}
+
+static int cf_check intel_iommu_remove_devfn(struct domain *d, struct pci_dev *pdev,
+                                             u16 devfn, struct iommu_context *prev_ctx)
+{
+    const struct acpi_drhd_unit *drhd = acpi_find_matched_drhd_unit(pdev);
+
+    if ( !pdev || !drhd )
+        return -EINVAL;
+
+    return unapply_context_single(d, drhd->iommu, prev_ctx, pdev->bus, devfn);
+}
+
 static void cf_check vtd_quiesce(void)
 {
     const struct acpi_drhd_unit *drhd;
@@ -2833,17 +2574,18 @@ static const struct iommu_ops __initconst_cf_clobber vtd_ops = {
     .page_sizes = PAGE_SIZE_4K,
     .init = intel_iommu_domain_init,
     .hwdom_init = intel_iommu_hwdom_init,
-    .quarantine_init = intel_iommu_quarantine_init,
-    .add_device = intel_iommu_add_device,
-    .enable_device = intel_iommu_enable_device,
-    .remove_device = intel_iommu_remove_device,
-    .assign_device = intel_iommu_assign_device,
+    .context_init = intel_iommu_context_init,
+    .context_teardown = intel_iommu_context_teardown,
+    .attach = intel_iommu_attach,
+    .detach = intel_iommu_detach,
+    .reattach = intel_iommu_reattach,
+    .add_devfn = intel_iommu_add_devfn,
+    .remove_devfn = intel_iommu_remove_devfn,
     .teardown = iommu_domain_teardown,
     .clear_root_pgtable = iommu_clear_root_pgtable,
     .map_page = intel_iommu_map_page,
     .unmap_page = intel_iommu_unmap_page,
     .lookup_page = intel_iommu_lookup_page,
-    .reassign_device = reassign_device_ownership,
     .get_device_group_id = intel_iommu_group_id,
     .enable_x2apic = intel_iommu_enable_eim,
     .disable_x2apic = intel_iommu_disable_eim,
diff --git a/xen/drivers/passthrough/vtd/quirks.c b/xen/drivers/passthrough/vtd/quirks.c
index 195fc5c08f..517d0e7a13 100644
--- a/xen/drivers/passthrough/vtd/quirks.c
+++ b/xen/drivers/passthrough/vtd/quirks.c
@@ -409,9 +409,9 @@ void __init platform_quirks_init(void)
 
 static int __must_check map_me_phantom_function(struct domain *domain,
                                                 unsigned int dev,
-                                                domid_t domid,
-                                                paddr_t pgd_maddr,
-                                                unsigned int mode)
+                                                unsigned int mode,
+                                                struct iommu_context *ctx,
+                                                struct iommu_context *prev_ctx)
 {
     struct acpi_drhd_unit *drhd;
     struct pci_dev *pdev;
@@ -423,19 +423,17 @@ static int __must_check map_me_phantom_function(struct domain *domain,
 
     /* map or unmap ME phantom function */
     if ( !(mode & UNMAP_ME_PHANTOM_FUNC) )
-        rc = domain_context_mapping_one(domain, iommu_default_context(domain),
-                                        drhd->iommu, 0,
-                                        PCI_DEVFN(dev, 7), NULL,
-                                        domid, pgd_maddr, mode);
+        rc = apply_context_single(domain, ctx, drhd->iommu, 0,
+                                  PCI_DEVFN(dev, 7), prev_ctx);
     else
-        rc = domain_context_unmap_one(domain, drhd->iommu, 0,
-                                      PCI_DEVFN(dev, 7));
+        rc = unapply_context_single(domain, drhd->iommu, prev_ctx, 0, PCI_DEVFN(dev, 7));
 
     return rc;
 }
 
 int me_wifi_quirk(struct domain *domain, uint8_t bus, uint8_t devfn,
-                  domid_t domid, paddr_t pgd_maddr, unsigned int mode)
+                  domid_t domid, unsigned int mode,
+                  struct iommu_context *ctx, struct iommu_context *prev_ctx)
 {
     u32 id;
     int rc = 0;
@@ -459,7 +457,7 @@ int me_wifi_quirk(struct domain *domain, uint8_t bus, uint8_t devfn,
                 case 0x423b8086:
                 case 0x423c8086:
                 case 0x423d8086:
-                    rc = map_me_phantom_function(domain, 3, domid, pgd_maddr, mode);
+                    rc = map_me_phantom_function(domain, 3, mode, ctx, prev_ctx);
                     break;
                 default:
                     break;
@@ -485,7 +483,7 @@ int me_wifi_quirk(struct domain *domain, uint8_t bus, uint8_t devfn,
             case 0x42388086:        /* Puma Peak */
             case 0x422b8086:
             case 0x422c8086:
-                    rc = map_me_phantom_function(domain, 22, domid, pgd_maddr, mode);
+                    rc = map_me_phantom_function(domain, 22, mode, ctx, prev_ctx);
                     break;
             default:
                     break;
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 98cca92dc3..d8becfa869 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -12,6 +12,12 @@
  * this program; If not, see .
  */
 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -19,7 +25,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -29,6 +34,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 const struct iommu_init_ops *__initdata iommu_init_ops;
 struct iommu_ops __ro_after_init iommu_ops;
@@ -192,8 +200,6 @@ int arch_iommu_domain_init(struct domain *d)
 
 int arch_iommu_context_init(struct domain *d, struct iommu_context *ctx, u32 flags)
 {
-    spin_lock_init(&ctx->arch.mapping_lock);
-
     INIT_PAGE_LIST_HEAD(&ctx->arch.pgtables);
     INIT_LIST_HEAD(&ctx->arch.identity_maps);
 
@@ -220,6 +226,95 @@ struct identity_map {
     unsigned int count;
 };
 
+static int unmap_identity_region(struct domain *d, struct iommu_context *ctx,
+                                 unsigned int base_pfn, unsigned int end_pfn)
+{
+    int ret = 0;
+
+    if ( ctx->opaque && !ctx->id )
+    {
+#ifdef CONFIG_HVM
+        this_cpu(iommu_dont_flush_iotlb) = true;
+        while ( base_pfn < end_pfn )
+        {
+            if ( p2m_remove_identity_entry(d, base_pfn) )
+                ret = -ENXIO;
+
+            base_pfn++;
+        }
+        this_cpu(iommu_dont_flush_iotlb) = false;
+#else
+        ASSERT_UNREACHABLE();
+#endif
+    }
+    else
+    {
+        size_t page_count = end_pfn - base_pfn;
+        unsigned int flush_flags;
+
+        ret = iommu_unmap(d, _dfn(base_pfn), page_count, 0, &flush_flags,
+                          ctx->id);
+
+        if ( ret )
+            return ret;
+
+        ret = iommu_iotlb_flush(d, _dfn(base_pfn), page_count,
+                                flush_flags, ctx->id);
+    }
+
+    return ret;
+}
+
+static int map_identity_region(struct domain *d, struct iommu_context *ctx,
+                               unsigned int base_pfn, unsigned int end_pfn,
+                               p2m_access_t p2ma, unsigned int flag)
+{
+    int ret = 0;
+    unsigned int flush_flags = 0;
+    size_t page_count = end_pfn - base_pfn;
+
+    if ( ctx->opaque && !ctx->id )
+    {
+#ifdef CONFIG_HVM
+        size_t i;
+
+        this_cpu(iommu_dont_flush_iotlb) = true;
+
+        for ( i = 0; i < page_count; i++ )
+        {
+            ret = p2m_add_identity_entry(d, base_pfn + i, p2ma, flag);
+
+            if ( ret )
+                break;
+        }
+        this_cpu(iommu_dont_flush_iotlb) = false;
+#else
+        ASSERT_UNREACHABLE();
+#endif
+    }
+    else
+    {
+        size_t i;
+
+        for ( i = 0; i < page_count; i++ )
+        {
+            ret = iommu_map(d, _dfn(base_pfn + i), _mfn(base_pfn + i), 1,
+                            p2m_access_to_iommu_flags(p2ma), &flush_flags,
+                            ctx->id);
+
+            if ( ret )
+                break;
+        }
+    }
+
+    if ( !ret )
+        ret = iommu_iotlb_flush(d, _dfn(base_pfn), page_count, flush_flags,
+                                ctx->id);
+
+    return ret;
+}
+
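For illustration only, not part of the patch: the identity-map interface below is refcounted, and, as its comment notes, p2m_access_x is the removal convention. A hypothetical caller, to make that contract concrete:

    /* Illustrative only: add an RMRR-style identity range twice, then drop it. */
    static int example_identity_cycle(struct domain *d, struct iommu_context *ctx,
                                      paddr_t base, paddr_t end)
    {
        int rc = iommu_identity_mapping(d, ctx, p2m_access_rw, base, end, 0);

        if ( rc )
            return rc;

        /* The same range with the same access only bumps the refcount... */
        rc = iommu_identity_mapping(d, ctx, p2m_access_rw, base, end, 0);
        if ( rc )
            return rc;

        /* ...so it takes two p2m_access_x calls to actually unmap it. */
        iommu_identity_mapping(d, ctx, p2m_access_x, base, end, 0);
        return iommu_identity_mapping(d, ctx, p2m_access_x, base, end, 0);
    }
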
-     */
     list_for_each_entry( map, &ctx->arch.identity_maps, list )
     {
         if ( map->base == base && map->end == end )
         {
-            int ret = 0;
-
             if ( p2ma != p2m_access_x )
             {
                 if ( map->access != p2ma )
                     return -EADDRINUSE;
+
                 ++map->count;
                 return 0;
             }
@@ -252,12 +343,9 @@ int iommu_identity_mapping(struct domain *d, struct iommu_context *ctx,
             if ( --map->count )
                 return 0;
 
-            while ( base_pfn < end_pfn )
-            {
-                if ( clear_identity_p2m_entry(d, base_pfn) )
-                    ret = -ENXIO;
-                base_pfn++;
-            }
+            printk("Unmapping [%"PRI_mfn":%"PRI_mfn"] for d%dc%d\n", base_pfn, end_pfn,
+                   d->domain_id, ctx->id);
+            ret = unmap_identity_region(d, ctx, base_pfn, end_pfn);
 
             list_del(&map->list);
             xfree(map);
@@ -281,27 +369,17 @@ int iommu_identity_mapping(struct domain *d, struct iommu_context *ctx,
     map->access = p2ma;
     map->count = 1;
 
-    /*
-     * Insert into list ahead of mapping, so the range can be found when
-     * trying to clean up.
-     */
-    list_add_tail(&map->list, &ctx->arch.identity_maps);
+    printk("Mapping [%"PRI_mfn":%"PRI_mfn"] for d%dc%d\n", base_pfn, end_pfn,
+           d->domain_id, ctx->id);
+    ret = map_identity_region(d, ctx, base_pfn, end_pfn, p2ma, flag);
 
-    for ( ; base_pfn < end_pfn; ++base_pfn )
+    if ( ret )
     {
-        int err = set_identity_p2m_entry(d, base_pfn, p2ma, flag);
-
-        if ( !err )
-            continue;
-
-        if ( (map->base >> PAGE_SHIFT_4K) == base_pfn )
-        {
-            list_del(&map->list);
-            xfree(map);
-        }
-        return err;
+        xfree(map);
+        return ret;
     }
 
+    list_add(&map->list, &ctx->arch.identity_maps);
     return 0;
 }
 
@@ -385,7 +463,7 @@ static int __hwdom_init cf_check identity_map(unsigned long s, unsigned long e,
         if ( iomem_access_permitted(d, s, s) )
         {
             rc = iommu_map(d, _dfn(s), _mfn(s), 1, perms,
-                           &info->flush_flags);
+                           &info->flush_flags, 0);
             if ( rc < 0 )
                 break;
             /* Must map a frame at least, which is what we request for. */
@@ -395,7 +473,7 @@ static int __hwdom_init cf_check identity_map(unsigned long s, unsigned long e,
             s++;
         }
         while ( (rc = iommu_map(d, _dfn(s), _mfn(s), e - s + 1,
-                                perms, &info->flush_flags)) > 0 )
+                                perms, &info->flush_flags, 0)) > 0 )
         {
             s += rc;
             process_pending_softirqs();
@@ -522,7 +600,7 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
               map_data.mmio_ro ? "read-only " : "", rc);
 
     /* Use if to avoid compiler warning */
-    if ( iommu_iotlb_flush_all(d, map_data.flush_flags) )
+    if ( iommu_iotlb_flush_all(d, 0, map_data.flush_flags) )
         return;
 }
 
@@ -579,14 +657,11 @@ int iommu_free_pgtables(struct domain *d, struct iommu_context *ctx)
     if ( !is_iommu_enabled(d) )
         return 0;
 
-    /* After this barrier, no new IOMMU mappings can be inserted. */
-    spin_barrier(&ctx->arch.mapping_lock);
-
     /*
      * Pages will be moved to the free list below. So we want to
     * clear the root page-table to avoid any potential use after-free.
     */
-    iommu_vcall(hd->platform_ops, clear_root_pgtable, d);
+    iommu_vcall(hd->platform_ops, clear_root_pgtable, d, ctx);
 
     while ( (pg = page_list_remove_head(&ctx->arch.pgtables)) )
     {
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 91f106968e..8c20f575ee 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -187,11 +187,10 @@ enum
  */
 long __must_check iommu_map(struct domain *d, dfn_t dfn0, mfn_t mfn0,
                             unsigned long page_count, unsigned int flags,
-                            unsigned int *flush_flags);
+                            unsigned int *flush_flags, u16 ctx_id);
 long __must_check iommu_unmap(struct domain *d, dfn_t dfn0,
                               unsigned long page_count, unsigned int flags,
-                              unsigned int *flush_flags);
-
+                              unsigned int *flush_flags, u16 ctx_id);
 int __must_check iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn,
                                   unsigned long page_count,
                                   unsigned int flags);
@@ -199,12 +198,13 @@ int __must_check iommu_legacy_unmap(struct domain *d, dfn_t dfn,
                                     unsigned long page_count);
 
 int __must_check iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
-                                   unsigned int *flags);
+                                   unsigned int *flags, u16 ctx_id);
 
 int __must_check iommu_iotlb_flush(struct domain *d, dfn_t dfn,
                                    unsigned long page_count,
-                                   unsigned int flush_flags);
-int __must_check iommu_iotlb_flush_all(struct domain *d,
+                                   unsigned int flush_flags,
+                                   u16 ctx_id);
+int __must_check iommu_iotlb_flush_all(struct domain *d, u16 ctx_id,
                                        unsigned int flush_flags);
 
 enum iommu_feature
@@ -321,20 +321,32 @@ struct page_info;
  */
 typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, u32 id, void *ctxt);
 
+#define IOMMU_INVALID_CONTEXT_ID 0xFFFF
+
+struct iommu_context;
+
 struct iommu_ops {
     unsigned long page_sizes;
     int (*init)(struct domain *d);
     void (*hwdom_init)(struct domain *d);
-    int (*quarantine_init)(device_t *dev, bool scratch_page);
-    int (*add_device)(uint8_t devfn, device_t *dev);
-    int (*enable_device)(device_t *dev);
-    int (*remove_device)(uint8_t devfn, device_t *dev);
-    int (*assign_device)(struct domain *d, uint8_t devfn, device_t *dev,
-                         uint32_t flag);
-    int (*reassign_device)(struct domain *s, struct domain *t,
-                           uint8_t devfn, device_t *dev);
+    int (*context_init)(struct domain *d, struct iommu_context *ctx,
+                        u32 flags);
+    int (*context_teardown)(struct domain *d, struct iommu_context *ctx,
+                            u32 flags);
+    int (*attach)(struct domain *d, device_t *dev,
+                  struct iommu_context *ctx);
+    int (*detach)(struct domain *d, device_t *dev,
+                  struct iommu_context *prev_ctx);
+    int (*reattach)(struct domain *d, device_t *dev,
+                    struct iommu_context *prev_ctx,
+                    struct iommu_context *ctx);
+
 #ifdef CONFIG_HAS_PCI
     int (*get_device_group_id)(uint16_t seg, uint8_t bus, uint8_t devfn);
+    int (*add_devfn)(struct domain *d, struct pci_dev *pdev, u16 devfn,
+                     struct iommu_context *ctx);
+    int (*remove_devfn)(struct domain *d, struct pci_dev *pdev, u16 devfn,
+                        struct iommu_context *prev_ctx);
 #endif /* HAS_PCI */
 
     void (*teardown)(struct domain *d);
@@ -345,12 +357,15 @@ struct iommu_ops {
      */
     int __must_check (*map_page)(struct domain *d, dfn_t dfn, mfn_t mfn,
                                  unsigned int flags,
-                                 unsigned int *flush_flags);
+                                 unsigned int *flush_flags,
+                                 struct iommu_context *ctx);
     int __must_check (*unmap_page)(struct domain *d, dfn_t dfn,
                                    unsigned int order,
-                                   unsigned int *flush_flags);
+                                   unsigned int *flush_flags,
+                                   struct iommu_context *ctx);
     int __must_check (*lookup_page)(struct domain *d, dfn_t dfn, mfn_t *mfn,
-                                    unsigned int *flags);
+                                    unsigned int *flags,
+                                    struct iommu_context *ctx);
 
 #ifdef CONFIG_X86
     int (*enable_x2apic)(void);
@@ -363,14 +378,15 @@ struct iommu_ops {
     int (*setup_hpet_msi)(struct msi_desc *msi_desc);
 
     void (*adjust_irq_affinities)(void);
-    void (*clear_root_pgtable)(struct domain *d);
+    void (*clear_root_pgtable)(struct domain *d, struct iommu_context *ctx);
     int (*update_ire_from_msi)(struct msi_desc *msi_desc, struct msi_msg *msg);
 #endif /* CONFIG_X86 */
 
     int __must_check (*suspend)(void);
     void (*resume)(void);
     void (*crash_shutdown)(void);
-    int __must_check (*iotlb_flush)(struct domain *d, dfn_t dfn,
+    int __must_check (*iotlb_flush)(struct domain *d,
+                                    struct iommu_context *ctx, dfn_t dfn,
                                     unsigned long page_count,
                                     unsigned int flush_flags);
     int (*get_reserved_device_memory)(iommu_grdm_t *func, void *ctxt);
@@ -418,16 +434,37 @@ extern int iommu_get_extra_reserved_device_memory(iommu_grdm_t *func,
 
 struct iommu_context {
 #ifdef CONFIG_HAS_PASSTHROUGH
-    u16 id; /* Context id (0 means default context) */
+    uint16_t id; /* Context id (0 means default context) */
+    rspinlock_t lock; /* context lock */
+
+    struct list_head devices;
 
     struct arch_iommu_context arch;
+
+    bool opaque; /* context can't be modified nor accessed (e.g HAP) */
+    bool dying; /* the context is tearing down */
 #endif
 };
 
+struct iommu_context_list {
+    atomic_t initialized; /* has/is context list being initialized ? */
+    rwlock_t lock; /* prevent concurrent destruction and access of contexts */
+    uint16_t count; /* Context count excluding default context */
+
+    /* if count > 0 */
+
+    unsigned long *bitmap; /* bitmap of context allocation */
+    struct iommu_context *map; /* Map of contexts */
+};
+
+
 struct domain_iommu {
+
 #ifdef CONFIG_HAS_PASSTHROUGH
     struct arch_iommu arch;
+
+    struct iommu_context default_ctx;
+    struct iommu_context_list other_contexts;
 #endif
 
     /* iommu_ops */
@@ -491,6 +528,8 @@ void iommu_resume(void);
 void iommu_crash_shutdown(void);
 void iommu_quiesce(void);
 int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt);
+
+int __init iommu_quarantine_init(void);
 int iommu_quarantine_dev_init(device_t *dev);
 
 #ifdef CONFIG_HAS_PCI
@@ -500,6 +539,26 @@ int iommu_do_pci_domctl(struct xen_domctl *domctl, struct domain *d,
 
 void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev);
 
+
+struct iommu_context *iommu_get_context(struct domain *d, u16 ctx_id);
+void iommu_put_context(struct iommu_context *ctx);
+
+#define IOMMU_CONTEXT_INIT_default (1 << 0)
+#define IOMMU_CONTEXT_INIT_quarantine (1 << 1)
+int iommu_context_init(struct domain *d, struct iommu_context *ctx, u16 ctx_id, u32 flags);
+
+#define IOMMU_TEARDOWN_REATTACH_DEFAULT (1 << 0)
+#define IOMMU_TEARDOWN_PREEMPT (1 << 1)
+int iommu_context_teardown(struct domain *d, struct iommu_context *ctx, u32 flags);
+
+int iommu_context_alloc(struct domain *d, u16 *ctx_id, u32 flags);
+int iommu_context_free(struct domain *d, u16 ctx_id, u32 flags);
+
+int iommu_reattach_context(struct domain *prev_dom, struct domain *next_dom,
+                           device_t *dev, u16 ctx_id);
+int iommu_attach_context(struct domain *d, device_t *dev, u16 ctx_id);
+int iommu_detach_context(struct domain *d, device_t *dev);
+
 /*
  * The purpose of the iommu_dont_flush_iotlb optional cpu flag is to
 * avoid unecessary iotlb_flush in the low level IOMMU code.
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 130c2a8c1a..acc2229ccf 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -102,6 +102,7 @@ struct pci_dev_info {
 struct pci_dev {
     struct list_head alldevs_list;
     struct list_head domain_list;
+    struct list_head context_list;
 
     struct list_head msi_list;
 
@@ -109,6 +110,8 @@ struct pci_dev {
 
     struct domain *domain;
 
+    uint16_t context; /* IOMMU context number of domain */
+
     const union {
         struct {
             uint8_t devfn;
-- 
2.51.2


-- 
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech


From nobody Sun Feb 8 01:17:05 2026
From: "Teddy Astie"
Subject: [RFC PATCH v7 09/14] iommu: Provide 'X' debug key to dump IOMMU context infos
To: xen-devel@lists.xenproject.org
Cc: "Teddy Astie", "Jan Beulich", "Andrew Cooper", "Roger Pau Monné", "Jason Andryuk"
Message-Id: <6433e2b223d610ee7705abaa49bb27fc4233bf60.1763569135.git.teddy.astie@vates.tech>
Date: Thu, 20 Nov 2025 11:10:01 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

It is often useful to know the state of all contexts and which devices are
in them. This can also be used to know which devices belong to which
domains.
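
For illustration, the output follows the printk format strings added below;
with made-up domain, context, MFN and BDF values it would look roughly like:

    (XEN) d1 contexts
    (XEN)  Context 0 (2f3a1b)
    (XEN)  IOMMU 0 (used=2; did=5)
    (XEN)  - 0000:03:00.0
    (XEN) d[IO] contexts
    (XEN)  Context 0 (10ffe2)
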
Signed-off-by: Teddy Astie
---
v7: introduced

 xen/drivers/passthrough/amd/iommu_init.c | 46 ++++++++++++++++++++++++
 xen/drivers/passthrough/vtd/iommu.c      | 45 ++++++++++++++++++++++-
 2 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index bf32b6c718..1c38ac0369 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -1332,6 +1332,51 @@ static int __init cf_check amd_iommu_setup_device_table(
     return 0;
 }
 
+static void cf_check amd_dump_domain_iommu_contexts(struct domain *d)
+{
+    unsigned int i, iommu_no;
+    struct domain_iommu *hd = dom_iommu(d);
+    struct iommu_context *ctx;
+    struct pci_dev *pdev;
+
+    if (d == dom_io)
+        printk("d[IO] contexts\n");
+    else
+        printk("d%hu contexts\n", d->domain_id);
+
+    for (i = 0; i < (1 + hd->other_contexts.count); ++i)
+    {
+        if ( (ctx = iommu_get_context(d, i)) )
+        {
+            printk(" Context %d (%"PRI_mfn")\n", i,
+                   mfn_x(page_to_mfn(ctx->arch.amd.root_table)));
+
+            for (iommu_no = 0; iommu_no < nr_amd_iommus; iommu_no++)
+                printk(" IOMMU %u (used=%lu; did=%hu)\n", iommu_no,
+                       ctx->arch.amd.iommu_dev_cnt[iommu_no],
+                       ctx->arch.amd.didmap[iommu_no]);
+
+            list_for_each_entry(pdev, &ctx->devices, context_list)
+            {
+                printk(" - %pp\n", &pdev->sbdf);
+            }
+
+            iommu_put_context(ctx);
+        }
+    }
+}
+
+static void cf_check amd_dump_iommu_contexts(unsigned char key)
+{
+    struct domain *d;
+
+    for_each_domain(d)
+        if (is_iommu_enabled(d))
+            amd_dump_domain_iommu_contexts(d);
+
+    amd_dump_domain_iommu_contexts(dom_io);
+}
+
 /* Check whether SP5100 SATA Combined mode is on */
 static bool __init amd_sp5100_erratum28(void)
 {
@@ -1486,6 +1531,7 @@ int __init amd_iommu_init(bool xt)
     register_keyhandler('V', &amd_iommu_dump_intremap_tables,
                         "dump IOMMU intremap tables", 0);
 
+    register_keyhandler('X', amd_dump_iommu_contexts, "dump iommu contexts", 1);
     return 0;
 
 error_out:
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 3319903297..a602edd755 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1954,6 +1954,49 @@ static int __must_check init_vtd_hw(bool resume)
     return iommu_flush_all();
 }
 
+static void cf_check vtd_dump_domain_contexts(struct domain *d)
+{
+    unsigned int i, iommu_no;
+    struct pci_dev *pdev;
+    struct iommu_context *ctx;
+    struct domain_iommu *hd = dom_iommu(d);
+
+    if (d == dom_io)
+        printk("d[IO] contexts\n");
+    else
+        printk("d%hu contexts\n", d->domain_id);
+
+    for (i = 0; i < (1 + hd->other_contexts.count); ++i)
+    {
+        if ( (ctx = iommu_get_context(d, i)) )
+        {
+            printk(" Context %d (%"PRIx64")\n", i, ctx->arch.vtd.pgd_maddr);
+
+            for (iommu_no = 0; iommu_no < nr_iommus; iommu_no++)
+                printk(" IOMMU %u (used=%lu; did=%hu)\n", iommu_no,
+                       ctx->arch.vtd.iommu_dev_cnt[iommu_no],
+                       ctx->arch.vtd.didmap[iommu_no]);
+
+            list_for_each_entry(pdev, &ctx->devices, context_list)
+            {
+                printk(" - %pp\n", &pdev->sbdf);
+            }
+
+            iommu_put_context(ctx);
+        }
+    }
+}
+
+static void cf_check vtd_dump_contexts(unsigned char key)
+{
+    struct domain *d;
+
+    for_each_domain(d)
+        if (is_iommu_enabled(d))
+            vtd_dump_domain_contexts(d);
+
+    vtd_dump_domain_contexts(dom_io);
+}
 
 static struct iommu_state {
     uint32_t fectl;
@@ -2088,7 +2131,7 @@ static int __init cf_check vtd_setup(void)
         iommu_ops.page_sizes |= large_sizes;
 
     register_keyhandler('V', vtd_dump_iommu_info, "dump iommu info", 1);
-
+    register_keyhandler('X', vtd_dump_contexts, "dump iommu contexts", 1);
     return 0;
 
 error:
-- 
2.51.2


-- 
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech


From nobody Sun Feb 8 01:17:05 2026
From: "Teddy Astie"
Subject: [RFC PATCH v7 10/14] amd/iommu: Introduce lookup implementation
To: xen-devel@lists.xenproject.org
Cc: "Teddy Astie", "Jan Beulich", "Andrew Cooper", "Roger Pau Monné", "Jason Andryuk"
Message-Id: <6bee56dd62ad755fdaa41b60fedcfd50082ac6ad.1763569135.git.teddy.astie@vates.tech>
Date: Thu, 20 Nov 2025 11:10:02 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

PV-IOMMU requires a lookup implementation to behave properly; provide one.
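
A minimal caller sketch, going through the generic iommu_lookup_page()
wrapper whose ctx_id parameter was added earlier in this series (the DFN
and context ID below are arbitrary illustration values):

    /* Illustrative only: translate DFN 0x1000 in IOMMU context 1 of d. */
    mfn_t mfn;
    unsigned int flags = 0;
    int rc = iommu_lookup_page(d, _dfn(0x1000), &mfn, &flags, 1);

    if ( !rc )
        /*
         * mfn holds the backing frame; flags keeps only the
         * IOMMUF_readable/IOMMUF_writable bits that survived the ANDing
         * of permissions along the page walk.
         */
        use_translation(mfn, flags); /* hypothetical consumer */
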
Signed-off-by: Teddy Astie
---
v7: introduced
---
 xen/drivers/passthrough/amd/iommu.h         |  2 +
 xen/drivers/passthrough/amd/iommu_map.c     | 90 +++++++++++++++++++++
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  1 +
 3 files changed, 93 insertions(+)

diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/amd/iommu.h
index 0bd0f15a72..de1442af1b 100644
--- a/xen/drivers/passthrough/amd/iommu.h
+++ b/xen/drivers/passthrough/amd/iommu.h
@@ -202,6 +202,8 @@ int __must_check cf_check amd_iommu_map_page(
 int __must_check cf_check amd_iommu_unmap_page(
     struct domain *d, dfn_t dfn, unsigned int order,
     unsigned int *flush_flags, struct iommu_context *ctx);
+int cf_check amd_iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
+                                   unsigned int *flags, struct iommu_context *ctx);
 int amd_iommu_reserve_domain_unity_map(struct domain *d, struct iommu_context *ctx,
                                        const struct ivrs_unity_map *map,
                                        unsigned int flag);
diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index 01b36fdf4f..82f8eb85c8 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -622,6 +622,96 @@ int cf_check amd_iommu_flush_iotlb_pages(
     return 0;
 }
 
+static int lookup_pagewalk(struct page_info *table, dfn_t dfn, unsigned long level,
+                           mfn_t *mfn, unsigned int *flags)
+{
+    int rc = 0;
+    union amd_iommu_pte entry, *pagetable;
+
+    pagetable = __map_domain_page(table);
+    if ( !pagetable )
+        return -ENOMEM;
+
+    entry = pagetable[pfn_to_pde_idx(dfn_x(dfn), level)];
+
+    if ( !entry.pr || WARN_ON(!entry.mfn) )
+    {
+        /* Missing mapping has no flag. */
+        *flags = 0;
+        rc = -ENOENT;
+        goto out;
+    }
+
+    /*
+     * AMD-Vi Specification, 2.2.3 I/O Page Tables for Host Translations
+     *
+     * Effective write permission is calculated using the IW (resp. IR) bits in the DTE,
+     * the I/O PDEs, and the I/O PTE. At each step of the translation process,
+     * I/O write permission (IW) bits (resp. IR) from fetched page table entries are
+     * logically ANDed into cumulative I/O write permissions for the translation
+     * including the IW (resp. IR) bit in the DTE.
+     */
+
+    if ( !entry.ir )
+        *flags &= ~IOMMUF_readable;
+
+    if ( !entry.iw )
+        *flags &= ~IOMMUF_writable;
+
+    if ( entry.next_level )
+    {
+        /* Go to the next mapping */
+        if ( WARN_ON(entry.next_level >= level) )
+        {
+            rc = -EILSEQ;
+            goto out;
+        }
+
+        unmap_domain_page(pagetable);
+        return lookup_pagewalk(mfn_to_page(_mfn(entry.mfn)), dfn, entry.next_level,
+                               mfn, flags);
+    }
+    else
+    {
+        /*
+         * Terminal mapping (either superpage or PTE). Compute that by combining entry
+         * address with dfn (for taking account of sub-entry frames in case of a superpage).
+         */
+        *mfn = _mfn(entry.mfn |
+                    (dfn_x(dfn) & ((1ULL << ((level - 1) * PTE_PER_TABLE_SHIFT)) - 1)));
+    }
+
+out:
+    unmap_domain_page(pagetable);
+    return rc;
+}
+
+int cf_check amd_iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
+                                   unsigned int *flags, struct iommu_context *ctx)
+{
+    struct page_info *root_table;
+    unsigned long level;
+
+    if ( ctx->opaque )
+        return -EOPNOTSUPP;
+
+    if ( !ctx->arch.amd.root_table )
+        return -ENOENT;
+
+    root_table = ctx->arch.amd.root_table;
+    level = dom_iommu(d)->arch.amd.paging_mode;
+
+    if ( dfn_x(dfn) >> (PTE_PER_TABLE_SHIFT * level) )
+        return -ENOENT;
+
+    /*
+     * We initially consider the page writable and readable, lookup_pagewalk will
+     * remove these flags if it is not actually the case.
+     */
+    *flags |= IOMMUF_writable | IOMMUF_readable;
+    return lookup_pagewalk(root_table, dfn, level, mfn, flags);
+}
+
 int amd_iommu_reserve_domain_unity_map(struct domain *d, struct iommu_context *ctx,
                                        const struct ivrs_unity_map *map,
                                        unsigned int flag)
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 3c17d78caf..3d08a925d6 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -732,6 +732,7 @@ static const struct iommu_ops __initconst_cf_clobber _iommu_ops = {
     .suspend = amd_iommu_suspend,
     .resume = amd_iommu_resume,
     .crash_shutdown = amd_iommu_crash_shutdown,
+    .lookup_page = amd_iommu_lookup_page,
     .get_reserved_device_memory = amd_iommu_get_reserved_device_memory,
     .dump_page_tables = amd_dump_page_tables,
     .quiesce = amd_iommu_quiesce,
-- 
2.51.2


-- 
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech


From nobody Sun Feb 8 01:17:05 2026
From: "Teddy Astie"
Subject: [RFC PATCH v7 11/14] iommu: Introduce iommu_get_max_iova
To: xen-devel@lists.xenproject.org
Cc: "Teddy Astie", "Jan Beulich", "Andrew Cooper", "Roger Pau Monné", "Jason Andryuk"
Date: Thu, 20 Nov 2025 11:10:03 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Provide a way to know the maximum IOVA usable for DMA. This will later be
used by PV-IOMMU to report limits to the guest.
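
As a worked example of the values this reports (derived from the
implementations below): on AMD-Vi, paging_mode == 4 gives
bits = 12 + 4 * 9 = 48, so the maximum IOVA is (1 << 48) - 1 =
0xffffffffffff; paging_mode == 6 would give 66 bits, which exceeds the
64-bit GPA space, so ~0 is returned instead. On VT-d the width comes from
agaw_to_width(), so a 48-bit address width likewise yields 0xffffffffffff.
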
Signed-off-by: Teddy Astie
---
v7: introduced
---
 xen/drivers/passthrough/amd/pci_amd_iommu.c | 15 +++++++++++++++
 xen/drivers/passthrough/iommu.c             | 10 ++++++++++
 xen/drivers/passthrough/vtd/iommu.c         |  8 ++++++++
 xen/include/xen/iommu.h                     |  3 +++
 4 files changed, 36 insertions(+)

diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index 3d08a925d6..4185e4cd64 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -706,6 +706,20 @@ static void cf_check amd_dump_page_tables(struct domain *d)
     }
 }
 
+static uint64_t cf_check amd_iommu_get_max_iova(struct domain *d)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    unsigned int bits = 12 + hd->arch.amd.paging_mode * 9;
+
+    /* If paging_mode == 6, which indicates 6-level page tables,
+       we have bits == 66 while the GPA space is still 64 bits
+     */
+    if (bits >= 64)
+        return ~0LLU;
+
+    return (1LLU << bits) - 1;
+}
+
 static const struct iommu_ops __initconst_cf_clobber _iommu_ops = {
     .page_sizes = PAGE_SIZE_4K | PAGE_SIZE_2M | PAGE_SIZE_1G,
     .init = amd_iommu_domain_init,
@@ -736,6 +750,7 @@ static const struct iommu_ops __initconst_cf_clobber _iommu_ops = {
     .get_reserved_device_memory = amd_iommu_get_reserved_device_memory,
     .dump_page_tables = amd_dump_page_tables,
     .quiesce = amd_iommu_quiesce,
+    .get_max_iova = amd_iommu_get_max_iova,
 };
 
 static const struct iommu_init_ops __initconstrel _iommu_init_ops = {
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index feda2e390b..4434a9dcd0 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -1229,6 +1229,16 @@ bool iommu_has_feature(struct domain *d, enum iommu_feature feature)
     return is_iommu_enabled(d) && test_bit(feature, dom_iommu(d)->features);
 }
 
+uint64_t iommu_get_max_iova(struct domain *d)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+
+    if ( !hd->platform_ops->get_max_iova )
+        return 0;
+
+    return iommu_call(hd->platform_ops, get_max_iova, d);
+}
+
 #define MAX_EXTRA_RESERVED_RANGES 20
 struct extra_reserved_range {
     unsigned long start;
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index a602edd755..af3c6fb178 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2595,6 +2595,13 @@ static int cf_check intel_iommu_remove_devfn(struct domain *d, struct pci_dev *p
     return unapply_context_single(d, drhd->iommu, prev_ctx, pdev->bus, devfn);
 }
 
+static uint64_t cf_check intel_iommu_get_max_iova(struct domain *d)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+
+    return (1LLU << agaw_to_width(hd->arch.vtd.agaw)) - 1;
+}
+
 static void cf_check vtd_quiesce(void)
 {
     const struct acpi_drhd_unit *drhd;
@@ -2644,6 +2651,7 @@ static const struct iommu_ops __initconst_cf_clobber vtd_ops = {
     .get_reserved_device_memory = intel_iommu_get_reserved_device_memory,
     .dump_page_tables = vtd_dump_page_tables,
     .quiesce = vtd_quiesce,
+    .get_max_iova = intel_iommu_get_max_iova,
 };
 
 const struct iommu_init_ops __initconstrel intel_iommu_init_ops = {
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 8c20f575ee..66951c9809 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -403,6 +403,9 @@ struct iommu_ops {
 #endif
     /* Inhibit all interrupt generation, to be used at shutdown. */
     void (*quiesce)(void);
+
+    /* Get maximum domain device address (IOVA). */
+    uint64_t (*get_max_iova)(struct domain *d);
 };
 
 /*
-- 
2.51.2


-- 
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech


From nobody Sun Feb 8 01:17:05 2026
From: "Teddy Astie"
Subject: [RFC PATCH v7 12/14] x86/iommu: Introduce IOMMU arena
To: xen-devel@lists.xenproject.org
Cc: "Teddy Astie", "Jan Beulich", "Andrew Cooper", "Roger Pau Monné"
Message-Id: <08caa05849ee029c409b11e9783e2b8237632a94.1763569135.git.teddy.astie@vates.tech>
Date: Thu, 20 Nov 2025 11:10:04 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Introduce a new facility that reserves a fixed number of contiguous pages
and provides a way to allocate them. It is used to ensure that a guest
cannot cause the hypervisor to OOM with unconstrained allocations by
abusing the PV-IOMMU interface.
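
A minimal usage sketch under the API declared in arena.h below, assuming an
order-4 (16-page) arena owned by domain d and no special memflags; error
handling is elided:

    struct iommu_arena arena;
    struct page_info *pg;

    if ( !iommu_arena_initialize(&arena, d, 4, 0) )
    {
        pg = iommu_arena_allocate_page(&arena); /* NULL once all 16 pages are used */
        if ( pg )
            iommu_arena_free_page(&arena, pg);
        iommu_arena_teardown(&arena, true); /* -EBUSY if allocations remain */
    }
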
Signed-off-by: Teddy Astie
---
 xen/arch/x86/include/asm/arena.h     |  54 +++++++++
 xen/arch/x86/include/asm/iommu.h     |   3 +
 xen/drivers/passthrough/x86/Makefile |   1 +
 xen/drivers/passthrough/x86/arena.c  | 157 +++++++++++++++++++++
 4 files changed, 215 insertions(+)
 create mode 100644 xen/arch/x86/include/asm/arena.h
 create mode 100644 xen/drivers/passthrough/x86/arena.c

diff --git a/xen/arch/x86/include/asm/arena.h b/xen/arch/x86/include/asm/arena.h
new file mode 100644
index 0000000000..7555b100e0
--- /dev/null
+++ b/xen/arch/x86/include/asm/arena.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/**
+ * Simple arena-based page allocator.
+ */
+
+#ifndef __XEN_IOMMU_ARENA_H__
+#define __XEN_IOMMU_ARENA_H__
+
+#include "xen/domain.h"
+#include "xen/atomic.h"
+#include "xen/mm-frame.h"
+#include "xen/types.h"
+
+/**
+ * struct iommu_arena: Page arena structure
+ */
+struct iommu_arena {
+    /* mfn of the first page of the memory region */
+    mfn_t region_start;
+    /* bitmap of allocations */
+    unsigned long *map;
+
+    /* Order of the arena */
+    unsigned int order;
+
+    /* Used page count */
+    atomic_t used_pages;
+};
+
+/**
+ * Initialize an arena using the domheap allocator.
+ * @param [out] arena Arena to allocate
+ * @param [in] domain domain that has ownership of arena pages
+ * @param [in] order order of the arena (power of two of the size)
+ * @param [in] memflags Flags for domheap_alloc_pages()
+ * @return -ENOMEM on arena allocation error, 0 otherwise
+ */
+int iommu_arena_initialize(struct iommu_arena *arena, struct domain *domain,
+                           unsigned int order, unsigned int memflags);
+
+/**
+ * Teardown an arena.
+ * @param [out] arena arena to tear down
+ * @param [in] check check for existing allocations
+ * @return -EBUSY if check is specified and pages are still allocated, 0 otherwise
+ */
+int iommu_arena_teardown(struct iommu_arena *arena, bool check);
+
+struct page_info *iommu_arena_allocate_page(struct iommu_arena *arena);
+bool iommu_arena_free_page(struct iommu_arena *arena, struct page_info *page);
+
+#define iommu_arena_size(arena) (1LLU << (arena)->order)
+
+#endif
diff --git a/xen/arch/x86/include/asm/iommu.h b/xen/arch/x86/include/asm/iommu.h
index 654a07b9b2..452b98b42d 100644
--- a/xen/arch/x86/include/asm/iommu.h
+++ b/xen/arch/x86/include/asm/iommu.h
@@ -12,6 +12,8 @@
 #include
 #include
 
+#include "arena.h"
+
 #define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
 
 struct g2m_ioport {
@@ -62,6 +64,7 @@ struct arch_iommu
 {
     /* Queue for freeing pages */
     struct page_list_head free_queue;
+    struct iommu_arena pt_arena; /* allocator for non-default contexts */
 
     union {
         /* Intel VT-d */
diff --git a/xen/drivers/passthrough/x86/Makefile b/xen/drivers/passthrough/x86/Makefile
index 75b2885336..1614f3d284 100644
--- a/xen/drivers/passthrough/x86/Makefile
+++ b/xen/drivers/passthrough/x86/Makefile
@@ -1,2 +1,3 @@
 obj-y += iommu.o
+obj-y += arena.o
 obj-$(CONFIG_HVM) += hvm.o
diff --git a/xen/drivers/passthrough/x86/arena.c b/xen/drivers/passthrough/x86/arena.c
new file mode 100644
index 0000000000..984bc4d643
--- /dev/null
+++ b/xen/drivers/passthrough/x86/arena.c
@@ -0,0 +1,157 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/**
+ * Simple arena-based page allocator.
+ *
+ * Allocate a large block using alloc_domheap_pages and allocate single pages
+ * using the iommu_arena_allocate_page and iommu_arena_free_page functions.
+ *
+ * Concurrent {allocate/free}_page is thread-safe;
+ * iommu_arena_teardown during {allocate/free}_page is not thread-safe.
+ *
+ * Written by Teddy Astie
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+/* Maximum number of scan retries if the bit found is not available */
+#define ARENA_TSL_MAX_TRIES 5
+
+int iommu_arena_initialize(struct iommu_arena *arena, struct domain *d,
+                           unsigned int order, unsigned int memflags)
+{
+    struct page_info *page;
+
+    /* TODO: Maybe allocate differently ? */
+    page = alloc_domheap_pages(d, order, memflags);
+
+    if ( !page )
+        return -ENOMEM;
+
+    arena->map = xzalloc_array(unsigned long, BITS_TO_LONGS(1LLU << order));
+    arena->order = order;
+    arena->region_start = page_to_mfn(page);
+
+    _atomic_set(&arena->used_pages, 0);
+    bitmap_zero(arena->map, iommu_arena_size(arena));
+
+    printk(XENLOG_DEBUG "IOMMU: Allocated arena (%llu pages, start=%"PRI_mfn")\n",
+           iommu_arena_size(arena), mfn_x(arena->region_start));
+    return 0;
+}
+
+int iommu_arena_teardown(struct iommu_arena *arena, bool check)
+{
+    BUG_ON(mfn_x(arena->region_start) == 0);
+
+    /* Check for allocations if check is specified */
+    if ( check && (atomic_read(&arena->used_pages) > 0) )
+        return -EBUSY;
+
+    free_domheap_pages(mfn_to_page(arena->region_start), arena->order);
+
+    arena->region_start = _mfn(0);
+    _atomic_set(&arena->used_pages, 0);
+    xfree(arena->map);
+    arena->map = NULL;
+
+    return 0;
+}
+
+struct page_info *iommu_arena_allocate_page(struct iommu_arena *arena)
+{
+    unsigned int index;
+    unsigned int tsl_tries = 0;
+
+    BUG_ON(mfn_x(arena->region_start) == 0);
+
+    if ( atomic_read(&arena->used_pages) == iommu_arena_size(arena) )
+        /* All pages used */
+        return NULL;
+
+    do
+    {
+        index = find_first_zero_bit(arena->map, iommu_arena_size(arena));
+
+        if ( index >= iommu_arena_size(arena) )
+            /* No more free pages */
+            return NULL;
+
+        /*
+         * While there shouldn't be a lot of retries in practice, this loop
+         * *may* run indefinitely if the found bit is never free due to being
+         * overwritten by another CPU core right after. Add a safeguard for
+         * such very rare cases.
+         */
+        tsl_tries++;
+
+        if ( unlikely(tsl_tries == ARENA_TSL_MAX_TRIES) )
+        {
+            printk(XENLOG_ERR "ARENA: Too many TSL retries!\n");
+            return NULL;
+        }
+
+        /* Make sure that the bit we found is still free */
+    } while ( test_and_set_bit(index, arena->map) );
+
+    atomic_inc(&arena->used_pages);
+
+    return mfn_to_page(mfn_add(arena->region_start, index));
+}
+
+bool iommu_arena_free_page(struct iommu_arena *arena, struct page_info *page)
+{
+    unsigned long index;
+    mfn_t frame;
+
+    if ( !page )
+    {
+        printk(XENLOG_WARNING "IOMMU: Trying to free NULL page\n");
+        WARN();
+        return false;
+    }
+
+    frame = page_to_mfn(page);
+
+    /* Check if page belongs to our arena */
+    if ( (mfn_x(frame) < mfn_x(arena->region_start))
+         || (mfn_x(frame) >= (mfn_x(arena->region_start) + iommu_arena_size(arena))) )
+    {
+        printk(XENLOG_WARNING
+               "IOMMU: Trying to free outside arena region [mfn=%"PRI_mfn"]\n",
+               mfn_x(frame));
+        WARN();
+        return false;
+    }
+
+    index = mfn_x(frame) - mfn_x(arena->region_start);
+
+    /* Sanity check in case of underflow. */
+    ASSERT(index < iommu_arena_size(arena));
+
+    if ( !test_and_clear_bit(index, arena->map) )
+    {
+        /*
+         * Bit was free during our arena_free_page, which means that
+         * either this page was never allocated, or we are in a double-free
+         * situation.
+         */
+        printk(XENLOG_WARNING
+               "IOMMU: Freeing non-allocated region (double-free?) [mfn=%"PRI_mfn"]\n",
[mfn=3D= %"PRI_mfn"]", + mfn_x(frame)); + WARN(); + return false; + } + + atomic_dec(&arena->used_pages); + + return true; +} \ No newline at end of file --=20 2.51.2 -- Teddy Astie | Vates XCP-ng Developer XCP-ng & Xen Orchestra - Vates solutions web: https://vates.tech From nobody Sun Feb 8 01:17:05 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass header.i=teddy.astie@vates.tech; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=reject dis=none) header.from=vates.tech ARC-Seal: i=1; a=rsa-sha256; t=1763637261; cv=none; d=zohomail.com; s=zohoarc; b=Ycijhk5RxGst41Ova28wpGXRU0rxhoGfudgbCL8nKFScYlTvqKYdGLf+mb8HKw7B+Kyl1Uihq+QWaV7p0s1UVDBcd87gX7xA+j/sBoN85mgvk1VB/eXooKFogroUqdBCgpJmG4mMq6uKFE38HKnYPAOy1itf0JLRQ8WaHnq7dro= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1763637261; h=Content-Type:Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=c/JljpXkKEnMLI4s/6MD5SMOH2GQOJrsu01e1YgqiEs=; b=FC7ywiRFkFSYsgXCBCR3NzJilu2WEF//Tu6xwIoLo76BEDODd9Fsu0LXVG49J+zPuVbCabn08WoyC8/91QZVU1+x3Wb5QTp2V9Ka7oWeIsmeHadmnzVckMdG+s/z7YsNzDoXIwzQPA72trwqacsXPT1HpSlwVwOE0t5MebhQuAM= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass header.i=teddy.astie@vates.tech; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=reject dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1763637261244959.3935504073132; Thu, 20 Nov 2025 03:14:21 -0800 (PST) Received: from list by lists.xenproject.org with outflank-mailman.1166982.1493405 (Exim 4.92) (envelope-from ) id 1vM2bt-00013M-4k; Thu, 20 Nov 2025 11:13:57 +0000 Received: by outflank-mailman (output) from mailman id 1166982.1493405; Thu, 20 Nov 2025 11:13:57 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1vM2bt-00013A-1P; Thu, 20 Nov 2025 11:13:57 +0000 Received: by outflank-mailman (input) for mailman id 1166982; Thu, 20 Nov 2025 11:13:56 +0000 Received: from se1-gles-sth1-in.inumbo.com ([159.253.27.254] helo=se1-gles-sth1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1vM2YE-0001PI-IZ for xen-devel@lists.xenproject.org; Thu, 20 Nov 2025 11:10:10 +0000 Received: from mail128-17.atl41.mandrillapp.com (mail128-17.atl41.mandrillapp.com [198.2.128.17]) by se1-gles-sth1.inumbo.com (Halon) with ESMTPS id 788e0caf-c601-11f0-9d18-b5c5bf9af7f9; Thu, 20 Nov 2025 12:10:06 +0100 (CET) Received: from pmta08.mandrill.prod.atl01.rsglab.com (localhost [127.0.0.1]) by mail128-17.atl41.mandrillapp.com (Mailchimp) with ESMTP id 4dBwcG4z7NzCf9P1H for ; Thu, 20 Nov 2025 11:10:06 +0000 (GMT) Received: from [37.26.189.201] by mandrillapp.com id 88d1e4a663ac4727b5f279336b0d5faf; Thu, 20 Nov 2025 11:10:06 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming 
From: "Teddy Astie" <teddy.astie@vates.tech>
Subject: [RFC PATCH v7 13/14] iommu: Introduce PV-IOMMU
To: xen-devel@lists.xenproject.org
Cc: "Jan Beulich", "Andrew Cooper", "Roger Pau Monné", "Anthony PERARD", "Michal Orzel", "Julien Grall", "Stefano Stabellini"
Message-Id: <73a34b0236aec756738e0073b75495dcb214a74b.1763569135.git.teddy.astie@vates.tech>
Date: Thu, 20 Nov 2025 11:10:06 +0000

Introduce the PV-IOMMU subsystem as defined in docs/designs/pv-iommu.md.
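
For reference, a minimal sketch of how a guest driver might exercise this
interface. This is hypothetical and not part of the patch: the
HYPERVISOR_iommu_op() wrapper name and the chosen max_ctx_no/arena_order
values are illustrative only; the structures come from
xen/include/public/pv-iommu.h below.

    /*
     * Hypothetical guest-side sketch: assumes a HYPERVISOR_iommu_op(subop, arg)
     * hypercall wrapper; error handling is minimal.
     */
    static long pv_iommu_example(void)
    {
        struct pv_iommu_capabilities caps = { 0 };
        struct pv_iommu_init init = { .max_ctx_no = 16, .arena_order = 9 };
        struct pv_iommu_alloc alloc = { 0 };
        struct pv_iommu_map_pages map = { 0 };
        long rc;

        /* Discover limits and features of the PV-IOMMU. */
        rc = HYPERVISOR_iommu_op(IOMMU_query_capabilities, &caps);
        if ( rc )
            return rc;

        /* Initialize with the wanted context count and page table arena size. */
        rc = HYPERVISOR_iommu_op(IOMMU_init, &init);
        if ( rc )
            return rc;

        /* Allocate a fresh context. */
        rc = HYPERVISOR_iommu_op(IOMMU_alloc_context, &alloc);
        if ( rc )
            return rc;

        /* Map one guest page (gfn 0x1000) at the same device address. */
        map.ctx_no = alloc.ctx_no;
        map.gfn = 0x1000;
        map.dfn = 0x1000;
        map.map_flags = IOMMU_MAP_readable | IOMMU_MAP_writeable;
        map.pgsize = 4096;
        map.nr_pages = 1;

        return HYPERVISOR_iommu_op(IOMMU_map_pages, &map);
    }

A device would then be moved into the new context with IOMMU_reattach_device
before the mappings take effect for it.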
Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
 xen/arch/x86/include/asm/iommu.h    |   5 +-
 xen/common/Makefile                 |   1 +
 xen/common/pv-iommu.c               | 551 ++++++++++++++++++++++++++++
 xen/drivers/passthrough/iommu.c     |  95 +++++
 xen/drivers/passthrough/x86/iommu.c |  61 ++-
 xen/include/hypercall-defs.c        |   6 +
 xen/include/public/pv-iommu.h       | 343 +++++++++++++++++
 xen/include/public/xen.h            |   1 +
 xen/include/xen/iommu.h             |   9 +
 9 files changed, 1066 insertions(+), 6 deletions(-)
 create mode 100644 xen/common/pv-iommu.c
 create mode 100644 xen/include/public/pv-iommu.h

diff --git a/xen/arch/x86/include/asm/iommu.h b/xen/arch/x86/include/asm/iommu.h
index 452b98b42d..c1d19baa13 100644
--- a/xen/arch/x86/include/asm/iommu.h
+++ b/xen/arch/x86/include/asm/iommu.h
@@ -136,6 +136,9 @@ int iommu_identity_mapping(struct domain *d, struct iommu_context *ctx,
                            p2m_access_t p2ma, paddr_t base, paddr_t end,
                            unsigned int flag);
 void iommu_identity_map_teardown(struct domain *d, struct iommu_context *ctx);
+bool iommu_identity_map_check(struct domain *d, struct iommu_context *ctx,
+                              mfn_t mfn);
+
 
 extern bool untrusted_msi;
 
@@ -151,7 +154,7 @@ unsigned long *iommu_init_domid(domid_t reserve);
 domid_t iommu_alloc_domid(unsigned long *map);
 void iommu_free_domid(domid_t domid, unsigned long *map);
 
-int __must_check iommu_free_pgtables(struct domain *d, struct iommu_context *ctx);
+int __must_check cf_check iommu_free_pgtables(struct domain *d, struct iommu_context *ctx);
 struct domain_iommu;
 struct page_info *__must_check iommu_alloc_pgtable(struct domain_iommu *hd,
                                                    struct iommu_context *ctx,
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 0c7d0f5d46..e2180b382e 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -39,6 +39,7 @@ obj-y += percpu.o
 obj-$(CONFIG_PERF_COUNTERS) += perfc.o
 obj-bin-$(CONFIG_HAS_PMAP) += pmap.init.o
 obj-y += preempt.o
+obj-y += pv-iommu.o
 obj-y += random.o
 obj-y += rangeset.o
 obj-y += radix-tree.o
diff --git a/xen/common/pv-iommu.c b/xen/common/pv-iommu.c
new file mode 100644
index 0000000000..4cdb30a031
--- /dev/null
+++ b/xen/common/pv-iommu.c
@@ -0,0 +1,551 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * xen/common/pv_iommu.c
+ *
+ * PV-IOMMU hypercall interface.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define PVIOMMU_PREFIX "[PV-IOMMU] "
+
+static int get_paged_frame(struct domain *d, gfn_t gfn, mfn_t *mfn,
+                           struct page_info **page, bool readonly)
+{
+    int ret = 0;
+    p2m_type_t p2mt = p2m_invalid;
+
+#ifdef CONFIG_X86
+    p2m_query_t query = P2M_ALLOC;
+
+    if ( !readonly )
+        query |= P2M_UNSHARE;
+
+    *mfn = get_gfn_type(d, gfn_x(gfn), &p2mt, query);
+#else
+    *mfn = p2m_lookup(d, gfn, &p2mt);
+#endif
+
+    if ( mfn_eq(*mfn, INVALID_MFN) )
+    {
+        /* No mapping ? */
+        gprintk(XENLOG_G_WARNING, PVIOMMU_PREFIX
+                "Trying to map to non-backed page frame (gfn=%"PRI_gfn" p2mt=%d)\n",
+                gfn_x(gfn), p2mt);
+
+        ret = -ENOENT;
+    }
+    else if ( p2m_is_any_ram(p2mt) && mfn_valid(*mfn) )
+    {
+        struct domain *owner;
+
+        *page = mfn_to_page(*mfn);
+        owner = page_get_owner_and_reference(*page);
+        if ( !owner || (owner != d && !is_hardware_domain(d)) )
+        {
+            /* TODO: foreign mappings when d is not privileged ? */
+            /* Only drop the page reference if we actually took one. */
+            if ( owner )
+                put_page(*page);
+            *page = NULL;
+            /* Fall through to put_gfn() below instead of leaking the GFN ref. */
+            ret = -EPERM;
+        }
+        else
+            ret = 0;
+    }
+    else if ( p2m_is_mmio(p2mt) ||
+              iomem_access_permitted(d, mfn_x(*mfn), mfn_x(*mfn)) )
+    {
+        *page = NULL;
+        ret = 0;
+    }
+    else
+    {
+        gprintk(XENLOG_WARNING, PVIOMMU_PREFIX
+                "Unexpected p2mt %d (gfn=%"PRI_gfn" mfn=%"PRI_mfn")\n",
+                p2mt, gfn_x(gfn), mfn_x(*mfn));
+
+        ret = -EPERM;
+    }
+
+    put_gfn(d, gfn_x(gfn));
+    return ret;
+}
+
+static bool can_use_iommu_check(struct domain *d)
+{
+    if ( !is_iommu_enabled(d) )
+    {
+        gprintk(XENLOG_WARNING, PVIOMMU_PREFIX "IOMMU disabled for this domain\n");
+        return false;
+    }
+
+    if ( !dom_iommu(d)->allow_pv_iommu )
+    {
+        gprintk(XENLOG_WARNING, PVIOMMU_PREFIX "PV-IOMMU disabled for this domain\n");
+        return false;
+    }
+
+    return true;
+}
+
+static long capabilities_op(struct pv_iommu_capabilities *cap, struct domain *d)
+{
+    cap->max_ctx_no = d->iommu.other_contexts.count;
+    cap->max_iova_addr = iommu_get_max_iova(d);
+
+    cap->max_pasid = 0; /* TODO */
+    cap->cap_flags = 0;
+
+    cap->pgsize_mask = PAGE_SIZE_4K;
+
+    return 0;
+}
+
+static long init_op(struct pv_iommu_init *init, struct domain *d)
+{
+    if ( init->max_ctx_no == UINT32_MAX )
+        return -E2BIG;
+
+    return iommu_domain_pviommu_init(d, init->max_ctx_no + 1, init->arena_order);
+}
+
+static long alloc_context_op(struct pv_iommu_alloc *alloc, struct domain *d)
+{
+    uint16_t ctx_no = 0;
+    int status = 0;
+
+    status = iommu_context_alloc(d, &ctx_no, 0);
+
+    if ( status )
+        return status;
+
+    gprintk(XENLOG_INFO, PVIOMMU_PREFIX "Created IOMMU context %hu\n", ctx_no);
+
+    alloc->ctx_no = ctx_no;
+    return 0;
+}
+
+static long free_context_op(struct pv_iommu_free *free, struct domain *d)
+{
+    int flags = IOMMU_TEARDOWN_PREEMPT;
+
+    if ( !free->ctx_no )
+        return -EINVAL;
+
+    if ( free->free_flags & IOMMU_FREE_reattach_default )
+        flags |= IOMMU_TEARDOWN_REATTACH_DEFAULT;
+
+    return iommu_context_free(d, free->ctx_no, flags);
+}
+
+static long reattach_device_op(struct pv_iommu_reattach_device *reattach,
+                               struct domain *d)
+{
+    int ret;
+    device_t *pdev;
+    struct physdev_pci_device dev = reattach->dev;
+
+    pcidevs_lock();
+    pdev = pci_get_pdev(d, PCI_SBDF(dev.seg, dev.bus, dev.devfn));
+
+    if ( !pdev )
+    {
+        pcidevs_unlock();
+        return -ENODEV;
+    }
+
+    ret = iommu_reattach_context(d, d, pdev, reattach->ctx_no);
+
+    pcidevs_unlock();
+    return ret;
+}
+
+static long map_pages_op(struct pv_iommu_map_pages *map, struct domain *d)
+{
+    struct iommu_context *ctx;
+    int ret = 0, flush_ret;
+    struct page_info *page = NULL;
+    mfn_t mfn, mfn_lookup;
+    unsigned int lookup_flags, flags = 0, flush_flags = 0;
+    size_t i = 0;
+    dfn_t dfn0 = _dfn(map->dfn); /* original map->dfn */
+
+    if ( !map->ctx_no || !(ctx = iommu_get_context(d, map->ctx_no)) )
+        return -EINVAL;
+
+    if ( map->map_flags & IOMMU_MAP_readable )
+        flags |= IOMMUF_readable;
+
+    if ( map->map_flags & IOMMU_MAP_writeable )
+        flags |= IOMMUF_writable;
+
+    for ( i = 0; i < map->nr_pages; i++ )
+    {
+        gfn_t gfn = _gfn(map->gfn + i);
+        dfn_t dfn = _dfn(map->dfn + i);
+
+#ifdef CONFIG_X86
+        /* Check the per-iteration dfn, not the initial map->dfn. */
+        if ( iommu_identity_map_check(d, ctx, _mfn(dfn_x(dfn))) )
+        {
+            ret = -EADDRNOTAVAIL;
+            break;
+        }
+#endif
+
+        ret = get_paged_frame(d, gfn, &mfn, &page, false);
+
+        if ( ret )
+            break;
+
+        /* Check for conflict with existing mappings */
+        if ( !iommu_lookup_page(d, dfn, &mfn_lookup, &lookup_flags, map->ctx_no) )
+        {
+            if ( page && mfn_valid(mfn) )
+                put_page(page);
+
+            ret = -EADDRINUSE;
+            break;
+        }
+
+        ret = iommu_map(d, dfn, mfn, 1, flags, &flush_flags, map->ctx_no);
+
+        if ( ret )
+        {
+            if ( page && mfn_valid(mfn) )
+                put_page(page);
+
+            break;
+        }
+
+        map->mapped++;
+
+        if ( (i & 0xff) && hypercall_preempt_check() )
+        {
+            i++;
+
+            map->gfn += i;
+            map->dfn += i;
+            map->nr_pages -= i;
+
+            ret = -ERESTART;
+            break;
+        }
+    }
+
+    flush_ret = iommu_iotlb_flush(d, dfn0, i, flush_flags, map->ctx_no);
+
+    if ( flush_ret )
+        gprintk(XENLOG_G_WARNING, PVIOMMU_PREFIX
+                "Flush operation failed for %d (%d)\n", ctx->id, flush_ret);
+
+    /* Only drop the context reference once we are done using ctx. */
+    iommu_put_context(ctx);
+
+    return ret;
+}
+
+static long unmap_pages_op(struct pv_iommu_unmap_pages *unmap, struct domain *d)
+{
+    struct iommu_context *ctx;
+    mfn_t mfn;
+    int ret = 0, flush_ret;
+    unsigned int flags, flush_flags = 0;
+    size_t i = 0;
+    dfn_t dfn0 = _dfn(unmap->dfn); /* original unmap->dfn */
+
+    if ( !unmap->ctx_no || !(ctx = iommu_get_context(d, unmap->ctx_no)) )
+        return -EINVAL;
+
+    for ( i = 0; i < unmap->nr_pages; i++ )
+    {
+        dfn_t dfn = _dfn(unmap->dfn + i);
+
+#ifdef CONFIG_X86
+        if ( iommu_identity_map_check(d, ctx, _mfn(dfn_x(dfn))) )
+        {
+            ret = -EADDRNOTAVAIL;
+            break;
+        }
+#endif
+
+        /* Check if there is a valid mapping for this domain */
+        if ( iommu_lookup_page(d, dfn, &mfn, &flags, unmap->ctx_no) )
+        {
+            ret = -ENOENT;
+            break;
+        }
+
+        ret = iommu_unmap(d, dfn, 1, 0, &flush_flags, unmap->ctx_no);
+
+        if ( ret )
+            break;
+
+        unmap->unmapped++;
+
+        /* Decrement reference counter (if needed) */
+        if ( mfn_valid(mfn) )
+            put_page(mfn_to_page(mfn));
+
+        if ( (i & 0xff) && hypercall_preempt_check() )
+        {
+            i++;
+
+            unmap->dfn += i;
+            unmap->nr_pages -= i;
+
+            ret = -ERESTART;
+            break;
+        }
+    }
+
+    flush_ret = iommu_iotlb_flush(d, dfn0, i, flush_flags, unmap->ctx_no);
+
+    if ( flush_ret )
+        printk(XENLOG_WARNING PVIOMMU_PREFIX
+               "Flush operation failed for c%d (%d)\n", ctx->id, flush_ret);
+
+    /* Only drop the context reference once we are done using ctx. */
+    iommu_put_context(ctx);
+
+    return ret;
+}
+
+static long do_iommu_subop(int subop, XEN_GUEST_HANDLE_PARAM(void) arg,
+                           struct domain *d, bool remote);
+
+static long remote_cmd_op(struct pv_iommu_remote_cmd *remote_cmd,
+                          struct domain *current_domain)
+{
+    long ret = 0;
+    struct domain *d;
+
+    /* TODO: use a better permission logic */
+    if ( !is_hardware_domain(current_domain) )
+        return -EPERM;
+
+    d = get_domain_by_id(remote_cmd->domid);
+
+    if ( !d )
+        return -ENOENT;
+
+    ret = do_iommu_subop(remote_cmd->subop, remote_cmd->arg, d, true);
+
+    put_domain(d);
+
+    return ret;
+}
+
+static long do_iommu_subop(int subop, XEN_GUEST_HANDLE_PARAM(void) arg,
+                           struct domain *d, bool remote)
+{
+    long ret = 0;
+
+    switch ( subop )
+    {
+    case IOMMU_noop:
+        break;
+
+    case IOMMU_query_capabilities:
+    {
+        struct pv_iommu_capabilities cap;
+
+        ret = capabilities_op(&cap, d);
+
+        if ( unlikely(copy_to_guest(arg, &cap, 1)) )
+            ret = -EFAULT;
+
+        break;
+    }
+
+    case IOMMU_init:
+    {
+        struct pv_iommu_init init;
+
+        if ( unlikely(copy_from_guest(&init, arg, 1)) )
+        {
+            ret = -EFAULT;
+            break;
+        }
+
+        ret = init_op(&init, d);
+        gdprintk(XENLOG_INFO, PVIOMMU_PREFIX "init -> %ld\n", ret);
+        break;
+    }
+
+    case IOMMU_alloc_context:
+    {
+        struct pv_iommu_alloc alloc;
+
+        if ( unlikely(copy_from_guest(&alloc, arg, 1)) )
+        {
+            ret = -EFAULT;
+            break;
+        }
+
+        ret = alloc_context_op(&alloc, d);
+
+        if ( unlikely(copy_to_guest(arg, &alloc, 1)) )
+            ret = -EFAULT;
+
+        gdprintk(XENLOG_INFO, PVIOMMU_PREFIX
+                 "alloc_context(flags:%x) -> ctx_no: %d, ret=%ld\n",
+                 alloc.alloc_flags, alloc.ctx_no, ret);
+        break;
+    }
+
+    case IOMMU_free_context:
+    {
+        struct pv_iommu_free free;
+
+        if ( unlikely(copy_from_guest(&free, arg, 1)) )
+        {
+            ret = -EFAULT;
+            break;
+        }
+
+        ret = free_context_op(&free, d);
+        gdprintk(XENLOG_INFO, PVIOMMU_PREFIX
+                 "free_context(ctx_no:%d) -> %ld\n", free.ctx_no, ret);
+        break;
+    }
+
+    case IOMMU_reattach_device:
+    {
+        struct pv_iommu_reattach_device reattach;
+
+        if ( unlikely(copy_from_guest(&reattach, arg, 1)) )
+        {
+            ret = -EFAULT;
+            break;
+        }
+
+        ret = reattach_device_op(&reattach, d);
+        gdprintk(XENLOG_INFO, PVIOMMU_PREFIX
+                 "reattach(ctx_no:%d, bus:%02x, devfn:%02x) -> %ld\n",
+                 reattach.ctx_no, reattach.dev.bus, reattach.dev.devfn, ret);
+        break;
+    }
+
+    case IOMMU_map_pages:
+    {
+        struct pv_iommu_map_pages map;
+
+        if ( unlikely(copy_from_guest(&map, arg, 1)) )
+        {
+            ret = -EFAULT;
+            break;
+        }
+
+        ret = map_pages_op(&map, d);
+
+        if ( unlikely(copy_to_guest(arg, &map, 1)) )
+            ret = -EFAULT;
+
+        break;
+    }
+
+    case IOMMU_unmap_pages:
+    {
+        struct pv_iommu_unmap_pages unmap;
+
+        if ( unlikely(copy_from_guest(&unmap, arg, 1)) )
+        {
+            ret = -EFAULT;
+            break;
+        }
+
+        ret = unmap_pages_op(&unmap, d);
+
+        if ( unlikely(copy_to_guest(arg, &unmap, 1)) )
+            ret = -EFAULT;
+
+        break;
+    }
+
+    case IOMMU_remote_cmd:
+    {
+        struct pv_iommu_remote_cmd remote_cmd;
+
+        if ( remote )
+        {
+            /* Prevent remote_cmd from being called recursively */
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( unlikely(copy_from_guest(&remote_cmd, arg, 1)) )
+        {
+            ret = -EFAULT;
+            break;
+        }
+
+        ret = remote_cmd_op(&remote_cmd, d);
+        break;
+    }
+
+    /*
+     * TODO
+     */
+    case IOMMU_alloc_nested:
+    case IOMMU_flush_nested:
+    case IOMMU_attach_pasid:
+    case IOMMU_detach_pasid:
+        ret = -EOPNOTSUPP;
+        break;
+
+    default:
+        return -EOPNOTSUPP;
+    }
+
+    return ret;
+}
+
+long do_iommu_op(unsigned int subop, XEN_GUEST_HANDLE_PARAM(void) arg)
+{
+    long ret = 0;
+    struct domain *d = current->domain;
+
+    if ( !can_use_iommu_check(d) )
+        return -ENODEV;
+
+    ret = do_iommu_subop(subop, arg, d, false);
+
+    if ( ret == -ERESTART )
+        return hypercall_create_continuation(__HYPERVISOR_iommu_op, "ih", subop, arg);
+
+    return ret;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 4434a9dcd0..5c6b272697 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -193,6 +193,99 @@ static void __hwdom_init check_hwdom_reqs(struct domain *d)
     arch_iommu_check_autotranslated_hwdom(d);
 }
 
+
+int iommu_domain_pviommu_init(struct domain *d, uint16_t nb_ctx, uint32_t arena_order)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    int rc;
+
+    BUG_ON(nb_ctx == 0); /* sanity check (prevent underflow) */
+
+    /*
+     * hd->other_contexts.count is always reported as 0 during initialization,
+     * preventing misuse of partially initialized IOMMU contexts.
+     */
+
+    if ( atomic_cmpxchg(&hd->other_contexts.initialized, 0, 1) == 1 )
+        return -EACCES;
+
+    if ( (nb_ctx - 1) > 0 )
+    {
+        /* Initialize context bitmap */
+        size_t i;
+
+        hd->other_contexts.bitmap = xzalloc_array(unsigned long,
+                                                  BITS_TO_LONGS(nb_ctx - 1));
+
+        if ( !hd->other_contexts.bitmap )
+        {
+            rc = -ENOMEM;
+            goto cleanup;
+        }
+
+        hd->other_contexts.map = xzalloc_array(struct iommu_context, nb_ctx - 1);
+
+        if ( !hd->other_contexts.map )
+        {
+            rc = -ENOMEM;
+            goto cleanup;
+        }
+
+        for ( i = 0; i < (nb_ctx - 1); i++ )
+            rspin_lock_init(&hd->other_contexts.map[i].lock);
+    }
+
+    rc = arch_iommu_pviommu_init(d, nb_ctx, arena_order);
+
+    if ( rc )
+        goto cleanup;
+
+    /* Make sure initialization is complete before making it visible to other CPUs. */
+    smp_wmb();
+
+    hd->other_contexts.count = nb_ctx - 1;
+
+    printk(XENLOG_INFO "Dom%d uses %lu IOMMU contexts (%llu pages arena)\n",
+           d->domain_id, (unsigned long)nb_ctx, 1llu << arena_order);
+
+    return 0;
+
+cleanup:
+    /* TODO: Reset hd->other_contexts.initialized */
+    if ( hd->other_contexts.bitmap )
+    {
+        xfree(hd->other_contexts.bitmap);
+        hd->other_contexts.bitmap = NULL;
+    }
+
+    if ( hd->other_contexts.map )
+    {
+        xfree(hd->other_contexts.map);
+        hd->other_contexts.map = NULL;
+    }
+
+    return rc;
+}
+
+int iommu_domain_pviommu_teardown(struct domain *d)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    int i;
+    /* FIXME: Potential race condition with remote_op ? */
+
+    for ( i = 0; i < hd->other_contexts.count; i++ )
+        WARN_ON(iommu_context_free(d, i, IOMMU_TEARDOWN_REATTACH_DEFAULT) != -ENOENT);
+
+    hd->other_contexts.count = 0;
+
+    if ( hd->other_contexts.bitmap )
+        xfree(hd->other_contexts.bitmap);
+
+    if ( hd->other_contexts.map )
+        xfree(hd->other_contexts.map);
+
+    return 0;
+}
+
 int iommu_domain_init(struct domain *d, unsigned int opts)
 {
     struct domain_iommu *hd = dom_iommu(d);
@@ -238,6 +331,8 @@ int iommu_domain_init(struct domain *d, unsigned int opts)
 
     ASSERT(!(hd->need_sync && hd->hap_pt_share));
 
+    hd->allow_pv_iommu = true;
+
     rspin_lock(&hd->default_ctx.lock);
     ret = iommu_context_init(d, &hd->default_ctx, 0, IOMMU_CONTEXT_INIT_default);
     rspin_unlock(&hd->default_ctx.lock);
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index d8becfa869..ac339a2ed3 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -215,6 +215,32 @@ int arch_iommu_context_teardown(struct domain *d, struct iommu_context *ctx, u32
     return 0;
 }
 
+int arch_iommu_pviommu_init(struct domain *d, uint16_t nb_ctx, uint32_t arena_order)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+
+    if ( arena_order == 0 )
+        return 0;
+
+    return iommu_arena_initialize(&hd->arch.pt_arena, NULL, arena_order, 0);
+}
+
+int arch_iommu_pviommu_teardown(struct domain *d)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+
+    if ( iommu_arena_teardown(&hd->arch.pt_arena, true) )
+    {
+        printk(XENLOG_WARNING "IOMMU Arena used while being destroyed\n");
+        WARN();
+
+        /* Teardown anyway */
+        iommu_arena_teardown(&hd->arch.pt_arena, false);
+    }
+
+    return 0;
+}
+
 void arch_iommu_domain_destroy(struct domain *d)
 {
 }
@@ -394,6 +420,19 @@ void iommu_identity_map_teardown(struct domain *d, struct iommu_context *ctx)
     }
 }
 
+bool iommu_identity_map_check(struct domain *d, struct iommu_context *ctx,
+                              mfn_t mfn)
+{
+    struct identity_map *map;
+    uint64_t addr = pfn_to_paddr(mfn_x(mfn));
+
+    list_for_each_entry ( map, &ctx->arch.identity_maps, list )
+        if ( addr >= map->base && addr < map->end )
+            return true;
+
+    return false;
+}
+
 struct handle_iomemcap {
     struct rangeset *r;
     unsigned long last;
@@ -648,7 +687,7 @@ void iommu_free_domid(domid_t domid, unsigned long *map)
     BUG();
 }
 
-int iommu_free_pgtables(struct domain *d, struct iommu_context *ctx)
+int cf_check iommu_free_pgtables(struct domain *d, struct iommu_context *ctx)
 {
     struct domain_iommu *hd = dom_iommu(d);
     struct page_info *pg;
@@ -665,7 +704,10 @@ int iommu_free_pgtables(struct domain *d, struct iommu_context *ctx)
 
     while ( (pg = page_list_remove_head(&ctx->arch.pgtables)) )
     {
-        free_domheap_page(pg);
+        if ( ctx->id == 0 )
+            free_domheap_page(pg);
+        else
+            iommu_arena_free_page(&hd->arch.pt_arena, pg);
 
         if ( !(++done & 0xff) && general_preempt_check() )
            return -ERESTART;
@@ -687,7 +729,11 @@ struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
     memflags = MEMF_node(hd->node);
 #endif
 
-    pg = alloc_domheap_page(NULL, memflags);
+    if ( ctx->id == 0 )
+        pg = alloc_domheap_page(NULL, memflags);
+    else
+        pg = iommu_arena_allocate_page(&hd->arch.pt_arena);
+
     if ( !pg )
         return NULL;
 
@@ -766,9 +812,14 @@ void iommu_queue_free_pgtable(struct domain *d, struct iommu_context *ctx,
 
     page_list_del(pg, &ctx->arch.pgtables);
 
-    page_list_add_tail(pg, &per_cpu(free_pgt_list, cpu));
+    if ( !ctx->id )
+    {
+        page_list_add_tail(pg, &per_cpu(free_pgt_list, cpu));
 
-    tasklet_schedule(&per_cpu(free_pgt_tasklet, cpu));
+        tasklet_schedule(&per_cpu(free_pgt_tasklet, cpu));
+    }
+    else
+        iommu_arena_free_page(&dom_iommu(d)->arch.pt_arena, pg);
 }
 
 static int cf_check cpu_callback(
diff --git a/xen/include/hypercall-defs.c b/xen/include/hypercall-defs.c
index cef08eeec1..0cfda01094 100644
--- a/xen/include/hypercall-defs.c
+++ b/xen/include/hypercall-defs.c
@@ -213,6 +213,9 @@ hypfs_op(unsigned int cmd, const char *arg1, unsigned long arg2, void *arg3, uns
 #ifdef CONFIG_X86
 xenpmu_op(unsigned int op, xen_pmu_params_t *arg)
 #endif
+#ifdef CONFIG_HAS_PASSTHROUGH
+iommu_op(unsigned int subop, void *arg)
+#endif
 
 #ifdef CONFIG_PV
 caller: pv64
@@ -301,5 +304,8 @@ mca do do - - -
 #if defined(CONFIG_X86) && defined(CONFIG_PAGING) && !defined(CONFIG_PV_SHIM_EXCLUSIVE)
 paging_domctl_cont do do do do -
 #endif
+#ifdef CONFIG_HAS_PASSTHROUGH
+iommu_op do do do do -
+#endif
 
 #endif /* !CPPCHECK */
diff --git a/xen/include/public/pv-iommu.h b/xen/include/public/pv-iommu.h
new file mode 100644
index 0000000000..6f50aea4b7
--- /dev/null
+++ b/xen/include/public/pv-iommu.h
@@ -0,0 +1,343 @@
+/* SPDX-License-Identifier: MIT */
+/**
+ * pv-iommu.h
+ *
+ * Paravirtualized IOMMU driver interface.
+ *
+ * Copyright (c) 2024 Teddy Astie
+ */
+
+#ifndef __XEN_PUBLIC_PV_IOMMU_H__
+#define __XEN_PUBLIC_PV_IOMMU_H__
+
+#include "xen.h"
+#include "physdev.h"
+
+#ifndef uint64_aligned_t
+#define uint64_aligned_t uint64_t
+#endif
+
+#define IOMMU_DEFAULT_CONTEXT (0)
+
+enum pv_iommu_cmd {
+    /* Basic cmd */
+    IOMMU_noop = 0,
+    IOMMU_query_capabilities = 1,
+    IOMMU_init = 2,
+    IOMMU_alloc_context = 3,
+    IOMMU_free_context = 4,
+    IOMMU_reattach_device = 5,
+    IOMMU_map_pages = 6,
+    IOMMU_unmap_pages = 7,
+    IOMMU_remote_cmd = 8,
+
+    /* Extended cmd */
+    IOMMU_alloc_nested = 9,   /* if IOMMUCAP_nested */
+    IOMMU_flush_nested = 10,  /* if IOMMUCAP_nested */
+    IOMMU_attach_pasid = 11,  /* if IOMMUCAP_pasid */
+    IOMMU_detach_pasid = 12,  /* if IOMMUCAP_pasid */
+};
+
+/**
+ * If set, the default context allows DMA to domain memory.
+ * If cleared, the default context blocks all DMA to domain memory.
+ */
+#define IOMMUCAP_default_identity (1U << 0)
+
+/**
+ * IOMMU_MAP_cache support.
+ */
+#define IOMMUCAP_cache (1U << 1)
+
+/**
+ * If set, IOMMU_alloc_nested and IOMMU_flush_nested are supported.
+ */
+#define IOMMUCAP_nested (1U << 2)
+
+/**
+ * If set, IOMMU_attach_pasid and IOMMU_detach_pasid are supported and
+ * a device PASID can be specified in reattach_context.
+ */
+#define IOMMUCAP_pasid (1U << 3)
+
+/**
+ * If set, IOMMU_ALLOC_identity is supported in pv_iommu_alloc.
+ */
+#define IOMMUCAP_identity (1U << 4)
+
+/**
+ * IOMMU_query_capabilities
+ * Query PV-IOMMU capabilities for this domain.
+ */
+struct pv_iommu_capabilities {
+    /*
+     * OUT: Maximum device address (iova) that the guest can use for mappings.
+     */
+    uint64_aligned_t max_iova_addr;
+
+    /* OUT: IOMMU capabilities flags */
+    uint32_t cap_flags;
+
+    /* OUT: Mask of all supported page sizes. */
+    uint32_t pgsize_mask;
+
+    /* OUT: Maximum pasid (if IOMMUCAP_pasid) */
+    uint32_t max_pasid;
+
+    /* OUT: Maximum number of IOMMU contexts this domain can use. */
+    uint16_t max_ctx_no;
+
+    uint16_t pad0;
+};
+typedef struct pv_iommu_capabilities pv_iommu_capabilities_t;
+DEFINE_XEN_GUEST_HANDLE(pv_iommu_capabilities_t);
+
+/**
+ * IOMMU_init
+ * Initialize PV-IOMMU for this domain.
+ *
+ * Fails with -EACCES if PV-IOMMU is already initialized.
+ */
+struct pv_iommu_init {
+    /* IN: Maximum number of IOMMU contexts this domain can use. */
+    uint32_t max_ctx_no;
+
+    /* IN: Arena size in pages (as a power of two) */
+    uint32_t arena_order;
+};
+typedef struct pv_iommu_init pv_iommu_init_t;
+DEFINE_XEN_GUEST_HANDLE(pv_iommu_init_t);
+
+/**
+ * Create a 1:1 identity mapped context to domain memory
+ * (needs IOMMUCAP_identity).
+ */
+#define IOMMU_ALLOC_identity (1 << 0)
+
+/**
+ * IOMMU_alloc_context
+ * Allocate an IOMMU context.
+ * Fails with -ENOSPC if no context number is available.
+ */
+struct pv_iommu_alloc {
+    /* OUT: allocated IOMMU context number */
+    uint16_t ctx_no;
+
+    /* IN: allocation flags */
+    uint32_t alloc_flags;
+};
+typedef struct pv_iommu_alloc pv_iommu_alloc_t;
+DEFINE_XEN_GUEST_HANDLE(pv_iommu_alloc_t);
+
+/**
+ * Move all devices to the default context before freeing the context.
+ */
+#define IOMMU_FREE_reattach_default (1 << 0)
+
+/**
+ * IOMMU_free_context
+ * Destroy an IOMMU context.
+ *
+ * If IOMMU_FREE_reattach_default is specified, move all of the context's
+ * devices to the default context before destroying this context.
+ *
+ * If there are devices in the context and IOMMU_FREE_reattach_default is not
+ * specified, fails with -EBUSY.
+ *
+ * The default context can't be destroyed.
+ */
+struct pv_iommu_free {
+    /* IN: IOMMU context number to free */
+    uint16_t ctx_no;
+
+    /* IN: Free operation specific flags */
+    uint32_t free_flags;
+};
+typedef struct pv_iommu_free pv_iommu_free_t;
+DEFINE_XEN_GUEST_HANDLE(pv_iommu_free_t);
+
+/* Device has read access */
+#define IOMMU_MAP_readable (1 << 0)
+
+/* Device has write access */
+#define IOMMU_MAP_writeable (1 << 1)
+
+/* Enforce DMA coherency */
+#define IOMMU_MAP_cache (1 << 2)
+
+/**
+ * IOMMU_map_pages
+ * Map pages in an IOMMU context.
+ *
+ * pgsize must be supported by pgsize_mask.
+ * Fails with -EADDRINUSE if mapping on top of another mapping.
+ * Reports the number of pages actually mapped in the mapped field
+ * (regardless of failure).
+ */
+struct pv_iommu_map_pages {
+    /* IN: IOMMU context number */
+    uint16_t ctx_no;
+
+    /* IN: Guest frame number */
+    uint64_aligned_t gfn;
+
+    /* IN: Device frame number */
+    uint64_aligned_t dfn;
+
+    /* IN: Map flags */
+    uint32_t map_flags;
+
+    /* IN: Size of pages to map */
+    uint32_t pgsize;
+
+    /* IN: Number of pages to map */
+    uint32_t nr_pages;
+
+    /* OUT: Number of pages actually mapped */
+    uint32_t mapped;
+};
+typedef struct pv_iommu_map_pages pv_iommu_map_pages_t;
+DEFINE_XEN_GUEST_HANDLE(pv_iommu_map_pages_t);
+
+/**
+ * IOMMU_unmap_pages
+ * Unmap pages in an IOMMU context.
+ *
+ * pgsize must be supported by pgsize_mask.
+ * Reports the number of pages actually unmapped in the unmapped field
+ * (regardless of failure).
+ * Fails with -ENOENT when attempting to unmap a page without any mapping.
+ */
+struct pv_iommu_unmap_pages {
+    /* IN: IOMMU context number */
+    uint16_t ctx_no;
+
+    /* IN: Device frame number */
+    uint64_aligned_t dfn;
+
+    /* IN: Size of pages to unmap */
+    uint32_t pgsize;
+
+    /* IN: Number of pages to unmap */
+    uint32_t nr_pages;
+
+    /* OUT: Number of pages actually unmapped */
+    uint32_t unmapped;
+};
+typedef struct pv_iommu_unmap_pages pv_iommu_unmap_pages_t;
+DEFINE_XEN_GUEST_HANDLE(pv_iommu_unmap_pages_t);
+
+/**
+ * IOMMU_reattach_device
+ * Reattach a device to another IOMMU context.
+ * Fails with -ENODEV if no such device exists.
+ */
+struct pv_iommu_reattach_device {
+    /* IN: Target IOMMU context number */
+    uint16_t ctx_no;
+
+    /* IN: Physical device to move */
+    struct physdev_pci_device dev;
+
+    /* IN: PASID of the device (if IOMMUCAP_pasid) */
+    uint32_t pasid;
+};
+typedef struct pv_iommu_reattach_device pv_iommu_reattach_device_t;
+DEFINE_XEN_GUEST_HANDLE(pv_iommu_reattach_device_t);
+
+
+/**
+ * IOMMU_remote_cmd
+ * Do a PV-IOMMU operation on another domain.
+ * The current domain needs to be allowed to act on the target domain,
+ * otherwise fails with -EPERM.
+ */
+struct pv_iommu_remote_cmd {
+    /* IN: Target domain to do the subop on */
+    uint16_t domid;
+
+    /* IN: Command to do on the target domain. */
+    uint16_t subop;
+
+    /* INOUT: Command argument from current domain memory */
+    XEN_GUEST_HANDLE(void) arg;
+};
+typedef struct pv_iommu_remote_cmd pv_iommu_remote_cmd_t;
+DEFINE_XEN_GUEST_HANDLE(pv_iommu_remote_cmd_t);
+
+/**
+ * IOMMU_alloc_nested
+ * Create a nested IOMMU context (needs IOMMUCAP_nested).
+ *
+ * This context uses a platform-specific page table from the domain address
+ * space specified in pgtable_gfn and uses it for nested translations.
+ *
+ * Explicit flushes need to be submitted with IOMMU_flush_nested on
+ * modification of the nested pagetable to ensure coherency between IOTLB and
+ * nested page table.
+ *
+ * This context can be destroyed using IOMMU_free_context.
+ * This context cannot be modified using map_pages or unmap_pages.
+ */
+struct pv_iommu_alloc_nested {
+    /* OUT: allocated IOMMU context number */
+    uint16_t ctx_no;
+
+    /* IN: guest frame number of the nested page table */
+    uint64_aligned_t pgtable_gfn;
+
+    /* IN: nested mode flags */
+    uint64_aligned_t nested_flags;
+};
+typedef struct pv_iommu_alloc_nested pv_iommu_alloc_nested_t;
+DEFINE_XEN_GUEST_HANDLE(pv_iommu_alloc_nested_t);
+
+/**
+ * IOMMU_flush_nested (needs IOMMUCAP_nested)
+ * Flush the IOTLB for nested translation.
+ */
+struct pv_iommu_flush_nested {
+    /* TODO */
+};
+typedef struct pv_iommu_flush_nested pv_iommu_flush_nested_t;
+DEFINE_XEN_GUEST_HANDLE(pv_iommu_flush_nested_t);
+
+/**
+ * IOMMU_attach_pasid (needs IOMMUCAP_pasid)
+ * Attach a new device-with-pasid to an IOMMU context.
+ * If a matching device-with-pasid already exists (globally),
+ * fails with -EEXIST.
+ * If pasid is 0, fails with -EINVAL.
+ * If the physical device doesn't exist in the domain, fails with -ENOENT.
+ */
+struct pv_iommu_attach_pasid {
+    /* IN: IOMMU context to add the device-with-pasid in */
+    uint16_t ctx_no;
+
+    /* IN: Physical device */
+    struct physdev_pci_device dev;
+
+    /* IN: pasid of the device to attach */
+    uint32_t pasid;
+};
+typedef struct pv_iommu_attach_pasid pv_iommu_attach_pasid_t;
+DEFINE_XEN_GUEST_HANDLE(pv_iommu_attach_pasid_t);
+
+/**
+ * IOMMU_detach_pasid (needs IOMMUCAP_pasid)
+ * Detach a device-with-pasid.
+ * If the device-with-pasid doesn't exist or doesn't belong to the domain,
+ * fails with -ENOENT.
+ * If pasid is 0, fails with -EINVAL.
+ */
+struct pv_iommu_detach_pasid {
+    /* IN: Physical device */
+    struct physdev_pci_device dev;
+
+    /* pasid of the device to detach */
+    uint32_t pasid;
+};
+typedef struct pv_iommu_detach_pasid pv_iommu_detach_pasid_t;
+DEFINE_XEN_GUEST_HANDLE(pv_iommu_detach_pasid_t);
+
+/* long do_iommu_op(int subop, XEN_GUEST_HANDLE_PARAM(void) arg) */
+
+#endif
\ No newline at end of file
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index 82b9c05a76..f0b1860c7c 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -118,6 +118,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_xenpmu_op 40
 #define __HYPERVISOR_dm_op 41
 #define __HYPERVISOR_hypfs_op 42
+#define __HYPERVISOR_iommu_op 43
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0 48
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 66951c9809..3c77dfaf41 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -487,6 +487,10 @@ struct domain_iommu {
     /* SAF-2-safe enum constant in arithmetic operation */
     DECLARE_BITMAP(features, IOMMU_FEAT_count);
 
+
+    /* Is the domain allowed to use PV-IOMMU? */
+    bool allow_pv_iommu;
+
     /* Does the guest share HAP mapping with the IOMMU? */
     bool hap_pt_share;
 
@@ -526,6 +530,8 @@ static inline int iommu_do_domctl(struct xen_domctl *domctl, struct domain *d,
 }
 #endif
 
+int iommu_domain_pviommu_init(struct domain *d, uint16_t nb_ctx, uint32_t arena_order);
+
 int __must_check iommu_suspend(void);
 void iommu_resume(void);
 void iommu_crash_shutdown(void);
@@ -542,6 +548,7 @@ int iommu_do_pci_domctl(struct xen_domctl *domctl, struct domain *d,
 
 void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev);
 
+uint64_t iommu_get_max_iova(struct domain *d);
 
 struct iommu_context *iommu_get_context(struct domain *d, u16 ctx_id);
 void iommu_put_context(struct iommu_context *ctx);
@@ -574,6 +581,8 @@ int iommu_detach_context(struct domain *d, device_t *dev);
  */
 DECLARE_PER_CPU(bool, iommu_dont_flush_iotlb);
 
+int arch_iommu_pviommu_init(struct domain *d, uint16_t nb_ctx, uint32_t arena_order);
+int arch_iommu_pviommu_teardown(struct domain *d);
 bool arch_iommu_use_permitted(const struct domain *d);
 
 #ifdef CONFIG_X86
--
2.51.2

--
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech

From nobody Sun Feb 8 01:17:05 2026
From: "Teddy Astie" <teddy.astie@vates.tech>
Subject: [RFC PATCH v7 14/14] iommu: Introduce no-dma feature
To: xen-devel@lists.xenproject.org
Cc: "Andrew Cooper", "Anthony PERARD", "Michal Orzel", "Jan Beulich", "Julien Grall", "Roger Pau Monné", "Stefano Stabellini"
Message-Id: <6e1cb09c1543e9f2ca913f1cb6eecaaca7b7a13b.1763569135.git.teddy.astie@vates.tech>
Date: Thu, 20 Nov 2025 11:10:07 +0000

This feature, exposed through `dom0-iommu=no-dma`, prevents the devices of
the default context from having access to the domain's memory. This
enforces DMA protection by default; the domain then needs to prepare a
dedicated IOMMU context in order to do DMA.

This feature requires the guest to provide a PV-IOMMU driver.

Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
 docs/misc/xen-command-line.pandoc   | 16 +++++++++++++++-
 xen/arch/x86/x86_64/mm.c            |  3 ++-
 xen/common/pv-iommu.c               |  3 +++
 xen/drivers/passthrough/iommu.c     | 13 +++++++++++++
 xen/drivers/passthrough/x86/iommu.c |  4 ++++
 xen/include/xen/iommu.h             |  3 +++
 6 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 34004ce282..b528f626a7 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -941,7 +941,7 @@ is necessary to fix an issue, please report a bug.
 
 ### dom0-iommu
     = List of [ passthrough=<bool>, strict=<bool>, map-inclusive=<bool>,
-                map-reserved=<bool>, none ]
+                map-reserved=<bool>, dma=<bool>, none ]
 
 Controls for the dom0 IOMMU setup.
 
@@ -992,6 +992,20 @@ Controls for the dom0 IOMMU setup.
     subset of the correction by only mapping reserved memory regions rather
     than all non-RAM regions.
 
+* The `dma` option determines whether the IOMMU identity maps domain memory
+  in the default IOMMU context.  If set to false, all dom0 devices have
+  their DMA blocked until the IOMMU is properly configured by the guest
+  (aside from special reserved regions).
+
+  Disabling this can slightly improve performance by removing the need to
+  synchronize p2m modifications with the IOMMU subsystem.  Moreover,
+  disabling it provides the DMA protection that some operating systems
+  expect in order to securely handle certain devices (e.g. Thunderbolt).
+
+  Disabling this requires guest support for PV-IOMMU for devices to behave
+  properly.
+
+  This option is enabled by default.
+
 * The `none` option is intended for development purposes only, and skips
   certain safety checks pertaining to the correct IOMMU configuration for
   dom0 to boot.
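
As a usage illustration (hypothetical command line, derived from the option
parsing added below, where parse_boolean() also accepts the negated form),
a dom0 kernel shipping a PV-IOMMU driver could be booted with DMA blocked
by default using:

    dom0-iommu=dma=0

or, equivalently, `dom0-iommu=no-dma`; devices then stay isolated until
dom0 sets up contexts through the PV-IOMMU interface.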
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index d4e6a9c0a2..00ff5d0b71 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -1315,7 +1315,8 @@ int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm)
      */
     if ( is_iommu_enabled(hardware_domain) &&
          !iommu_use_hap_pt(hardware_domain) &&
-         !need_iommu_pt_sync(hardware_domain) )
+         !need_iommu_pt_sync(hardware_domain) &&
+         !iommu_hwdom_no_dma )
     {
         for ( i = spfn; i < epfn; i++ )
             if ( iommu_legacy_map(hardware_domain, _dfn(i), _mfn(i),
diff --git a/xen/common/pv-iommu.c b/xen/common/pv-iommu.c
index 4cdb30a031..a1d0552a66 100644
--- a/xen/common/pv-iommu.c
+++ b/xen/common/pv-iommu.c
@@ -107,6 +107,9 @@ static long capabilities_op(struct pv_iommu_capabilities *cap, struct domain *d)
     cap->max_pasid = 0; /* TODO */
     cap->cap_flags = 0;
 
+    if ( !dom_iommu(d)->no_dma )
+        cap->cap_flags |= IOMMUCAP_default_identity;
+
     cap->pgsize_mask = PAGE_SIZE_4K;
 
     return 0;
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 5c6b272697..81d4cb87cf 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -55,6 +55,7 @@ static bool __hwdom_initdata iommu_hwdom_none;
 bool __hwdom_initdata iommu_hwdom_strict;
 bool __read_mostly iommu_hwdom_passthrough;
 bool __hwdom_initdata iommu_hwdom_inclusive;
+bool __read_mostly iommu_hwdom_no_dma = false;
 int8_t __hwdom_initdata iommu_hwdom_reserved = -1;
 
 #ifndef iommu_hap_pt_share
@@ -172,6 +173,8 @@ static int __init cf_check parse_dom0_iommu_param(const char *s)
             iommu_hwdom_reserved = val;
         else if ( !cmdline_strcmp(s, "none") )
             iommu_hwdom_none = true;
+        else if ( (val = parse_boolean("dma", s, ss)) >= 0 )
+            iommu_hwdom_no_dma = !val;
         else
             rc = -EINVAL;
 
@@ -292,7 +295,10 @@ int iommu_domain_init(struct domain *d, unsigned int opts)
     int ret = 0;
 
     if ( is_hardware_domain(d) )
+    {
         check_hwdom_reqs(d); /* may modify iommu_hwdom_strict */
+        hd->no_dma = iommu_hwdom_no_dma;
+    }
 
     if ( !is_iommu_enabled(d) )
         return 0;
@@ -329,6 +335,13 @@ int iommu_domain_init(struct domain *d, unsigned int opts)
     if ( !is_hardware_domain(d) || iommu_hwdom_strict )
         hd->need_sync = !iommu_use_hap_pt(d);
 
+    if ( hd->no_dma )
+    {
+        /* No-DMA mode is exclusive with HAP and sync_pt. */
+        hd->hap_pt_share = false;
+        hd->need_sync = false;
+    }
+
     ASSERT(!(hd->need_sync && hd->hap_pt_share));
 
     hd->allow_pv_iommu = true;
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index ac339a2ed3..b100c55e69 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -542,6 +542,10 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
     if ( iommu_hwdom_reserved == -1 )
         iommu_hwdom_reserved = 1;
 
+    if ( iommu_hwdom_no_dma )
+        /* Skip special mappings with no-dma mode */
+        return;
+
     if ( iommu_hwdom_inclusive )
     {
         printk(XENLOG_WARNING
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 3c77dfaf41..55bd9c9704 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -108,6 +108,7 @@ extern bool iommu_debug;
 extern bool amd_iommu_perdev_intremap;
 
 extern bool iommu_hwdom_strict, iommu_hwdom_passthrough, iommu_hwdom_inclusive;
+extern bool iommu_hwdom_no_dma;
 extern int8_t iommu_hwdom_reserved;
 
 extern unsigned int iommu_dev_iotlb_timeout;
@@ -487,6 +488,8 @@ struct domain_iommu {
     /* SAF-2-safe enum constant in arithmetic operation */
     DECLARE_BITMAP(features, IOMMU_FEAT_count);
 
+    /* Does the IOMMU block all DMA in the default context (implies !hap_pt_share)? */
+    bool no_dma;
 
     /* Is the domain allowed to use PV-IOMMU? */
     bool allow_pv_iommu;
--
2.51.2

--
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
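
A guest driver can detect this mode through the IOMMU_query_capabilities
subop: with `dma=0`, capabilities_op() above leaves
IOMMUCAP_default_identity clear. A minimal detection sketch, assuming the
same hypothetical HYPERVISOR_iommu_op() wrapper as in the earlier example:

    /*
     * Hypothetical guest-side sketch, not part of this patch: assumes a
     * HYPERVISOR_iommu_op(subop, arg) hypercall wrapper.
     */
    static bool pv_iommu_default_dma_blocked(void)
    {
        struct pv_iommu_capabilities caps = { 0 };

        if ( HYPERVISOR_iommu_op(IOMMU_query_capabilities, &caps) )
            return false; /* No usable PV-IOMMU: assume legacy identity setup. */

        /* A cleared IOMMUCAP_default_identity means DMA is blocked by default. */
        return !(caps.cap_flags & IOMMUCAP_default_identity);
    }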