From nobody Sat Nov 1 23:25:40 2025
From: "Teddy Astie" <teddy.astie@vates.tech>
Subject: [XEN RFC PATCH v6 01/11] docs/designs: Add a design document for IOMMU subsystem redesign
To: xen-devel@lists.xenproject.org
Cc: "Teddy Astie", "Andrew Cooper", "Anthony PERARD", "Michal Orzel", "Jan Beulich", "Julien Grall", "Roger Pau Monné", "Stefano Stabellini"
Date: Mon, 17 Feb 2025 10:18:21 +0000
X-Mailer: git-send-email 2.47.2

The current IOMMU subsystem has some limitations that make PV-IOMMU practically
impossible to implement. One of them is the assumption that each domain is bound
to a single "IOMMU domain", which also complicates the quarantine implementation.

Moreover, the current IOMMU subsystem is not entirely well-defined; for instance,
the behaviour of map_page differs greatly between ARM SMMUv3 and x86 VT-d/AMD-Vi.
On ARM it can modify the domain page table, while on x86 it may be forbidden
(e.g. when using HAP with PVH), or may only modify what the device sees
(e.g. when using PV).

The goal of this redesign is to define the behaviour and interface of the IOMMU
subsystem more explicitly, while allowing PV-IOMMU to be implemented effectively.
Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
 docs/designs/iommu-contexts.md | 403 +++++++++++++++++++++++++++++++++
 1 file changed, 403 insertions(+)
 create mode 100644 docs/designs/iommu-contexts.md

diff --git a/docs/designs/iommu-contexts.md b/docs/designs/iommu-contexts.md
new file mode 100644
index 0000000000..d61c5fcde2
--- /dev/null
+++ b/docs/designs/iommu-contexts.md
@@ -0,0 +1,403 @@
+# IOMMU context management in Xen
+
+Status: Experimental
+Revision: 0
+
+# Background
+
+The design for *IOMMU paravirtualization for Dom0* [1] explains that some guests may
+want access to IOMMU features. In order to implement this in Xen, several adjustments
+need to be made to the IOMMU subsystem.
+
+The *hardware IOMMU domain* is currently implemented on a per-domain basis, such that
+each domain has exactly one *hardware IOMMU domain*. This design aims to allow a
+single Xen domain to manage several "IOMMU contexts", and to allow some domains
+(e.g. Dom0 [1]) to modify their IOMMU contexts.
+
+In addition, the quarantine feature can be refactored to use IOMMU contexts, reducing
+the complexity of platform-specific implementations and ensuring more consistency
+across platforms.
+
+# IOMMU context
+
+We define an "IOMMU context" as a *hardware IOMMU domain*, named "context" to avoid
+confusion with Xen domains.
+It represents a hardware-specific data structure that contains mappings from device
+frame numbers to machine frame numbers (e.g. using a page table) and that can be
+applied to a device using the IOMMU hardware.
+
+This structure is bound to a Xen domain, but a Xen domain may have several IOMMU
+contexts. These contexts may be modified using the interface defined in [1], aside
+from some specific cases (e.g. the default context cannot be modified).
+
+This is implemented in Xen as a new structure that holds the context-specific data.
+
+```c
+struct iommu_context {
+    u16 id; /* Context id (0 means default context) */
+    struct list_head devices;
+
+    struct arch_iommu_context arch;
+
+    bool opaque; /* context can't be modified nor accessed (e.g. HAP) */
+};
+```
+
+A context is identified by a number that is domain-specific and may be used by IOMMU
+users such as PV-IOMMU by the guest.
+
+struct arch_iommu_context is split out of struct arch_iommu:
+
+```c
+struct arch_iommu_context
+{
+    spinlock_t pgtables_lock;
+    struct page_list_head pgtables;
+
+    union {
+        /* Intel VT-d */
+        struct {
+            uint64_t pgd_maddr; /* io page directory machine address */
+            domid_t *didmap; /* per-iommu DID */
+            unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the context uses */
+        } vtd;
+        /* AMD IOMMU */
+        struct {
+            struct page_info *root_table;
+        } amd;
+    };
+};
+
+struct arch_iommu
+{
+    spinlock_t mapping_lock; /* io page table lock */
+    struct {
+        struct page_list_head list;
+        spinlock_t lock;
+    } pgtables;
+
+    struct list_head identity_maps;
+
+    union {
+        /* Intel VT-d */
+        struct {
+            /* no more context-specific values */
+            unsigned int agaw; /* adjusted guest address width, 0 is level 2 30-bit */
+        } vtd;
+        /* AMD IOMMU */
+        struct {
+            unsigned int paging_mode;
+            struct guest_iommu *g_iommu;
+        } amd;
+    };
+};
+```
+
+IOMMU context information is now carried by iommu_context rather than being integrated
+into struct arch_iommu.
+
+# Xen domain IOMMU structure
+
+`struct domain_iommu` is modified to allow multiple contexts to exist within a
+single Xen domain:
+
+```c
+struct iommu_context_list {
+    uint16_t count; /* Context count excluding default context */
+
+    /* if count > 0 */
+
+    uint64_t *bitmap; /* bitmap of context allocation */
+    struct iommu_context *map; /* Map of contexts */
+};
+
+struct domain_iommu {
+    /* ... */
+
+    struct iommu_context default_ctx;
+    struct iommu_context_list other_contexts;
+
+    /* ... */
+};
+```
+
+default_ctx is a special context with id=0 that holds the page table mapping the
+entire domain, which basically preserves the previous behaviour. All devices are
+expected to be bound to this context during initialization.
+
+Along with this default context that always exists, we use a pool of contexts with a
+fixed size chosen at domain initialization, where contexts can be allocated (if
+possible) and have an id matching their position in the map (hence id != 0).
+These contexts may be used by IOMMU context users such as PV-IOMMU or the quarantine
+domain (DomIO).
+
+# Platform independent context management interface
+
+A new platform-independent interface is introduced in the Xen hypervisor to allow
+IOMMU context users to create and manage contexts within domains.
+
+```c
+/* Direct context access functions (not supposed to be used directly) */
+struct iommu_context *iommu_get_context(struct domain *d, u16 ctx_id);
+void iommu_put_context(struct iommu_context *ctx);
+
+/* Flag for default context initialization */
+#define IOMMU_CONTEXT_INIT_default (1 << 0)
+
+/* Flag for quarantine contexts (scratch page, DMA Abort mode, ...) */
+#define IOMMU_CONTEXT_INIT_quarantine (1 << 1)
+
+int iommu_context_init(struct domain *d, struct iommu_context *ctx, u16 ctx_id, u32 flags);
+
+/* Flag to specify that devices will need to be reattached to the default context */
+#define IOMMU_TEARDOWN_REATTACH_DEFAULT (1 << 0)
+
+/*
+ * Flag to specify that the context needs to be destroyed preemptively
+ * (multiple calls to iommu_context_teardown will be required)
+ */
+#define IOMMU_TEARDOWN_PREEMPT (1 << 1)
+
+int iommu_context_teardown(struct domain *d, struct iommu_context *ctx, u32 flags);
+
+/* Allocate a new context, uses CONTEXT_INIT flags */
+int iommu_context_alloc(struct domain *d, u16 *ctx_id, u32 flags);
+
+/* Free a context, uses CONTEXT_TEARDOWN flags */
+int iommu_context_free(struct domain *d, u16 ctx_id, u32 flags);
+
+/* Move a device from one context to another, including between different domains. */
+int iommu_reattach_context(struct domain *prev_dom, struct domain *next_dom,
+                           device_t *dev, u16 ctx_id);
+
+/* Add a device to a context for first initialization */
+int iommu_attach_context(struct domain *d, device_t *dev, u16 ctx_id);
+
+/* Remove a device from a context, effectively removing it from the IOMMU. */
+int iommu_detach_context(struct domain *d, device_t *dev);
+```
+
+This interface is implemented on top of a new driver-facing interface (described in
+the next section).
+
+Some existing functions gain a new parameter to specify which context the operation
+applies to:
+- iommu_map (iommu_legacy_map untouched)
+- iommu_unmap (iommu_legacy_unmap untouched)
+- iommu_lookup_page
+- iommu_iotlb_flush
+
+These functions will update the iommu_context structure to reflect the operations
+applied to it, and they will replace some operations previously performed inside the
+IOMMU drivers. A sketch of what the context-aware prototypes could look like is
+given below.
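+
+As an illustration only (the exact signatures are an implementation detail and may
+end up differing, e.g. the context could instead be passed as a
+`struct iommu_context *`):
+
+```c
+/* Illustrative sketch, not the final prototypes. */
+int __must_check iommu_map(struct domain *d, dfn_t dfn0, mfn_t mfn0,
+                           unsigned long page_count, unsigned int flags,
+                           unsigned int *flush_flags, u16 ctx_id);
+int __must_check iommu_unmap(struct domain *d, dfn_t dfn0,
+                             unsigned long page_count, unsigned int flags,
+                             unsigned int *flush_flags, u16 ctx_id);
+int __must_check iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
+                                   unsigned int *flags, u16 ctx_id);
+int __must_check iommu_iotlb_flush(struct domain *d, dfn_t dfn,
+                                   unsigned long page_count,
+                                   unsigned int flush_flags, u16 ctx_id);
+```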
+
+# IOMMU platform_ops interface changes
+
+The IOMMU driver needs to expose a way to create and manage IOMMU contexts. The
+approach taken here is to modify the interface so that an IOMMU context can be
+specified for each operation, while at the same time simplifying the interface by
+relying more on platform-independent IOMMU code.
+
+Functions added to iommu_ops:
+
+```c
+/* Initialize a context (creating page tables, allocating hardware, structures, ...) */
+int (*context_init)(struct domain *d, struct iommu_context *ctx,
+                    u32 flags);
+/* Destroy a context, assumes no device is bound to the context. */
+int (*context_teardown)(struct domain *d, struct iommu_context *ctx,
+                        u32 flags);
+/* Put a device in a context (assumes the device is not attached to another context) */
+int (*attach)(struct domain *d, device_t *dev,
+              struct iommu_context *ctx);
+/* Remove a device from a context, and from the IOMMU. */
+int (*detach)(struct domain *d, device_t *dev,
+              struct iommu_context *prev_ctx);
+/* Move the device from a context to another, including if the new context is in
+   another domain. d corresponds to the target domain. */
+int (*reattach)(struct domain *d, device_t *dev,
+                struct iommu_context *prev_ctx,
+                struct iommu_context *ctx);
+
+#ifdef CONFIG_HAS_PCI
+/* Specific interface for phantom function devices. */
+int (*add_devfn)(struct domain *d, struct pci_dev *pdev, u16 devfn,
+                 struct iommu_context *ctx);
+int (*remove_devfn)(struct domain *d, struct pci_dev *pdev, u16 devfn,
+                    struct iommu_context *ctx);
+#endif
+
+/* Changes to existing functions to use a specified iommu_context. */
+int __must_check (*map_page)(struct domain *d, dfn_t dfn, mfn_t mfn,
+                             unsigned int flags,
+                             unsigned int *flush_flags,
+                             struct iommu_context *ctx);
+int __must_check (*unmap_page)(struct domain *d, dfn_t dfn,
+                               unsigned int order,
+                               unsigned int *flush_flags,
+                               struct iommu_context *ctx);
+int __must_check (*lookup_page)(struct domain *d, dfn_t dfn, mfn_t *mfn,
+                                unsigned int *flags,
+                                struct iommu_context *ctx);
+
+int __must_check (*iotlb_flush)(struct domain *d,
+                                struct iommu_context *ctx, dfn_t dfn,
+                                unsigned long page_count,
+                                unsigned int flush_flags);
+
+void (*clear_root_pgtable)(struct domain *d, struct iommu_context *ctx);
+```
+
+These functions are redundant with existing functions; therefore, the following
+functions are replaced with new equivalents:
+- quarantine_init : platform-independent code and IOMMU_CONTEXT_INIT_quarantine flag
+- add_device : attach and add_devfn (phantom)
+- assign_device : attach and add_devfn (phantom)
+- remove_device : detach and remove_devfn (phantom)
+- reassign_device : reattach
+
+There are some functional differences with the previous functions; the following is
+now handled by platform-independent/arch-specific code instead of the IOMMU driver:
+- identity mappings (unity mappings and rmrr)
+- device list in context and domain
+- domain of a device
+- quarantine
+
+The idea behind this is to implement IOMMU context features while simplifying IOMMU
+driver implementations and ensuring more consistency between IOMMU drivers.
+
+## Phantom function handling
+
+PCI devices may use additional devfns to do DMA operations. In order to support such
+devices, an interface is added to map specific device functions without implying that
+the device is attached to a new context (which could cause duplicates in Xen data
+structures).
+
+The add_devfn and remove_devfn functions allow mapping an IOMMU context on a specific
+devfn of a PCI device, without altering the platform-independent data structures.
+
+It is important for the reattach operation to take care of these devices, in order to
+prevent devices from being partially reattached to the new context (see XSA-449 [2]),
+by using an all-or-nothing approach for reattaching such devices.
+
+# Quarantine refactoring using IOMMU contexts
+
+The quarantine mechanism can be entirely reimplemented using IOMMU contexts, making
+it simpler and more consistent between platforms.
+
+Quarantine is currently only supported on x86 platforms and works by creating a
+single *hardware IOMMU domain* per quarantined device. All the quarantine logic is
+then implemented in a platform-specific fashion, while actually implementing the same
+concepts:
+
+The *hardware IOMMU context* data structures for quarantine are currently stored in
+the device structure itself (using arch_pci_dev), and the IOMMU driver needs to care
+about whether it is dealing with quarantine operations or regular operations (often
+handled using macros such as QUARANTINE_SKIP or DEVICE_PGTABLE).
+
+The page table applied to the quarantined device is created from the reserved device
+regions, with mappings to a scratch page added if enabled (quarantine=scratch-page).
+
+A new approach is to allow the quarantine domain (DomIO) to manage IOMMU contexts,
+and to implement all the quarantine logic using IOMMU contexts.
+
+That way, the quarantine implementation can be platform-independent, and thus more
+consistent between platforms. It also allows quarantine to work with other IOMMU
+implementations without having to implement platform-specific behaviour.
+Moreover, quarantine operations can be implemented using regular context operations
+instead of relying on driver-specific code.
+
+The quarantine implementation can be summarised as:
+
+```c
+int iommu_quarantine_dev_init(device_t *dev)
+{
+    int ret;
+    u16 ctx_id;
+
+    if ( !iommu_quarantine )
+        return -EINVAL;
+
+    ret = iommu_context_alloc(dom_io, &ctx_id, IOMMU_CONTEXT_INIT_quarantine);
+
+    if ( ret )
+        return ret;
+
+    /** TODO: Setup scratch page, mappings... */
+
+    ret = iommu_reattach_context(dev->domain, dom_io, dev, ctx_id);
+
+    if ( ret )
+    {
+        ASSERT(!iommu_context_free(dom_io, ctx_id, 0));
+        return ret;
+    }
+
+    return ret;
+}
+```
+
+# Platform-specific considerations
+
+## Reference counters on target pages
+
+When mapping a guest page into an IOMMU context, we need to make sure that this page
+is not reused for something else while it is actually referenced by an IOMMU context.
+One way of doing this is to increment the reference counter of each target page we
+map (excluding reserved regions), and to decrement it when the mapping is no longer
+used (see the sketch at the end of this section).
+
+One consideration is destroying a context that still has existing mappings. We can
+walk through the entire page table and decrement the reference counter of all
+mappings. All of that assumes that there is no reserved region mapped (which should
+be the case as a requirement of teardown, or as a consequence of the REATTACH_DEFAULT
+flag).
+
+Another consideration is that the "cleanup mappings" operation may take a lot of time
+depending on the complexity of the page table. Making the teardown operation
+preemptable allows the hypercall to be preempted if needed, and also prevents a
+malicious guest from stalling a CPU in a teardown operation with a specially crafted
+IOMMU context (e.g. one with several 1G superpages).
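+
+As a rough sketch of the reference counting described above (assuming the
+context-aware iommu_map() sketched earlier; the actual implementation may differ):
+
+```c
+/* Illustrative only: take a reference before mapping, drop it if mapping fails. */
+static int ctx_map_and_ref(struct domain *d, struct iommu_context *ctx,
+                           dfn_t dfn, mfn_t mfn, unsigned int flags,
+                           unsigned int *flush_flags)
+{
+    struct page_info *pg = mfn_to_page(mfn);
+    int rc;
+
+    /* Fails if the page is being freed, so it cannot be repurposed under us. */
+    if ( !get_page(pg, d) )
+        return -EBUSY;
+
+    rc = iommu_map(d, dfn, mfn, 1, flags, flush_flags, ctx->id);
+    if ( rc )
+        put_page(pg); /* mapping failed, drop the reference again */
+
+    return rc;
+}
+```
+
+Unmapping (and the teardown walk described above) does the reverse: remove the
+mapping first, then put_page() the previously referenced page.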
+
+## Limit the amount of pages IOMMU contexts can use
+
+In order to prevent a (potentially malicious) guest from causing too many allocations
+in Xen, we can enforce limits on the memory the IOMMU subsystem can use for IOMMU
+contexts. A possible implementation is to preallocate a reasonably large chunk of
+memory and split it into pages for use by the IOMMU subsystem, only for non-default
+IOMMU contexts (e.g. the PV-IOMMU interface). If this limit is exceeded, some
+operations may fail from the guest side. These limitations shouldn't impact "usual"
+operations of the IOMMU subsystem (e.g. default context initialization).
+
+## x86 Architecture
+
+TODO
+
+### Intel VT-d
+
+VT-d uses the DID to tag the *IOMMU domain* applied to a device and assumes that all
+entries with the same DID use the same page table (i.e. the same IOMMU context).
+Under certain circumstances (e.g. a DRHD with a DID limit below 16 bits), the *DID*
+is transparently converted into a DRHD-specific DID using a map managed internally.
+
+The current implementation of the code reuses the Xen domain_id as DID.
+However, when using multiple IOMMU contexts per domain, we can't use the domain_id
+for all contexts (otherwise, different page tables would be mapped with the same DID).
+The following strategy is used:
+- on the default context, reuse the domain_id (the default context is unique per domain)
+- on non-default contexts, use an id allocated from the pseudo_domid map (currently used by quarantine), which is a DID outside of the Xen domain_id range
+
+### AMD-Vi
+
+TODO
+
+## Device-tree platforms
+
+### SMMU and SMMUv3
+
+TODO
+
+* * *
+
+[1] See pv-iommu.md
+
+[2] pci: phantom functions assigned to incorrect contexts
+https://xenbits.xen.org/xsa/advisory-449.html
\ No newline at end of file
-- 
2.47.2

Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech

From nobody Sat Nov 1 23:25:40 2025
From: "Teddy Astie" <teddy.astie@vates.tech>
Subject: [XEN RFC PATCH v6 02/11] docs/designs: Add a design document for PV-IOMMU
To: xen-devel@lists.xenproject.org
Cc: "Teddy Astie", "Andrew Cooper", "Anthony PERARD", "Michal Orzel", "Jan Beulich", "Julien Grall", "Roger Pau Monné", "Stefano Stabellini"
Date: Mon, 17 Feb 2025 10:18:21 +0000
X-Mailer: git-send-email 2.47.2

Some operating systems want to use the IOMMU to implement various features
(e.g. VFIO) or DMA protection.

This patch introduces a proposal for IOMMU paravirtualization for Dom0.

Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
 docs/designs/pv-iommu.md | 116 +++++++++++++++++++++++++++++++++
 1 file changed, 116 insertions(+)
 create mode 100644 docs/designs/pv-iommu.md

diff --git a/docs/designs/pv-iommu.md b/docs/designs/pv-iommu.md
new file mode 100644
index 0000000000..7df9fa0b94
--- /dev/null
+++ b/docs/designs/pv-iommu.md
@@ -0,0 +1,116 @@
+# IOMMU paravirtualization for Dom0
+
+Status: Experimental
+
+# Background
+
+By default, Xen only uses the IOMMU for itself, either to make the device address
+space coherent with the guest address space (x86 HVM/PVH) or to prevent devices
+from doing DMA outside their expected memory regions, including the hypervisor
+(x86 PV).
+
+A limitation is that guests (especially privileged ones) may want to use the IOMMU
+hardware in order to implement features such as DMA protection and VFIO [1], as
+IOMMU functionality is currently not available outside of the hypervisor.
+
+[1] VFIO - "Virtual Function I/O" - https://www.kernel.org/doc/html/latest/driver-api/vfio.html
+
+# Design
+
+The operating system may want to have access to various IOMMU features such as
+context management and DMA remapping. We can create a new hypercall that allows
+the guest to have access to a new paravirtualized IOMMU interface.
+
+This feature is only meant to be available for Dom0: DomUs have some emulated
+devices that are not real hardware and cannot be managed on the Xen side, so we
+can't rely on the hardware IOMMU to enforce DMA remapping for them.
+
+This interface is exposed under the `iommu_op` hypercall.
+
+In addition, Xen domains are modified in order to allow several IOMMU contexts to
+exist, including a default one that implements the default behaviour (e.g.
+hardware-assisted paging) and can't be modified by the guest. DomUs cannot have
+additional contexts, and therefore act as if they only have the default context.
+
+Each IOMMU context within a Xen domain is identified using a domain-specific
+context number that is used in the Xen IOMMU subsystem and the hypercall
+interface.
+
+The number of IOMMU contexts a domain can have is specified by either the toolstack
+or the domain itself.
+
+# IOMMU operations
+
+## Initialize PV-IOMMU
+
+Initialize PV-IOMMU for the domain.
+It can only be called once.
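+
+As a purely illustrative sketch (none of these names or layouts are part of the
+proposal yet), the hypercall argument could look something like the following, with
+one sub-operation per operation described in the remainder of this section:
+
+```c
+/* Hypothetical sub-operation numbers. */
+#define PV_IOMMU_OP_init            1
+#define PV_IOMMU_OP_alloc_context   2
+#define PV_IOMMU_OP_free_context    3
+#define PV_IOMMU_OP_reattach_device 4
+#define PV_IOMMU_OP_map_pages       5
+#define PV_IOMMU_OP_unmap_pages     6
+#define PV_IOMMU_OP_lookup_page     7
+#define PV_IOMMU_OP_remote_cmd      8
+
+struct pv_iommu_op {
+    uint16_t subop;
+    uint16_t ctx_id;   /* target context, 0 is the default context */
+    uint32_t flags;    /* e.g. identity-map on alloc, reattach-to-default on free */
+
+    union {
+        struct {
+            uint64_t gfn;      /* first guest frame to map */
+            uint64_t dfn;      /* first device frame it should appear at */
+            uint32_t nr_pages; /* allows batching several pages in one call */
+        } map;
+        struct {
+            uint32_t sbdf;     /* PCI segment/bus/device/function */
+        } reattach;
+    };
+};
+```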
+
+## Alloc context
+
+Create a new IOMMU context for the guest and return the context number to the
+guest.
+Fail if the IOMMU context limit of the guest is reached.
+
+A flag can be specified to create an identity mapping.
+
+## Free context
+
+Destroy an IOMMU context previously created.
+It is not possible to free the default context.
+
+Reattach the context's devices to the default context if requested by the guest.
+
+Fail if there is a device in the context and the reattach-to-default flag is not
+specified.
+
+## Reattach device
+
+Reattach a device to another IOMMU context (including the default one).
+The target IOMMU context number must be valid and the context allocated.
+
+The guest needs to specify the PCI SBDF of a device it has access to.
+
+## Map/unmap page
+
+Map/unmap a page in a context.
+The guest needs to specify a gfn and the target dfn to map.
+
+Refuse to create the mapping if one already exists for the same dfn.
+
+## Lookup page
+
+Get the gfn mapped by a specific dfn.
+
+## Remote command
+
+Make a PV-IOMMU operation on behalf of another domain.
+Especially useful for implementing IOMMU emulation (e.g. using QEMU)
+or initializing PV-IOMMU with enforced limits.
+
+# Implementation considerations
+
+## Hypercall batching
+
+In order to prevent unneeded hypercalls and IOMMU flushing, it is advisable to
+be able to batch some critical IOMMU operations (e.g. mapping/unmapping multiple
+pages).
+
+## Hardware without IOMMU support
+
+The operating system needs to be aware of the PV-IOMMU capability, and of whether it
+is able to create contexts. However, some operating systems may critically fail if
+they are unable to create a new IOMMU context, which is what happens if no IOMMU
+hardware is available.
+
+The hypercall interface needs a way to advertise the ability to create and manage
+IOMMU contexts, including the number of contexts the guest is able to use. Using
+this information, Dom0 may decide whether or not to use the PV-IOMMU interface.
+
+## Page pool for contexts
+
+In order to prevent a buggy Dom0 from unexpectedly starving the hypervisor of
+memory, we can preallocate the pages the contexts will use and make map/unmap use
+these pages instead of allocating them dynamically. A minimal sketch of such a pool
+is given below.
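+
+As a minimal sketch of the idea (the real implementation may differ substantially),
+the pool could simply be a list of pages reserved at domain initialization:
+
+```c
+/* Illustrative only: a per-domain pool of preallocated IOMMU page-table pages. */
+struct pv_iommu_page_pool {
+    struct page_list_head free_list; /* pages reserved at initialization */
+    unsigned int free_count;
+};
+
+static struct page_info *pool_get_page(struct pv_iommu_page_pool *pool)
+{
+    struct page_info *pg = page_list_remove_head(&pool->free_list);
+
+    if ( pg )
+        pool->free_count--;
+
+    /* NULL means the quota is exhausted: the PV-IOMMU operation fails. */
+    return pg;
+}
+
+static void pool_put_page(struct pv_iommu_page_pool *pool, struct page_info *pg)
+{
+    page_list_add_tail(pg, &pool->free_list);
+    pool->free_count++;
+}
+```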
+
-- 
2.47.2

Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech

From nobody Sat Nov 1 23:25:40 2025
From: "Teddy Astie" <teddy.astie@vates.tech>
Subject: [XEN RFC PATCH v6 03/11] x86/domain: Defer domain iommu initialization.
To: xen-devel@lists.xenproject.org
Cc: "Teddy Astie", "Jan Beulich", "Andrew Cooper", "Roger Pau Monné"
Message-Id: <03ef72e582221299e12c44176dbbe31ce5da9261.1739785339.git.teddy.astie@vates.tech>
Date: Mon, 17 Feb 2025 10:18:22 +0000
X-Mailer: git-send-email 2.47.2

For the IOMMU redesign, the iommu context pagetable is defined once during
initialization. When reusing the P2M pagetable, we want to ensure that this
pagetable is properly initialized.

Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
 xen/arch/x86/domain.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 78a13e6812..48bf9625e2 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -858,9 +858,6 @@ int arch_domain_create(struct domain *d,
     if ( (rc = init_domain_irq_mapping(d)) != 0 )
         goto fail;
 
-    if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 )
-        goto fail;
-
     psr_domain_init(d);
 
     if ( is_hvm_domain(d) )
@@ -879,6 +876,9 @@ int arch_domain_create(struct domain *d,
     else
         ASSERT_UNREACHABLE(); /* Not HVM and not PV? */
 
+    if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 )
+        goto fail;
+
     if ( (rc = tsc_set_info(d, XEN_CPUID_TSC_MODE_DEFAULT, 0, 0, 0)) != 0 )
     {
         ASSERT_UNREACHABLE();
-- 
2.47.2

Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech

From nobody Sat Nov 1 23:25:40 2025
From: "Teddy Astie" <teddy.astie@vates.tech>
Subject: [XEN RFC PATCH v6 04/11] iommu: Move IOMMU domain related structures to (arch_)iommu_context
To: xen-devel@lists.xenproject.org
Cc: "Teddy Astie", "Stefano Stabellini", "Julien Grall", "Bertrand Marquis", "Michal Orzel", "Volodymyr Babchuk", "Shawn Anastasio", "Jan Beulich", "Andrew Cooper", "Roger Pau Monné", "Lukasz Hawrylko", "Daniel P. Smith", "Mateusz Mówka"
Message-Id: <0cd25b4114458dc957c0fb818d01162dfab9548b.1739785339.git.teddy.astie@vates.tech>
Date: Mon, 17 Feb 2025 10:18:23 +0000
X-Mailer: git-send-email 2.47.2

Preparatory work for the IOMMU redesign. Introduce a new structure
(arch_)iommu_context that will hold all per-IOMMU-context related
information for the IOMMU drivers.
Signed-off-by: Teddy Astie --- xen/arch/arm/include/asm/iommu.h | 4 + xen/arch/ppc/include/asm/iommu.h | 3 + xen/arch/x86/domain.c | 4 +- xen/arch/x86/include/asm/iommu.h | 50 +++-- xen/arch/x86/tboot.c | 3 +- xen/drivers/passthrough/amd/iommu.h | 5 +- xen/drivers/passthrough/amd/iommu_init.c | 8 +- xen/drivers/passthrough/amd/iommu_map.c | 102 +++++----- xen/drivers/passthrough/amd/pci_amd_iommu.c | 81 ++++---- xen/drivers/passthrough/iommu.c | 6 + xen/drivers/passthrough/vtd/extern.h | 4 +- xen/drivers/passthrough/vtd/iommu.c | 208 +++++++++++--------- xen/drivers/passthrough/vtd/quirks.c | 3 +- xen/drivers/passthrough/x86/iommu.c | 62 +++--- xen/include/xen/iommu.h | 10 + 15 files changed, 320 insertions(+), 233 deletions(-) diff --git a/xen/arch/arm/include/asm/iommu.h b/xen/arch/arm/include/asm/io= mmu.h index d57bd8a38c..5ca56cc663 100644 --- a/xen/arch/arm/include/asm/iommu.h +++ b/xen/arch/arm/include/asm/iommu.h @@ -20,6 +20,10 @@ struct arch_iommu void *priv; }; =20 +struct arch_iommu_context +{ +}; + const struct iommu_ops *iommu_get_ops(void); void iommu_set_ops(const struct iommu_ops *ops); =20 diff --git a/xen/arch/ppc/include/asm/iommu.h b/xen/arch/ppc/include/asm/io= mmu.h index 024ead3473..8367505de2 100644 --- a/xen/arch/ppc/include/asm/iommu.h +++ b/xen/arch/ppc/include/asm/iommu.h @@ -5,4 +5,7 @@ struct arch_iommu { }; =20 +struct arch_iommu_context { +}; + #endif /* __ASM_PPC_IOMMU_H__ */ diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 48bf9625e2..26729c879c 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -678,7 +678,7 @@ int arch_sanitise_domain_config(struct xen_domctl_creat= edomain *config) if ( nested_virt && !hvm_nested_virt_supported() ) { dprintk(XENLOG_INFO, "Nested virt requested but not available\n"); - return -EINVAL; =20 + return -EINVAL; } =20 if ( nested_virt && !hap ) @@ -2392,7 +2392,7 @@ int domain_relinquish_resources(struct domain *d) =20 PROGRESS(iommu_pagetables): =20 - ret =3D iommu_free_pgtables(d); + ret =3D iommu_free_pgtables(d, iommu_default_context(d)); if ( ret ) return ret; =20 diff --git a/xen/arch/x86/include/asm/iommu.h b/xen/arch/x86/include/asm/io= mmu.h index 8dc464fbd3..94513ba9dc 100644 --- a/xen/arch/x86/include/asm/iommu.h +++ b/xen/arch/x86/include/asm/iommu.h @@ -31,22 +31,21 @@ typedef uint64_t daddr_t; #define dfn_to_daddr(dfn) __dfn_to_daddr(dfn_x(dfn)) #define daddr_to_dfn(daddr) _dfn(__daddr_to_dfn(daddr)) =20 -struct arch_iommu -{ - spinlock_t mapping_lock; /* io page table lock */ - struct { - struct page_list_head list; - spinlock_t lock; - } pgtables; +struct iommu_context; =20 +struct arch_iommu_context +{ + struct page_list_head pgtables; struct list_head identity_maps; =20 + + spinlock_t mapping_lock; /* io page table lock */ + union { /* Intel VT-d */ struct { uint64_t pgd_maddr; /* io page directory machine address */ - unsigned int agaw; /* adjusted guest address width, 0 is level= 2 30-bit */ - unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the do= main uses */ + unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the co= ntext uses */ } vtd; /* AMD IOMMU */ struct { @@ -56,6 +55,24 @@ struct arch_iommu }; }; =20 +struct arch_iommu +{ + /* Queue for freeing pages */ + struct page_list_head free_queue; + + union { + /* Intel VT-d */ + struct { + unsigned int agaw; /* adjusted guest address width, 0 is level= 2 30-bit */ + } vtd; + /* AMD IOMMU */ + struct { + unsigned int paging_mode; + struct guest_iommu *g_iommu; + }; + }; +}; + extern struct iommu_ops iommu_ops; 
=20 # include @@ -109,10 +126,10 @@ static inline void iommu_disable_x2apic(void) iommu_vcall(&iommu_ops, disable_x2apic); } =20 -int iommu_identity_mapping(struct domain *d, p2m_access_t p2ma, - paddr_t base, paddr_t end, +int iommu_identity_mapping(struct domain *d, struct iommu_context *ctx, + p2m_access_t p2ma, paddr_t base, paddr_t end, unsigned int flag); -void iommu_identity_map_teardown(struct domain *d); +void iommu_identity_map_teardown(struct domain *d, struct iommu_context *c= tx); =20 extern bool untrusted_msi; =20 @@ -128,14 +145,19 @@ unsigned long *iommu_init_domid(domid_t reserve); domid_t iommu_alloc_domid(unsigned long *map); void iommu_free_domid(domid_t domid, unsigned long *map); =20 -int __must_check iommu_free_pgtables(struct domain *d); +int __must_check iommu_free_pgtables(struct domain *d, struct iommu_contex= t *ctx); struct domain_iommu; struct page_info *__must_check iommu_alloc_pgtable(struct domain_iommu *hd, + struct iommu_context *c= tx, uint64_t contig_mask); -void iommu_queue_free_pgtable(struct domain_iommu *hd, struct page_info *p= g); +void iommu_queue_free_pgtable(struct domain *d, struct iommu_context *ctx, + struct page_info *pg); =20 /* Check [start, end] unity map range for correctness. */ bool iommu_unity_region_ok(const char *prefix, mfn_t start, mfn_t end); +int arch_iommu_context_init(struct domain *d, struct iommu_context *ctx, u= 32 flags); +int arch_iommu_context_teardown(struct domain *d, struct iommu_context *ct= x, u32 flags); +int arch_iommu_flush_free_queue(struct domain *d); =20 #endif /* !__ARCH_X86_IOMMU_H__ */ /* diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c index d5db60d335..0a5aee8b92 100644 --- a/xen/arch/x86/tboot.c +++ b/xen/arch/x86/tboot.c @@ -220,7 +220,8 @@ static void tboot_gen_domain_integrity(const uint8_t ke= y[TB_KEY_SIZE], { const struct domain_iommu *dio =3D dom_iommu(d); =20 - update_iommu_mac(&ctx, dio->arch.vtd.pgd_maddr, + update_iommu_mac(&ctx, + iommu_default_context(d)->arch.vtd.pgd_maddr, agaw_to_level(dio->arch.vtd.agaw)); } } diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/= amd/iommu.h index c32e9e9a16..6095bc6a21 100644 --- a/xen/drivers/passthrough/amd/iommu.h +++ b/xen/drivers/passthrough/amd/iommu.h @@ -26,6 +26,7 @@ #include #include #include +#include =20 #include #include @@ -199,10 +200,10 @@ int __must_check cf_check amd_iommu_unmap_page( struct domain *d, dfn_t dfn, unsigned int order, unsigned int *flush_flags); int __must_check amd_iommu_alloc_root(struct domain *d); -int amd_iommu_reserve_domain_unity_map(struct domain *d, +int amd_iommu_reserve_domain_unity_map(struct domain *d, struct iommu_cont= ext *ctx, const struct ivrs_unity_map *map, unsigned int flag); -int amd_iommu_reserve_domain_unity_unmap(struct domain *d, +int amd_iommu_reserve_domain_unity_unmap(struct domain *d, struct iommu_co= ntext *ctx, const struct ivrs_unity_map *map); int cf_check amd_iommu_get_reserved_device_memory( iommu_grdm_t *func, void *ctxt); diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthr= ough/amd/iommu_init.c index 3023625020..41e241ccc8 100644 --- a/xen/drivers/passthrough/amd/iommu_init.c +++ b/xen/drivers/passthrough/amd/iommu_init.c @@ -604,7 +604,6 @@ static void iommu_check_event_log(struct amd_iommu *iom= mu) sizeof(event_entry_t), parse_event_log_entry); =20 spin_lock_irqsave(&iommu->lock, flags); - =20 /* Check event overflow. 
*/ entry =3D readl(iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET); if ( entry & IOMMU_STATUS_EVENT_LOG_OVERFLOW ) @@ -660,9 +659,8 @@ static void iommu_check_ppr_log(struct amd_iommu *iommu) =20 iommu_read_log(iommu, &iommu->ppr_log, sizeof(ppr_entry_t), parse_ppr_log_entry); - =20 - spin_lock_irqsave(&iommu->lock, flags); =20 + spin_lock_irqsave(&iommu->lock, flags); /* Check event overflow. */ entry =3D readl(iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET); if ( entry & IOMMU_STATUS_PPR_LOG_OVERFLOW ) @@ -1545,7 +1543,7 @@ static void invalidate_all_domain_pages(void) static int cf_check _invalidate_all_devices( u16 seg, struct ivrs_mappings *ivrs_mappings) { - unsigned int bdf;=20 + unsigned int bdf; u16 req_id; struct amd_iommu *iommu; =20 @@ -1595,7 +1593,7 @@ void cf_check amd_iommu_resume(void) for_each_amd_iommu ( iommu ) { /* - * To make sure that iommus have not been touched=20 + * To make sure that iommus have not been touched * before re-enablement */ disable_iommu(iommu); diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthro= ugh/amd/iommu_map.c index dde393645a..7514384789 100644 --- a/xen/drivers/passthrough/amd/iommu_map.c +++ b/xen/drivers/passthrough/amd/iommu_map.c @@ -18,6 +18,7 @@ */ =20 #include +#include =20 #include "iommu.h" =20 @@ -264,9 +265,9 @@ void __init iommu_dte_add_device_entry(struct amd_iommu= _dte *dte, * {Re, un}mapping super page frames causes re-allocation of io * page tables. */ -static int iommu_pde_from_dfn(struct domain *d, unsigned long dfn, - unsigned int target, unsigned long *pt_mfn, - unsigned int *flush_flags, bool map) +static int iommu_pde_from_dfn(struct domain *d, struct iommu_context *ctx, + unsigned long dfn, unsigned int target, + unsigned long *pt_mfn, unsigned int *flush_f= lags, bool map) { union amd_iommu_pte *pde, *next_table_vaddr; unsigned long next_table_mfn; @@ -274,8 +275,8 @@ static int iommu_pde_from_dfn(struct domain *d, unsigne= d long dfn, struct page_info *table; struct domain_iommu *hd =3D dom_iommu(d); =20 - table =3D hd->arch.amd.root_table; - level =3D hd->arch.amd.paging_mode; + table =3D ctx->arch.amd.root_table; + level =3D ctx->arch.amd.paging_mode; =20 if ( !table || target < 1 || level < target || level > 6 ) { @@ -311,7 +312,7 @@ static int iommu_pde_from_dfn(struct domain *d, unsigne= d long dfn, mfn =3D next_table_mfn; =20 /* allocate lower level page table */ - table =3D iommu_alloc_pgtable(hd, IOMMU_PTE_CONTIG_MASK); + table =3D iommu_alloc_pgtable(hd, ctx, IOMMU_PTE_CONTIG_MASK); if ( table =3D=3D NULL ) { AMD_IOMMU_ERROR("cannot allocate I/O page table\n"); @@ -346,7 +347,7 @@ static int iommu_pde_from_dfn(struct domain *d, unsigne= d long dfn, =20 if ( next_table_mfn =3D=3D 0 ) { - table =3D iommu_alloc_pgtable(hd, IOMMU_PTE_CONTIG_MASK); + table =3D iommu_alloc_pgtable(hd, ctx, IOMMU_PTE_CONTIG_MA= SK); if ( table =3D=3D NULL ) { AMD_IOMMU_ERROR("cannot allocate I/O page table\n"); @@ -376,7 +377,8 @@ static int iommu_pde_from_dfn(struct domain *d, unsigne= d long dfn, return 0; } =20 -static void queue_free_pt(struct domain_iommu *hd, mfn_t mfn, unsigned int= level) +static void queue_free_pt(struct domain *d, struct iommu_context *ctx, mfn= _t mfn, + unsigned int level) { if ( level > 1 ) { @@ -387,13 +389,13 @@ static void queue_free_pt(struct domain_iommu *hd, mf= n_t mfn, unsigned int level if ( pt[i].pr && pt[i].next_level ) { ASSERT(pt[i].next_level < level); - queue_free_pt(hd, _mfn(pt[i].mfn), pt[i].next_level); + queue_free_pt(d, ctx, _mfn(pt[i].mfn), pt[i].next_level); } 
=20 unmap_domain_page(pt); } =20 - iommu_queue_free_pgtable(hd, mfn_to_page(mfn)); + iommu_queue_free_pgtable(d, ctx, mfn_to_page(mfn)); } =20 int cf_check amd_iommu_map_page( @@ -401,6 +403,7 @@ int cf_check amd_iommu_map_page( unsigned int *flush_flags) { struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); unsigned int level =3D (IOMMUF_order(flags) / PTE_PER_TABLE_SHIFT) + 1; bool contig; int rc; @@ -410,7 +413,7 @@ int cf_check amd_iommu_map_page( ASSERT((hd->platform_ops->page_sizes >> IOMMUF_order(flags)) & PAGE_SIZE_4K); =20 - spin_lock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); =20 /* * IOMMU mapping request can be safely ignored when the domain is dyin= g. @@ -420,24 +423,24 @@ int cf_check amd_iommu_map_page( */ if ( d->is_dying ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); return 0; } =20 rc =3D amd_iommu_alloc_root(d); if ( rc ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); AMD_IOMMU_ERROR("root table alloc failed, dfn =3D %"PRI_dfn"\n", dfn_x(dfn)); domain_crash(d); return rc; } =20 - if ( iommu_pde_from_dfn(d, dfn_x(dfn), level, &pt_mfn, flush_flags, tr= ue) || + if ( iommu_pde_from_dfn(d, ctx, dfn_x(dfn), level, &pt_mfn, flush_flag= s, true) || !pt_mfn ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); AMD_IOMMU_ERROR("invalid IO pagetable entry dfn =3D %"PRI_dfn"\n", dfn_x(dfn)); domain_crash(d); @@ -449,12 +452,12 @@ int cf_check amd_iommu_map_page( flags & IOMMUF_writable, flags & IOMMUF_readable, &contig); =20 - while ( unlikely(contig) && ++level < hd->arch.amd.paging_mode ) + while ( unlikely(contig) && ++level < ctx->arch.amd.paging_mode ) { struct page_info *pg =3D mfn_to_page(_mfn(pt_mfn)); unsigned long next_mfn; =20 - if ( iommu_pde_from_dfn(d, dfn_x(dfn), level, &pt_mfn, flush_flags, + if ( iommu_pde_from_dfn(d, ctx, dfn_x(dfn), level, &pt_mfn, flush_= flags, false) ) BUG(); BUG_ON(!pt_mfn); @@ -464,11 +467,11 @@ int cf_check amd_iommu_map_page( flags & IOMMUF_writable, flags & IOMMUF_readable, &contig); *flush_flags |=3D IOMMU_FLUSHF_modified | IOMMU_FLUSHF_all; - iommu_queue_free_pgtable(hd, pg); + iommu_queue_free_pgtable(d, ctx, pg); perfc_incr(iommu_pt_coalesces); } =20 - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); =20 *flush_flags |=3D IOMMU_FLUSHF_added; if ( old.pr ) @@ -476,7 +479,7 @@ int cf_check amd_iommu_map_page( *flush_flags |=3D IOMMU_FLUSHF_modified; =20 if ( IOMMUF_order(flags) && old.next_level ) - queue_free_pt(hd, _mfn(old.mfn), old.next_level); + queue_free_pt(d, ctx, _mfn(old.mfn), old.next_level); } =20 return 0; @@ -487,6 +490,7 @@ int cf_check amd_iommu_unmap_page( { unsigned long pt_mfn =3D 0; struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); unsigned int level =3D (order / PTE_PER_TABLE_SHIFT) + 1; union amd_iommu_pte old =3D {}; =20 @@ -496,17 +500,17 @@ int cf_check amd_iommu_unmap_page( */ ASSERT((hd->platform_ops->page_sizes >> order) & PAGE_SIZE_4K); =20 - spin_lock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); =20 - if ( !hd->arch.amd.root_table ) + if ( !ctx->arch.amd.root_table ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); return 0; } =20 - if ( iommu_pde_from_dfn(d, dfn_x(dfn), level, &pt_mfn, flush_flags, fa= lse) ) + if ( iommu_pde_from_dfn(d, ctx, dfn_x(dfn), level, &pt_mfn, flush_flag= s, false) ) { - 
spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); AMD_IOMMU_ERROR("invalid IO pagetable entry dfn =3D %"PRI_dfn"\n", dfn_x(dfn)); domain_crash(d); @@ -520,30 +524,30 @@ int cf_check amd_iommu_unmap_page( /* Mark PTE as 'page not present'. */ old =3D clear_iommu_pte_present(pt_mfn, dfn_x(dfn), level, &free); =20 - while ( unlikely(free) && ++level < hd->arch.amd.paging_mode ) + while ( unlikely(free) && ++level < ctx->arch.amd.paging_mode ) { struct page_info *pg =3D mfn_to_page(_mfn(pt_mfn)); =20 - if ( iommu_pde_from_dfn(d, dfn_x(dfn), level, &pt_mfn, + if ( iommu_pde_from_dfn(d, ctx, dfn_x(dfn), level, &pt_mfn, flush_flags, false) ) BUG(); BUG_ON(!pt_mfn); =20 clear_iommu_pte_present(pt_mfn, dfn_x(dfn), level, &free); *flush_flags |=3D IOMMU_FLUSHF_all; - iommu_queue_free_pgtable(hd, pg); + iommu_queue_free_pgtable(d, ctx, pg); perfc_incr(iommu_pt_coalesces); } } =20 - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); =20 if ( old.pr ) { *flush_flags |=3D IOMMU_FLUSHF_modified; =20 if ( order && old.next_level ) - queue_free_pt(hd, _mfn(old.mfn), old.next_level); + queue_free_pt(d, ctx, _mfn(old.mfn), old.next_level); } =20 return 0; @@ -646,7 +650,7 @@ int cf_check amd_iommu_flush_iotlb_pages( return 0; } =20 -int amd_iommu_reserve_domain_unity_map(struct domain *d, +int amd_iommu_reserve_domain_unity_map(struct domain *d, struct iommu_cont= ext *ctx, const struct ivrs_unity_map *map, unsigned int flag) { @@ -664,14 +668,14 @@ int amd_iommu_reserve_domain_unity_map(struct domain = *d, if ( map->write ) p2ma |=3D p2m_access_w; =20 - rc =3D iommu_identity_mapping(d, p2ma, map->addr, + rc =3D iommu_identity_mapping(d, ctx, p2ma, map->addr, map->addr + map->length - 1, flag); } =20 return rc; } =20 -int amd_iommu_reserve_domain_unity_unmap(struct domain *d, +int amd_iommu_reserve_domain_unity_unmap(struct domain *d, struct iommu_co= ntext *ctx, const struct ivrs_unity_map *map) { int rc; @@ -681,7 +685,7 @@ int amd_iommu_reserve_domain_unity_unmap(struct domain = *d, =20 for ( rc =3D 0; map; map =3D map->next ) { - int ret =3D iommu_identity_mapping(d, p2m_access_x, map->addr, + int ret =3D iommu_identity_mapping(d, ctx, p2m_access_x, map->addr, map->addr + map->length - 1, 0); =20 if ( ret && ret !=3D -ENOENT && !rc ) @@ -771,6 +775,7 @@ static int fill_qpt(union amd_iommu_pte *this, unsigned= int level, struct page_info *pgs[IOMMU_MAX_PT_LEVELS]) { struct domain_iommu *hd =3D dom_iommu(dom_io); + struct iommu_context *ctx =3D iommu_default_context(dom_io); unsigned int i; int rc =3D 0; =20 @@ -787,7 +792,7 @@ static int fill_qpt(union amd_iommu_pte *this, unsigned= int level, * page table pages, and the resulting allocations are alw= ays * zeroed. 
*/ - pgs[level] =3D iommu_alloc_pgtable(hd, 0); + pgs[level] =3D iommu_alloc_pgtable(hd, ctx, 0); if ( !pgs[level] ) { rc =3D -ENOMEM; @@ -823,14 +828,15 @@ static int fill_qpt(union amd_iommu_pte *this, unsign= ed int level, int cf_check amd_iommu_quarantine_init(struct pci_dev *pdev, bool scratch_= page) { struct domain_iommu *hd =3D dom_iommu(dom_io); - unsigned int level =3D hd->arch.amd.paging_mode; + struct iommu_context *ctx =3D iommu_default_context(dom_io); + unsigned int level =3D ctx->arch.amd.paging_mode; unsigned int req_id =3D get_dma_requestor_id(pdev->seg, pdev->sbdf.bdf= ); const struct ivrs_mappings *ivrs_mappings =3D get_ivrs_mappings(pdev->= seg); int rc; =20 ASSERT(pcidevs_locked()); - ASSERT(!hd->arch.amd.root_table); - ASSERT(page_list_empty(&hd->arch.pgtables.list)); + ASSERT(!ctx->arch.amd.root_table); + ASSERT(page_list_empty(&ctx->arch.pgtables)); =20 if ( !scratch_page && !ivrs_mappings[req_id].unity_map ) return 0; @@ -843,19 +849,19 @@ int cf_check amd_iommu_quarantine_init(struct pci_dev= *pdev, bool scratch_page) return 0; } =20 - pdev->arch.amd.root_table =3D iommu_alloc_pgtable(hd, 0); + pdev->arch.amd.root_table =3D iommu_alloc_pgtable(hd, ctx, 0); if ( !pdev->arch.amd.root_table ) return -ENOMEM; =20 /* Transiently install the root into DomIO, for iommu_identity_mapping= (). */ - hd->arch.amd.root_table =3D pdev->arch.amd.root_table; + ctx->arch.amd.root_table =3D pdev->arch.amd.root_table; =20 - rc =3D amd_iommu_reserve_domain_unity_map(dom_io, + rc =3D amd_iommu_reserve_domain_unity_map(dom_io, ctx, ivrs_mappings[req_id].unity_ma= p, 0); =20 - iommu_identity_map_teardown(dom_io); - hd->arch.amd.root_table =3D NULL; + iommu_identity_map_teardown(dom_io, ctx); + ctx->arch.amd.root_table =3D NULL; =20 if ( rc ) AMD_IOMMU_WARN("%pp: quarantine unity mapping failed\n", &pdev->sb= df); @@ -871,7 +877,7 @@ int cf_check amd_iommu_quarantine_init(struct pci_dev *= pdev, bool scratch_page) pdev->arch.leaf_mfn =3D page_to_mfn(pgs[0]); } =20 - page_list_move(&pdev->arch.pgtables_list, &hd->arch.pgtables.list); + page_list_move(&pdev->arch.pgtables_list, &ctx->arch.pgtables); =20 if ( rc ) amd_iommu_quarantine_teardown(pdev); @@ -881,16 +887,16 @@ int cf_check amd_iommu_quarantine_init(struct pci_dev= *pdev, bool scratch_page) =20 void amd_iommu_quarantine_teardown(struct pci_dev *pdev) { - struct domain_iommu *hd =3D dom_iommu(dom_io); + struct iommu_context *ctx =3D iommu_default_context(dom_io); =20 ASSERT(pcidevs_locked()); =20 if ( !pdev->arch.amd.root_table ) return; =20 - ASSERT(page_list_empty(&hd->arch.pgtables.list)); - page_list_move(&hd->arch.pgtables.list, &pdev->arch.pgtables_list); - while ( iommu_free_pgtables(dom_io) =3D=3D -ERESTART ) + ASSERT(page_list_empty(&ctx->arch.pgtables)); + page_list_move(&ctx->arch.pgtables, &pdev->arch.pgtables_list); + while ( iommu_free_pgtables(dom_io, ctx) =3D=3D -ERESTART ) /* nothing */; pdev->arch.amd.root_table =3D NULL; } diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/pass= through/amd/pci_amd_iommu.c index f96f59440b..a3815d71be 100644 --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c @@ -19,6 +19,7 @@ =20 #include #include +#include =20 #include =20 @@ -86,12 +87,12 @@ int get_dma_requestor_id(uint16_t seg, uint16_t bdf) =20 static int __must_check allocate_domain_resources(struct domain *d) { - struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); int rc; =20 - 
spin_lock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); rc =3D amd_iommu_alloc_root(d); - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); =20 return rc; } @@ -125,7 +126,7 @@ static bool use_ats( } =20 static int __must_check amd_iommu_setup_domain_device( - struct domain *domain, struct amd_iommu *iommu, + struct domain *domain, struct iommu_context *ctx, struct amd_iommu *io= mmu, uint8_t devfn, struct pci_dev *pdev) { struct amd_iommu_dte *table, *dte; @@ -133,7 +134,6 @@ static int __must_check amd_iommu_setup_domain_device( unsigned int req_id, sr_flags; int rc; u8 bus =3D pdev->bus; - struct domain_iommu *hd =3D dom_iommu(domain); const struct ivrs_mappings *ivrs_dev; const struct page_info *root_pg; domid_t domid; @@ -141,7 +141,7 @@ static int __must_check amd_iommu_setup_domain_device( if ( QUARANTINE_SKIP(domain, pdev) ) return 0; =20 - BUG_ON(!hd->arch.amd.paging_mode || !iommu->dev_table.buffer); + BUG_ON(!ctx->arch.amd.paging_mode || !iommu->dev_table.buffer); =20 rc =3D allocate_domain_resources(domain); if ( rc ) @@ -161,7 +161,7 @@ static int __must_check amd_iommu_setup_domain_device( =20 if ( domain !=3D dom_io ) { - root_pg =3D hd->arch.amd.root_table; + root_pg =3D ctx->arch.amd.root_table; domid =3D domain->domain_id; } else @@ -177,7 +177,7 @@ static int __must_check amd_iommu_setup_domain_device( /* bind DTE to domain page-tables */ rc =3D amd_iommu_set_root_page_table( dte, page_to_maddr(root_pg), domid, - hd->arch.amd.paging_mode, sr_flags); + ctx->arch.amd.paging_mode, sr_flags); if ( rc ) { ASSERT(rc < 0); @@ -219,7 +219,7 @@ static int __must_check amd_iommu_setup_domain_device( else rc =3D amd_iommu_set_root_page_table( dte, page_to_maddr(root_pg), domid, - hd->arch.amd.paging_mode, sr_flags); + ctx->arch.amd.paging_mode, sr_flags); if ( rc < 0 ) { spin_unlock_irqrestore(&iommu->lock, flags); @@ -270,7 +270,7 @@ static int __must_check amd_iommu_setup_domain_device( "root table =3D %#"PRIx64", " "domain =3D %d, paging mode =3D %d\n", req_id, pdev->type, page_to_maddr(root_pg), - domid, hd->arch.amd.paging_mode); + domid, ctx->arch.amd.paging_mode); =20 ASSERT(pcidevs_locked()); =20 @@ -352,11 +352,12 @@ static int cf_check iov_enable_xt(void) int amd_iommu_alloc_root(struct domain *d) { struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); =20 - if ( unlikely(!hd->arch.amd.root_table) && d !=3D dom_io ) + if ( unlikely(!ctx->arch.amd.root_table) && d !=3D dom_io ) { - hd->arch.amd.root_table =3D iommu_alloc_pgtable(hd, 0); - if ( !hd->arch.amd.root_table ) + ctx->arch.amd.root_table =3D iommu_alloc_pgtable(hd, ctx, 0); + if ( !ctx->arch.amd.root_table ) return -ENOMEM; } =20 @@ -368,7 +369,7 @@ int __read_mostly amd_iommu_min_paging_mode =3D 1; =20 static int cf_check amd_iommu_domain_init(struct domain *d) { - struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); int pglvl =3D amd_iommu_get_paging_mode( 1UL << (domain_max_paddr_bits(d) - PAGE_SHIFT)); =20 @@ -379,7 +380,7 @@ static int cf_check amd_iommu_domain_init(struct domain= *d) * Choose the number of levels for the IOMMU page tables, taking into * account unity maps. 
*/ - hd->arch.amd.paging_mode =3D max(pglvl, amd_iommu_min_paging_mode); + ctx->arch.amd.paging_mode =3D max(pglvl, amd_iommu_min_paging_mode); =20 return 0; } @@ -455,7 +456,7 @@ static void amd_iommu_disable_domain_device(const struc= t domain *domain, AMD_IOMMU_DEBUG("Disable: device id =3D %#x, " "domain =3D %d, paging mode =3D %d\n", req_id, dte->domain_id, - dom_iommu(domain)->arch.amd.paging_mode); + iommu_default_context(domain)->arch.amd.paging_mod= e); } else spin_unlock_irqrestore(&iommu->lock, flags); @@ -466,6 +467,8 @@ static int cf_check reassign_device( struct pci_dev *pdev) { struct amd_iommu *iommu; + struct iommu_context *target_ctx =3D iommu_default_context(target); + struct iommu_context *source_ctx =3D iommu_default_context(source); int rc; =20 iommu =3D find_iommu_for_device(pdev->seg, pdev->sbdf.bdf); @@ -478,7 +481,7 @@ static int cf_check reassign_device( =20 if ( !QUARANTINE_SKIP(target, pdev) ) { - rc =3D amd_iommu_setup_domain_device(target, iommu, devfn, pdev); + rc =3D amd_iommu_setup_domain_device(target, target_ctx, iommu, de= vfn, pdev); if ( rc ) return rc; } @@ -509,7 +512,7 @@ static int cf_check reassign_device( unsigned int bdf =3D PCI_BDF(pdev->bus, devfn); =20 rc =3D amd_iommu_reserve_domain_unity_unmap( - source, + source, source_ctx, ivrs_mappings[get_dma_requestor_id(pdev->seg, bdf)].unity= _map); if ( rc ) return rc; @@ -528,7 +531,8 @@ static int cf_check amd_iommu_assign_device( unsigned int bdf =3D PCI_BDF(pdev->bus, devfn); int req_id =3D get_dma_requestor_id(pdev->seg, bdf); int rc =3D amd_iommu_reserve_domain_unity_map( - d, ivrs_mappings[req_id].unity_map, flag); + d, iommu_default_context(d), + ivrs_mappings[req_id].unity_map, flag); =20 if ( !rc ) rc =3D reassign_device(pdev->domain, d, devfn, pdev); @@ -536,7 +540,8 @@ static int cf_check amd_iommu_assign_device( if ( rc && !is_hardware_domain(d) ) { int ret =3D amd_iommu_reserve_domain_unity_unmap( - d, ivrs_mappings[req_id].unity_map); + d, iommu_default_context(d), + ivrs_mappings[req_id].unity_map); =20 if ( ret ) { @@ -553,22 +558,25 @@ static int cf_check amd_iommu_assign_device( =20 static void cf_check amd_iommu_clear_root_pgtable(struct domain *d) { - struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); =20 - spin_lock(&hd->arch.mapping_lock); - hd->arch.amd.root_table =3D NULL; - spin_unlock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); + ctx->arch.amd.root_table =3D NULL; + spin_unlock(&ctx->arch.mapping_lock); } =20 static void cf_check amd_iommu_domain_destroy(struct domain *d) { - iommu_identity_map_teardown(d); - ASSERT(!dom_iommu(d)->arch.amd.root_table); + struct iommu_context *ctx =3D iommu_default_context(d); + + iommu_identity_map_teardown(d, ctx); + ASSERT(!ctx->arch.amd.root_table); } =20 static int cf_check amd_iommu_add_device(u8 devfn, struct pci_dev *pdev) { struct amd_iommu *iommu; + struct iommu_context *ctx; u16 bdf; struct ivrs_mappings *ivrs_mappings; bool fresh_domid =3D false; @@ -577,6 +585,8 @@ static int cf_check amd_iommu_add_device(u8 devfn, stru= ct pci_dev *pdev) if ( !pdev->domain ) return -EINVAL; =20 + ctx =3D iommu_default_context(pdev->domain); + for_each_amd_iommu(iommu) if ( pdev->seg =3D=3D iommu->seg && pdev->sbdf.bdf =3D=3D iommu->b= df ) return is_hardware_domain(pdev->domain) ? 
0 : -ENODEV; @@ -633,7 +643,7 @@ static int cf_check amd_iommu_add_device(u8 devfn, stru= ct pci_dev *pdev) } =20 if ( amd_iommu_reserve_domain_unity_map( - pdev->domain, + pdev->domain, ctx, ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].unity_map, 0) ) AMD_IOMMU_WARN("%pd: unity mapping failed for %pp\n", @@ -647,7 +657,7 @@ static int cf_check amd_iommu_add_device(u8 devfn, stru= ct pci_dev *pdev) fresh_domid =3D true; } =20 - ret =3D amd_iommu_setup_domain_device(pdev->domain, iommu, devfn, pdev= ); + ret =3D amd_iommu_setup_domain_device(pdev->domain, ctx, iommu, devfn,= pdev); if ( ret && fresh_domid ) { iommu_free_domid(pdev->arch.pseudo_domid, iommu->domid_map); @@ -660,12 +670,15 @@ static int cf_check amd_iommu_add_device(u8 devfn, st= ruct pci_dev *pdev) static int cf_check amd_iommu_remove_device(u8 devfn, struct pci_dev *pdev) { struct amd_iommu *iommu; + struct iommu_context *ctx; u16 bdf; struct ivrs_mappings *ivrs_mappings; =20 if ( !pdev->domain ) return -EINVAL; =20 + ctx =3D iommu_default_context(pdev->domain); + iommu =3D find_iommu_for_device(pdev->seg, pdev->sbdf.bdf); if ( !iommu ) { @@ -680,7 +693,7 @@ static int cf_check amd_iommu_remove_device(u8 devfn, s= truct pci_dev *pdev) bdf =3D PCI_BDF(pdev->bus, devfn); =20 if ( amd_iommu_reserve_domain_unity_unmap( - pdev->domain, + pdev->domain, ctx, ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].unity_map)= ) AMD_IOMMU_WARN("%pd: unity unmapping failed for %pp\n", pdev->domain, &PCI_SBDF(pdev->seg, bdf)); @@ -755,14 +768,14 @@ static void amd_dump_page_table_level(struct page_inf= o *pg, int level, =20 static void cf_check amd_dump_page_tables(struct domain *d) { - const struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); =20 - if ( !hd->arch.amd.root_table ) + if ( !ctx->arch.amd.root_table ) return; =20 - printk("AMD IOMMU %pd table has %u levels\n", d, hd->arch.amd.paging_m= ode); - amd_dump_page_table_level(hd->arch.amd.root_table, - hd->arch.amd.paging_mode, 0, 0); + printk("AMD IOMMU %pd table has %u levels\n", d, ctx->arch.amd.paging_= mode); + amd_dump_page_table_level(ctx->arch.amd.root_table, + ctx->arch.amd.paging_mode, 0, 0); } =20 static const struct iommu_ops __initconst_cf_clobber _iommu_ops =3D { diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iomm= u.c index 9e74a1fc72..662da49766 100644 --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -403,12 +403,15 @@ long iommu_unmap(struct domain *d, dfn_t dfn0, unsign= ed long page_count, unsigned long i; unsigned int order, j =3D 0; int rc =3D 0; + struct iommu_context *ctx; =20 if ( !is_iommu_enabled(d) ) return 0; =20 ASSERT(!(flags & ~IOMMUF_preempt)); =20 + ctx =3D iommu_default_context(d); + for ( i =3D 0; i < page_count; i +=3D 1UL << order ) { dfn_t dfn =3D dfn_add(dfn0, i); @@ -468,10 +471,13 @@ int iommu_lookup_page(struct domain *d, dfn_t dfn, mf= n_t *mfn, unsigned int *flags) { const struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx; =20 if ( !is_iommu_enabled(d) || !hd->platform_ops->lookup_page ) return -EOPNOTSUPP; =20 + ctx =3D iommu_default_context(d); + return iommu_call(hd->platform_ops, lookup_page, d, dfn, mfn, flags); } =20 diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough= /vtd/extern.h index c16583c951..3dcb77c711 100644 --- a/xen/drivers/passthrough/vtd/extern.h +++ b/xen/drivers/passthrough/vtd/extern.h @@ -80,8 +80,8 @@ uint64_t alloc_pgtable_maddr(unsigned long npages, nodeid= _t 
node); void free_pgtable_maddr(u64 maddr); void *map_vtd_domain_page(u64 maddr); void unmap_vtd_domain_page(const void *va); -int domain_context_mapping_one(struct domain *domain, struct vtd_iommu *io= mmu, - uint8_t bus, uint8_t devfn, +int domain_context_mapping_one(struct domain *domain, struct iommu_context= *ctx, + struct vtd_iommu *iommu, uint8_t bus, uint8= _t devfn, const struct pci_dev *pdev, domid_t domid, paddr_t pgd_maddr, unsigned int mode); int domain_context_unmap_one(struct domain *domain, struct vtd_iommu *iomm= u, diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/= vtd/iommu.c index 9d7a9977a6..f60f39ee1d 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -54,7 +54,7 @@ #define DEVICE_DOMID(d, pdev) ((d) !=3D dom_io ? (d)->domain_id \ : (pdev)->arch.pseudo_domid) #define DEVICE_PGTABLE(d, pdev) ((d) !=3D dom_io \ - ? dom_iommu(d)->arch.vtd.pgd_maddr \ + ? iommu_default_context(d)->arch.vtd.pgd_= maddr \ : (pdev)->arch.vtd.pgd_maddr) =20 bool __read_mostly iommu_igfx =3D true; @@ -227,7 +227,7 @@ static void check_cleanup_domid_map(const struct domain= *d, =20 if ( !found ) { - clear_bit(iommu->index, dom_iommu(d)->arch.vtd.iommu_bitmap); + clear_bit(iommu->index, iommu_default_context(d)->arch.vtd.iommu_b= itmap); cleanup_domid_map(d->domain_id, iommu); } } @@ -315,8 +315,9 @@ static u64 bus_to_context_maddr(struct vtd_iommu *iommu= , u8 bus) * PTE for the requested address, * - for target =3D=3D 0 the full PTE contents below PADDR_BITS limit. */ -static uint64_t addr_to_dma_page_maddr(struct domain *domain, daddr_t addr, - unsigned int target, +static uint64_t addr_to_dma_page_maddr(struct domain *domain, + struct iommu_context *ctx, + daddr_t addr, unsigned int target, unsigned int *flush_flags, bool all= oc) { struct domain_iommu *hd =3D dom_iommu(domain); @@ -326,10 +327,10 @@ static uint64_t addr_to_dma_page_maddr(struct domain = *domain, daddr_t addr, u64 pte_maddr =3D 0; =20 addr &=3D (((u64)1) << addr_width) - 1; - ASSERT(spin_is_locked(&hd->arch.mapping_lock)); + ASSERT(spin_is_locked(&ctx->arch.mapping_lock)); ASSERT(target || !alloc); =20 - if ( !hd->arch.vtd.pgd_maddr ) + if ( !ctx->arch.vtd.pgd_maddr ) { struct page_info *pg; =20 @@ -337,13 +338,13 @@ static uint64_t addr_to_dma_page_maddr(struct domain = *domain, daddr_t addr, goto out; =20 pte_maddr =3D level; - if ( !(pg =3D iommu_alloc_pgtable(hd, 0)) ) + if ( !(pg =3D iommu_alloc_pgtable(hd, ctx, 0)) ) goto out; =20 - hd->arch.vtd.pgd_maddr =3D page_to_maddr(pg); + ctx->arch.vtd.pgd_maddr =3D page_to_maddr(pg); } =20 - pte_maddr =3D hd->arch.vtd.pgd_maddr; + pte_maddr =3D ctx->arch.vtd.pgd_maddr; parent =3D map_vtd_domain_page(pte_maddr); while ( level > target ) { @@ -379,7 +380,7 @@ static uint64_t addr_to_dma_page_maddr(struct domain *d= omain, daddr_t addr, } =20 pte_maddr =3D level - 1; - pg =3D iommu_alloc_pgtable(hd, DMA_PTE_CONTIG_MASK); + pg =3D iommu_alloc_pgtable(hd, ctx, DMA_PTE_CONTIG_MASK); if ( !pg ) break; =20 @@ -431,13 +432,12 @@ static uint64_t addr_to_dma_page_maddr(struct domain = *domain, daddr_t addr, return pte_maddr; } =20 -static paddr_t domain_pgd_maddr(struct domain *d, paddr_t pgd_maddr, - unsigned int nr_pt_levels) +static paddr_t domain_pgd_maddr(struct domain *d, struct iommu_context *ct= x, + paddr_t pgd_maddr, unsigned int nr_pt_leve= ls) { - struct domain_iommu *hd =3D dom_iommu(d); unsigned int agaw; =20 - ASSERT(spin_is_locked(&hd->arch.mapping_lock)); + 
ASSERT(spin_is_locked(&ctx->arch.mapping_lock)); =20 if ( pgd_maddr ) /* nothing */; @@ -449,19 +449,19 @@ static paddr_t domain_pgd_maddr(struct domain *d, pad= dr_t pgd_maddr, } else { - if ( !hd->arch.vtd.pgd_maddr ) + if ( !ctx->arch.vtd.pgd_maddr ) { /* * Ensure we have pagetables allocated down to the smallest * level the loop below may need to run to. */ - addr_to_dma_page_maddr(d, 0, min_pt_levels, NULL, true); + addr_to_dma_page_maddr(d, ctx, 0, min_pt_levels, NULL, true); =20 - if ( !hd->arch.vtd.pgd_maddr ) + if ( !ctx->arch.vtd.pgd_maddr ) return 0; } =20 - pgd_maddr =3D hd->arch.vtd.pgd_maddr; + pgd_maddr =3D ctx->arch.vtd.pgd_maddr; } =20 /* Skip top level(s) of page tables for less-than-maximum level DRHDs.= */ @@ -734,7 +734,7 @@ static int __must_check cf_check iommu_flush_iotlb(stru= ct domain *d, dfn_t dfn, unsigned long page_coun= t, unsigned int flush_flag= s) { - struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); struct acpi_drhd_unit *drhd; struct vtd_iommu *iommu; bool flush_dev_iotlb; @@ -762,7 +762,7 @@ static int __must_check cf_check iommu_flush_iotlb(stru= ct domain *d, dfn_t dfn, =20 iommu =3D drhd->iommu; =20 - if ( !test_bit(iommu->index, hd->arch.vtd.iommu_bitmap) ) + if ( !test_bit(iommu->index, ctx->arch.vtd.iommu_bitmap) ) continue; =20 flush_dev_iotlb =3D !!find_ats_dev_drhd(iommu); @@ -790,7 +790,8 @@ static int __must_check cf_check iommu_flush_iotlb(stru= ct domain *d, dfn_t dfn, return ret; } =20 -static void queue_free_pt(struct domain_iommu *hd, mfn_t mfn, unsigned int= level) +static void queue_free_pt(struct domain *d, struct iommu_context *ctx, mfn= _t mfn, + unsigned int level) { if ( level > 1 ) { @@ -799,13 +800,13 @@ static void queue_free_pt(struct domain_iommu *hd, mf= n_t mfn, unsigned int level =20 for ( i =3D 0; i < PTE_NUM; ++i ) if ( dma_pte_present(pt[i]) && !dma_pte_superpage(pt[i]) ) - queue_free_pt(hd, maddr_to_mfn(dma_pte_addr(pt[i])), + queue_free_pt(d, ctx, maddr_to_mfn(dma_pte_addr(pt[i])), level - 1); =20 unmap_domain_page(pt); } =20 - iommu_queue_free_pgtable(hd, mfn_to_page(mfn)); + iommu_queue_free_pgtable(d, ctx, mfn_to_page(mfn)); } =20 static int iommu_set_root_entry(struct vtd_iommu *iommu) @@ -1435,10 +1436,11 @@ void __init iommu_free(struct acpi_drhd_unit *drhd) static int cf_check intel_iommu_domain_init(struct domain *d) { struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); =20 - hd->arch.vtd.iommu_bitmap =3D xzalloc_array(unsigned long, + ctx->arch.vtd.iommu_bitmap =3D xzalloc_array(unsigned long, BITS_TO_LONGS(nr_iommus)); - if ( !hd->arch.vtd.iommu_bitmap ) + if ( !ctx->arch.vtd.iommu_bitmap ) return -ENOMEM; =20 hd->arch.vtd.agaw =3D width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH); @@ -1479,11 +1481,11 @@ static void __hwdom_init cf_check intel_iommu_hwdom= _init(struct domain *d) */ int domain_context_mapping_one( struct domain *domain, + struct iommu_context *ctx, struct vtd_iommu *iommu, uint8_t bus, uint8_t devfn, const struct pci_dev *pdev, domid_t domid, paddr_t pgd_maddr, unsigned int mode) { - struct domain_iommu *hd =3D dom_iommu(domain); struct context_entry *context, *context_entries, lctxt; __uint128_t res, old; uint64_t maddr; @@ -1531,12 +1533,12 @@ int domain_context_mapping_one( { paddr_t root; =20 - spin_lock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); =20 - root =3D domain_pgd_maddr(domain, pgd_maddr, iommu->nr_pt_levels); + root =3D domain_pgd_maddr(domain, ctx, pgd_maddr, 
iommu->nr_pt_lev= els); if ( !root ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); spin_unlock(&iommu->lock); unmap_vtd_domain_page(context_entries); if ( prev_dom ) @@ -1550,7 +1552,7 @@ int domain_context_mapping_one( else context_set_translation_type(lctxt, CONTEXT_TT_MULTI_LEVEL); =20 - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); } =20 rc =3D context_set_domain_id(&lctxt, domid, iommu); @@ -1624,7 +1626,7 @@ int domain_context_mapping_one( if ( rc > 0 ) rc =3D 0; =20 - set_bit(iommu->index, hd->arch.vtd.iommu_bitmap); + set_bit(iommu->index, ctx->arch.vtd.iommu_bitmap); =20 unmap_vtd_domain_page(context_entries); =20 @@ -1642,7 +1644,7 @@ int domain_context_mapping_one( (prev_dom =3D=3D dom_io && !pdev) ) ret =3D domain_context_unmap_one(domain, iommu, bus, devfn); else - ret =3D domain_context_mapping_one(prev_dom, iommu, bus, devfn= , pdev, + ret =3D domain_context_mapping_one(prev_dom, ctx, iommu, bus, = devfn, pdev, DEVICE_DOMID(prev_dom, pdev), DEVICE_PGTABLE(prev_dom, pdev= ), (mode & MAP_WITH_RMRR) | @@ -1661,8 +1663,8 @@ int domain_context_mapping_one( static const struct acpi_drhd_unit *domain_context_unmap( struct domain *d, uint8_t devfn, struct pci_dev *pdev); =20 -static int domain_context_mapping(struct domain *domain, u8 devfn, - struct pci_dev *pdev) +static int domain_context_mapping(struct domain *domain, struct iommu_cont= ext *ctx, + u8 devfn, struct pci_dev *pdev) { const struct acpi_drhd_unit *drhd =3D acpi_find_matched_drhd_unit(pdev= ); const struct acpi_rmrr_unit *rmrr; @@ -1731,7 +1733,7 @@ static int domain_context_mapping(struct domain *doma= in, u8 devfn, if ( iommu_debug ) printk(VTDPREFIX "%pd:PCIe: map %pp\n", domain, &PCI_SBDF(seg, bus, devfn)); - ret =3D domain_context_mapping_one(domain, drhd->iommu, bus, devfn= , pdev, + ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, bus, = devfn, pdev, DEVICE_DOMID(domain, pdev), pgd_m= addr, mode); if ( ret > 0 ) @@ -1757,7 +1759,7 @@ static int domain_context_mapping(struct domain *doma= in, u8 devfn, printk(VTDPREFIX "%pd:PCI: map %pp\n", domain, &PCI_SBDF(seg, bus, devfn)); =20 - ret =3D domain_context_mapping_one(domain, drhd->iommu, bus, devfn, + ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, bus, = devfn, pdev, DEVICE_DOMID(domain, pdev), pgd_maddr, mode); if ( ret < 0 ) @@ -1788,7 +1790,7 @@ static int domain_context_mapping(struct domain *doma= in, u8 devfn, * their owner would be the wrong one. Pass NULL instead. */ if ( ret >=3D 0 ) - ret =3D domain_context_mapping_one(domain, drhd->iommu, bus, d= evfn, + ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, b= us, devfn, NULL, DEVICE_DOMID(domain, pd= ev), pgd_maddr, mode); =20 @@ -1804,7 +1806,7 @@ static int domain_context_mapping(struct domain *doma= in, u8 devfn, */ if ( !ret && pdev_type(seg, bus, devfn) =3D=3D DEV_TYPE_PCIe2PCI_B= RIDGE && (secbus !=3D pdev->bus || pdev->devfn !=3D 0) ) - ret =3D domain_context_mapping_one(domain, drhd->iommu, secbus= , 0, + ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, s= ecbus, 0, NULL, DEVICE_DOMID(domain, pd= ev), pgd_maddr, mode); =20 @@ -1813,7 +1815,7 @@ static int domain_context_mapping(struct domain *doma= in, u8 devfn, if ( !prev_present ) domain_context_unmap(domain, devfn, pdev); else if ( pdev->domain !=3D domain ) /* Avoid infinite recursi= on. 
*/ - domain_context_mapping(pdev->domain, devfn, pdev); + domain_context_mapping(pdev->domain, ctx, devfn, pdev); } =20 break; @@ -2001,44 +2003,44 @@ static const struct acpi_drhd_unit *domain_context_= unmap( =20 static void cf_check iommu_clear_root_pgtable(struct domain *d) { - struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); =20 - spin_lock(&hd->arch.mapping_lock); - hd->arch.vtd.pgd_maddr =3D 0; - spin_unlock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); + ctx->arch.vtd.pgd_maddr =3D 0; + spin_unlock(&ctx->arch.mapping_lock); } =20 static void cf_check iommu_domain_teardown(struct domain *d) { - struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); const struct acpi_drhd_unit *drhd; =20 if ( list_empty(&acpi_drhd_units) ) return; =20 - iommu_identity_map_teardown(d); + iommu_identity_map_teardown(d, ctx); =20 - ASSERT(!hd->arch.vtd.pgd_maddr); + ASSERT(!ctx->arch.vtd.pgd_maddr); =20 for_each_drhd_unit ( drhd ) cleanup_domid_map(d->domain_id, drhd->iommu); =20 - XFREE(hd->arch.vtd.iommu_bitmap); + XFREE(ctx->arch.vtd.iommu_bitmap); } =20 static void quarantine_teardown(struct pci_dev *pdev, const struct acpi_drhd_unit *drhd) { - struct domain_iommu *hd =3D dom_iommu(dom_io); + struct iommu_context *ctx =3D iommu_default_context(dom_io); =20 ASSERT(pcidevs_locked()); =20 if ( !pdev->arch.vtd.pgd_maddr ) return; =20 - ASSERT(page_list_empty(&hd->arch.pgtables.list)); - page_list_move(&hd->arch.pgtables.list, &pdev->arch.pgtables_list); - while ( iommu_free_pgtables(dom_io) =3D=3D -ERESTART ) + ASSERT(page_list_empty(&ctx->arch.pgtables)); + page_list_move(&ctx->arch.pgtables, &pdev->arch.pgtables_list); + while ( iommu_free_pgtables(dom_io, ctx) =3D=3D -ERESTART ) /* nothing */; pdev->arch.vtd.pgd_maddr =3D 0; =20 @@ -2051,6 +2053,7 @@ static int __must_check cf_check intel_iommu_map_page( unsigned int *flush_flags) { struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); struct dma_pte *page, *pte, old, new =3D {}; u64 pg_maddr; unsigned int level =3D (IOMMUF_order(flags) / LEVEL_STRIDE) + 1; @@ -2067,7 +2070,7 @@ static int __must_check cf_check intel_iommu_map_page( if ( iommu_hwdom_passthrough && is_hardware_domain(d) ) return 0; =20 - spin_lock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); =20 /* * IOMMU mapping request can be safely ignored when the domain is dyin= g. 
@@ -2077,15 +2080,15 @@ static int __must_check cf_check intel_iommu_map_pa= ge( */ if ( d->is_dying ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); return 0; } =20 - pg_maddr =3D addr_to_dma_page_maddr(d, dfn_to_daddr(dfn), level, flush= _flags, + pg_maddr =3D addr_to_dma_page_maddr(d, ctx, dfn_to_daddr(dfn), level, = flush_flags, true); if ( pg_maddr < PAGE_SIZE ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); return -ENOMEM; } =20 @@ -2106,7 +2109,7 @@ static int __must_check cf_check intel_iommu_map_page( =20 if ( !((old.val ^ new.val) & ~DMA_PTE_CONTIG_MASK) ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); unmap_vtd_domain_page(page); return 0; } @@ -2135,7 +2138,7 @@ static int __must_check cf_check intel_iommu_map_page( new.val &=3D ~(LEVEL_MASK << level_to_offset_bits(level)); dma_set_pte_superpage(new); =20 - pg_maddr =3D addr_to_dma_page_maddr(d, dfn_to_daddr(dfn), ++level, + pg_maddr =3D addr_to_dma_page_maddr(d, ctx, dfn_to_daddr(dfn), ++l= evel, flush_flags, false); BUG_ON(pg_maddr < PAGE_SIZE); =20 @@ -2145,11 +2148,11 @@ static int __must_check cf_check intel_iommu_map_pa= ge( iommu_sync_cache(pte, sizeof(*pte)); =20 *flush_flags |=3D IOMMU_FLUSHF_modified | IOMMU_FLUSHF_all; - iommu_queue_free_pgtable(hd, pg); + iommu_queue_free_pgtable(d, ctx, pg); perfc_incr(iommu_pt_coalesces); } =20 - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); unmap_vtd_domain_page(page); =20 *flush_flags |=3D IOMMU_FLUSHF_added; @@ -2158,7 +2161,7 @@ static int __must_check cf_check intel_iommu_map_page( *flush_flags |=3D IOMMU_FLUSHF_modified; =20 if ( IOMMUF_order(flags) && !dma_pte_superpage(old) ) - queue_free_pt(hd, maddr_to_mfn(dma_pte_addr(old)), + queue_free_pt(d, ctx, maddr_to_mfn(dma_pte_addr(old)), IOMMUF_order(flags) / LEVEL_STRIDE); } =20 @@ -2169,6 +2172,7 @@ static int __must_check cf_check intel_iommu_unmap_pa= ge( struct domain *d, dfn_t dfn, unsigned int order, unsigned int *flush_f= lags) { struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); daddr_t addr =3D dfn_to_daddr(dfn); struct dma_pte *page =3D NULL, *pte =3D NULL, old; uint64_t pg_maddr; @@ -2188,12 +2192,12 @@ static int __must_check cf_check intel_iommu_unmap_= page( if ( iommu_hwdom_passthrough && is_hardware_domain(d) ) return 0; =20 - spin_lock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); /* get target level pte */ - pg_maddr =3D addr_to_dma_page_maddr(d, addr, level, flush_flags, false= ); + pg_maddr =3D addr_to_dma_page_maddr(d, ctx, addr, level, flush_flags, = false); if ( pg_maddr < PAGE_SIZE ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); return pg_maddr ? 
-ENOMEM : 0; } =20 @@ -2202,7 +2206,7 @@ static int __must_check cf_check intel_iommu_unmap_pa= ge( =20 if ( !dma_pte_present(*pte) ) { - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); unmap_vtd_domain_page(page); return 0; } @@ -2220,7 +2224,7 @@ static int __must_check cf_check intel_iommu_unmap_pa= ge( =20 unmap_vtd_domain_page(page); =20 - pg_maddr =3D addr_to_dma_page_maddr(d, addr, level, flush_flags, f= alse); + pg_maddr =3D addr_to_dma_page_maddr(d, ctx, addr, level, flush_fla= gs, false); BUG_ON(pg_maddr < PAGE_SIZE); =20 page =3D map_vtd_domain_page(pg_maddr); @@ -2229,18 +2233,18 @@ static int __must_check cf_check intel_iommu_unmap_= page( iommu_sync_cache(pte, sizeof(*pte)); =20 *flush_flags |=3D IOMMU_FLUSHF_all; - iommu_queue_free_pgtable(hd, pg); + iommu_queue_free_pgtable(d, ctx, pg); perfc_incr(iommu_pt_coalesces); } =20 - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); =20 unmap_vtd_domain_page(page); =20 *flush_flags |=3D IOMMU_FLUSHF_modified; =20 if ( order && !dma_pte_superpage(old) ) - queue_free_pt(hd, maddr_to_mfn(dma_pte_addr(old)), + queue_free_pt(d, ctx, maddr_to_mfn(dma_pte_addr(old)), order / LEVEL_STRIDE); =20 return 0; @@ -2249,7 +2253,7 @@ static int __must_check cf_check intel_iommu_unmap_pa= ge( static int cf_check intel_iommu_lookup_page( struct domain *d, dfn_t dfn, mfn_t *mfn, unsigned int *flags) { - struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); uint64_t val; =20 /* @@ -2260,11 +2264,11 @@ static int cf_check intel_iommu_lookup_page( (iommu_hwdom_passthrough && is_hardware_domain(d)) ) return -EOPNOTSUPP; =20 - spin_lock(&hd->arch.mapping_lock); + spin_lock(&ctx->arch.mapping_lock); =20 - val =3D addr_to_dma_page_maddr(d, dfn_to_daddr(dfn), 0, NULL, false); + val =3D addr_to_dma_page_maddr(d, ctx, dfn_to_daddr(dfn), 0, NULL, fal= se); =20 - spin_unlock(&hd->arch.mapping_lock); + spin_unlock(&ctx->arch.mapping_lock); =20 if ( val < PAGE_SIZE ) return -ENOENT; @@ -2285,7 +2289,7 @@ static bool __init vtd_ept_page_compatible(const stru= ct vtd_iommu *iommu) =20 /* EPT is not initialised yet, so we must check the capability in * the MSR explicitly rather than use cpu_has_vmx_ept_*() */ - if ( rdmsr_safe(MSR_IA32_VMX_EPT_VPID_CAP, ept_cap) !=3D 0 )=20 + if ( rdmsr_safe(MSR_IA32_VMX_EPT_VPID_CAP, ept_cap) !=3D 0 ) return false; =20 return (ept_has_2mb(ept_cap) && opt_hap_2mb) <=3D @@ -2297,6 +2301,7 @@ static bool __init vtd_ept_page_compatible(const stru= ct vtd_iommu *iommu) static int cf_check intel_iommu_add_device(u8 devfn, struct pci_dev *pdev) { struct acpi_rmrr_unit *rmrr; + struct iommu_context *ctx; u16 bdf; int ret, i; =20 @@ -2305,6 +2310,8 @@ static int cf_check intel_iommu_add_device(u8 devfn, = struct pci_dev *pdev) if ( !pdev->domain ) return -EINVAL; =20 + ctx =3D iommu_default_context(pdev->domain); + for_each_rmrr_device ( rmrr, bdf, i ) { if ( rmrr->segment =3D=3D pdev->seg && bdf =3D=3D PCI_BDF(pdev->bu= s, devfn) ) @@ -2315,7 +2322,7 @@ static int cf_check intel_iommu_add_device(u8 devfn, = struct pci_dev *pdev) * Since RMRRs are always reserved in the e820 map for the har= dware * domain, there shouldn't be a conflict. 
*/ - ret =3D iommu_identity_mapping(pdev->domain, p2m_access_rw, + ret =3D iommu_identity_mapping(pdev->domain, ctx, p2m_access_r= w, rmrr->base_address, rmrr->end_add= ress, 0); if ( ret ) @@ -2324,7 +2331,7 @@ static int cf_check intel_iommu_add_device(u8 devfn, = struct pci_dev *pdev) } } =20 - ret =3D domain_context_mapping(pdev->domain, devfn, pdev); + ret =3D domain_context_mapping(pdev->domain, ctx, devfn, pdev); if ( ret ) dprintk(XENLOG_ERR VTDPREFIX, "%pd: context mapping failed\n", pdev->domain); @@ -2353,10 +2360,13 @@ static int cf_check intel_iommu_remove_device(u8 de= vfn, struct pci_dev *pdev) struct acpi_rmrr_unit *rmrr; u16 bdf; unsigned int i; + struct iommu_context *ctx; =20 if ( !pdev->domain ) return -EINVAL; =20 + ctx =3D iommu_default_context(pdev->domain); + drhd =3D domain_context_unmap(pdev->domain, devfn, pdev); if ( IS_ERR(drhd) ) return PTR_ERR(drhd); @@ -2370,7 +2380,7 @@ static int cf_check intel_iommu_remove_device(u8 devf= n, struct pci_dev *pdev) * Any flag is nothing to clear these mappings but here * its always safe and strict to set 0. */ - iommu_identity_mapping(pdev->domain, p2m_access_x, rmrr->base_addr= ess, + iommu_identity_mapping(pdev->domain, ctx, p2m_access_x, rmrr->base= _address, rmrr->end_address, 0); } =20 @@ -2389,7 +2399,9 @@ static int cf_check intel_iommu_remove_device(u8 devf= n, struct pci_dev *pdev) static int __hwdom_init cf_check setup_hwdom_device( u8 devfn, struct pci_dev *pdev) { - return domain_context_mapping(pdev->domain, devfn, pdev); + struct iommu_context *ctx =3D iommu_default_context(pdev->domain); + + return domain_context_mapping(pdev->domain, ctx, devfn, pdev); } =20 void clear_fault_bits(struct vtd_iommu *iommu) @@ -2483,7 +2495,7 @@ static int __must_check init_vtd_hw(bool resume) =20 /* * Enable queue invalidation - */ =20 + */ for_each_drhd_unit ( drhd ) { iommu =3D drhd->iommu; @@ -2504,7 +2516,7 @@ static int __must_check init_vtd_hw(bool resume) =20 /* * Enable interrupt remapping - */ =20 + */ if ( iommu_intremap !=3D iommu_intremap_off ) { int apic; @@ -2561,6 +2573,7 @@ static int __must_check init_vtd_hw(bool resume) =20 static void __hwdom_init setup_hwdom_rmrr(struct domain *d) { + struct iommu_context *ctx =3D iommu_default_context(d); struct acpi_rmrr_unit *rmrr; u16 bdf; int ret, i; @@ -2574,7 +2587,7 @@ static void __hwdom_init setup_hwdom_rmrr(struct doma= in *d) * domain, there shouldn't be a conflict. So its always safe and * strict to set 0. 
*/ - ret =3D iommu_identity_mapping(d, p2m_access_rw, rmrr->base_addres= s, + ret =3D iommu_identity_mapping(d, ctx, p2m_access_rw, rmrr->base_a= ddress, rmrr->end_address, 0); if ( ret ) dprintk(XENLOG_ERR VTDPREFIX, @@ -2739,6 +2752,8 @@ static int cf_check reassign_device_ownership( =20 if ( !QUARANTINE_SKIP(target, pdev->arch.vtd.pgd_maddr) ) { + struct iommu_context *target_ctx =3D iommu_default_context(target); + if ( !has_arch_pdevs(target) ) vmx_pi_hooks_assign(target); =20 @@ -2753,7 +2768,7 @@ static int cf_check reassign_device_ownership( untrusted_msi =3D true; #endif =20 - ret =3D domain_context_mapping(target, devfn, pdev); + ret =3D domain_context_mapping(target, target_ctx, devfn, pdev); =20 if ( !ret && pdev->devfn =3D=3D devfn && !QUARANTINE_SKIP(source, pdev->arch.vtd.pgd_maddr) ) @@ -2802,6 +2817,7 @@ static int cf_check reassign_device_ownership( if ( !is_hardware_domain(source) ) { const struct acpi_rmrr_unit *rmrr; + struct iommu_context *ctx =3D iommu_default_context(source); u16 bdf; unsigned int i; =20 @@ -2813,7 +2829,7 @@ static int cf_check reassign_device_ownership( * Any RMRR flag is always ignored when remove a device, * but its always safe and strict to set 0. */ - ret =3D iommu_identity_mapping(source, p2m_access_x, + ret =3D iommu_identity_mapping(source, ctx, p2m_access_x, rmrr->base_address, rmrr->end_address, 0); if ( ret && ret !=3D -ENOENT ) @@ -2828,6 +2844,7 @@ static int cf_check intel_iommu_assign_device( struct domain *d, u8 devfn, struct pci_dev *pdev, u32 flag) { struct domain *s =3D pdev->domain; + struct iommu_context *ctx =3D iommu_default_context(d); struct acpi_rmrr_unit *rmrr; int ret =3D 0, i; u16 bdf, seg; @@ -2875,7 +2892,7 @@ static int cf_check intel_iommu_assign_device( { if ( rmrr->segment =3D=3D seg && bdf =3D=3D PCI_BDF(bus, devfn) ) { - ret =3D iommu_identity_mapping(d, p2m_access_rw, rmrr->base_ad= dress, + ret =3D iommu_identity_mapping(d, ctx, p2m_access_rw, rmrr->ba= se_address, rmrr->end_address, flag); if ( ret ) { @@ -2898,7 +2915,7 @@ static int cf_check intel_iommu_assign_device( { if ( rmrr->segment =3D=3D seg && bdf =3D=3D PCI_BDF(bus, devfn) ) { - int rc =3D iommu_identity_mapping(d, p2m_access_x, + int rc =3D iommu_identity_mapping(d, ctx, p2m_access_x, rmrr->base_address, rmrr->end_address, 0); =20 @@ -3071,10 +3088,11 @@ static void vtd_dump_page_table_level(paddr_t pt_ma= ddr, int level, paddr_t gpa, static void cf_check vtd_dump_page_tables(struct domain *d) { const struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx =3D iommu_default_context(d); =20 printk(VTDPREFIX" %pd table has %d levels\n", d, agaw_to_level(hd->arch.vtd.agaw)); - vtd_dump_page_table_level(hd->arch.vtd.pgd_maddr, + vtd_dump_page_table_level(ctx->arch.vtd.pgd_maddr, agaw_to_level(hd->arch.vtd.agaw), 0, 0); } =20 @@ -3082,6 +3100,7 @@ static int fill_qpt(struct dma_pte *this, unsigned in= t level, struct page_info *pgs[6]) { struct domain_iommu *hd =3D dom_iommu(dom_io); + struct iommu_context *ctx =3D iommu_default_context(dom_io); unsigned int i; int rc =3D 0; =20 @@ -3098,7 +3117,7 @@ static int fill_qpt(struct dma_pte *this, unsigned in= t level, * page table pages, and the resulting allocations are alw= ays * zeroed. 
*/ - pgs[level] =3D iommu_alloc_pgtable(hd, 0); + pgs[level] =3D iommu_alloc_pgtable(hd, ctx, 0); if ( !pgs[level] ) { rc =3D -ENOMEM; @@ -3132,6 +3151,7 @@ static int cf_check intel_iommu_quarantine_init(struc= t pci_dev *pdev, bool scratch_page) { struct domain_iommu *hd =3D dom_iommu(dom_io); + struct iommu_context *ctx =3D iommu_default_context(dom_io); struct page_info *pg; unsigned int agaw =3D hd->arch.vtd.agaw; unsigned int level =3D agaw_to_level(agaw); @@ -3142,8 +3162,8 @@ static int cf_check intel_iommu_quarantine_init(struc= t pci_dev *pdev, int rc; =20 ASSERT(pcidevs_locked()); - ASSERT(!hd->arch.vtd.pgd_maddr); - ASSERT(page_list_empty(&hd->arch.pgtables.list)); + ASSERT(!ctx->arch.vtd.pgd_maddr); + ASSERT(page_list_empty(&ctx->arch.pgtables)); =20 if ( pdev->arch.vtd.pgd_maddr ) { @@ -3155,14 +3175,14 @@ static int cf_check intel_iommu_quarantine_init(str= uct pci_dev *pdev, if ( !drhd ) return -ENODEV; =20 - pg =3D iommu_alloc_pgtable(hd, 0); + pg =3D iommu_alloc_pgtable(hd, ctx, 0); if ( !pg ) return -ENOMEM; =20 rc =3D context_set_domain_id(NULL, pdev->arch.pseudo_domid, drhd->iomm= u); =20 /* Transiently install the root into DomIO, for iommu_identity_mapping= (). */ - hd->arch.vtd.pgd_maddr =3D page_to_maddr(pg); + ctx->arch.vtd.pgd_maddr =3D page_to_maddr(pg); =20 for_each_rmrr_device ( rmrr, bdf, i ) { @@ -3173,7 +3193,7 @@ static int cf_check intel_iommu_quarantine_init(struc= t pci_dev *pdev, { rmrr_found =3D true; =20 - rc =3D iommu_identity_mapping(dom_io, p2m_access_rw, + rc =3D iommu_identity_mapping(dom_io, ctx, p2m_access_rw, rmrr->base_address, rmrr->end_addr= ess, 0); if ( rc ) @@ -3183,8 +3203,8 @@ static int cf_check intel_iommu_quarantine_init(struc= t pci_dev *pdev, } } =20 - iommu_identity_map_teardown(dom_io); - hd->arch.vtd.pgd_maddr =3D 0; + iommu_identity_map_teardown(dom_io, ctx); + ctx->arch.vtd.pgd_maddr =3D 0; pdev->arch.vtd.pgd_maddr =3D page_to_maddr(pg); =20 if ( !rc && scratch_page ) @@ -3199,7 +3219,7 @@ static int cf_check intel_iommu_quarantine_init(struc= t pci_dev *pdev, pdev->arch.leaf_mfn =3D page_to_mfn(pgs[0]); } =20 - page_list_move(&pdev->arch.pgtables_list, &hd->arch.pgtables.list); + page_list_move(&pdev->arch.pgtables_list, &ctx->arch.pgtables); =20 if ( rc || (!scratch_page && !rmrr_found) ) quarantine_teardown(pdev, drhd); diff --git a/xen/drivers/passthrough/vtd/quirks.c b/xen/drivers/passthrough= /vtd/quirks.c index dc3dac749c..7937eb8c2b 100644 --- a/xen/drivers/passthrough/vtd/quirks.c +++ b/xen/drivers/passthrough/vtd/quirks.c @@ -422,7 +422,8 @@ static int __must_check map_me_phantom_function(struct = domain *domain, =20 /* map or unmap ME phantom function */ if ( !(mode & UNMAP_ME_PHANTOM_FUNC) ) - rc =3D domain_context_mapping_one(domain, drhd->iommu, 0, + rc =3D domain_context_mapping_one(domain, iommu_default_context(do= main), + drhd->iommu, 0, PCI_DEVFN(dev, 7), NULL, domid, pgd_maddr, mode); else diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/= x86/iommu.c index 8b1e0596b8..4a3fe059cb 100644 --- a/xen/drivers/passthrough/x86/iommu.c +++ b/xen/drivers/passthrough/x86/iommu.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -185,26 +186,31 @@ void __hwdom_init arch_iommu_check_autotranslated_hwd= om(struct domain *d) =20 int arch_iommu_domain_init(struct domain *d) { - struct domain_iommu *hd =3D dom_iommu(d); + INIT_PAGE_LIST_HEAD(&dom_iommu(d)->arch.free_queue); + return 0; +} =20 - spin_lock_init(&hd->arch.mapping_lock); +int 
arch_iommu_context_init(struct domain *d, struct iommu_context *ctx, u= 32 flags) +{ + spin_lock_init(&ctx->arch.mapping_lock); =20 - INIT_PAGE_LIST_HEAD(&hd->arch.pgtables.list); - spin_lock_init(&hd->arch.pgtables.lock); - INIT_LIST_HEAD(&hd->arch.identity_maps); + INIT_PAGE_LIST_HEAD(&ctx->arch.pgtables); + INIT_LIST_HEAD(&ctx->arch.identity_maps); + + return 0; +} + +int arch_iommu_context_teardown(struct domain *d, struct iommu_context *ct= x, u32 flags) +{ + /* Cleanup all page tables */ + while ( iommu_free_pgtables(d, ctx) =3D=3D -ERESTART ) + /* nothing */; =20 return 0; } =20 void arch_iommu_domain_destroy(struct domain *d) { - /* - * There should be not page-tables left allocated by the time the - * domain is destroyed. Note that arch_iommu_domain_destroy() is - * called unconditionally, so pgtables may be uninitialized. - */ - ASSERT(!dom_iommu(d)->platform_ops || - page_list_empty(&dom_iommu(d)->arch.pgtables.list)); } =20 struct identity_map { @@ -214,14 +220,13 @@ struct identity_map { unsigned int count; }; =20 -int iommu_identity_mapping(struct domain *d, p2m_access_t p2ma, - paddr_t base, paddr_t end, +int iommu_identity_mapping(struct domain *d, struct iommu_context *ctx, + p2m_access_t p2ma, paddr_t base, paddr_t end, unsigned int flag) { unsigned long base_pfn =3D base >> PAGE_SHIFT_4K; unsigned long end_pfn =3D PAGE_ALIGN_4K(end) >> PAGE_SHIFT_4K; struct identity_map *map; - struct domain_iommu *hd =3D dom_iommu(d); =20 ASSERT(pcidevs_locked()); ASSERT(base < end); @@ -230,7 +235,7 @@ int iommu_identity_mapping(struct domain *d, p2m_access= _t p2ma, * No need to acquire hd->arch.mapping_lock: Both insertion and removal * get done while holding pcidevs_lock. */ - list_for_each_entry( map, &hd->arch.identity_maps, list ) + list_for_each_entry( map, &ctx->arch.identity_maps, list ) { if ( map->base =3D=3D base && map->end =3D=3D end ) { @@ -280,7 +285,7 @@ int iommu_identity_mapping(struct domain *d, p2m_access= _t p2ma, * Insert into list ahead of mapping, so the range can be found when * trying to clean up. */ - list_add_tail(&map->list, &hd->arch.identity_maps); + list_add_tail(&map->list, &ctx->arch.identity_maps); =20 for ( ; base_pfn < end_pfn; ++base_pfn ) { @@ -300,12 +305,11 @@ int iommu_identity_mapping(struct domain *d, p2m_acce= ss_t p2ma, return 0; } =20 -void iommu_identity_map_teardown(struct domain *d) +void iommu_identity_map_teardown(struct domain *d, struct iommu_context *c= tx) { - struct domain_iommu *hd =3D dom_iommu(d); struct identity_map *map, *tmp; =20 - list_for_each_entry_safe ( map, tmp, &hd->arch.identity_maps, list ) + list_for_each_entry_safe ( map, tmp, &ctx->arch.identity_maps, list ) { list_del(&map->list); xfree(map); @@ -603,7 +607,7 @@ void iommu_free_domid(domid_t domid, unsigned long *map) BUG(); } =20 -int iommu_free_pgtables(struct domain *d) +int iommu_free_pgtables(struct domain *d, struct iommu_context *ctx) { struct domain_iommu *hd =3D dom_iommu(d); struct page_info *pg; @@ -613,7 +617,7 @@ int iommu_free_pgtables(struct domain *d) return 0; =20 /* After this barrier, no new IOMMU mappings can be inserted. */ - spin_barrier(&hd->arch.mapping_lock); + spin_barrier(&ctx->arch.mapping_lock); =20 /* * Pages will be moved to the free list below. 
So we want to @@ -621,7 +625,7 @@ int iommu_free_pgtables(struct domain *d) */ iommu_vcall(hd->platform_ops, clear_root_pgtable, d); =20 - while ( (pg =3D page_list_remove_head(&hd->arch.pgtables.list)) ) + while ( (pg =3D page_list_remove_head(&ctx->arch.pgtables)) ) { free_domheap_page(pg); =20 @@ -633,6 +637,7 @@ int iommu_free_pgtables(struct domain *d) } =20 struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd, + struct iommu_context *ctx, uint64_t contig_mask) { unsigned int memflags =3D 0; @@ -677,9 +682,7 @@ struct page_info *iommu_alloc_pgtable(struct domain_iom= mu *hd, =20 unmap_domain_page(p); =20 - spin_lock(&hd->arch.pgtables.lock); - page_list_add(pg, &hd->arch.pgtables.list); - spin_unlock(&hd->arch.pgtables.lock); + page_list_add(pg, &ctx->arch.pgtables); =20 return pg; } @@ -718,13 +721,12 @@ static void cf_check free_queued_pgtables(void *arg) } } =20 -void iommu_queue_free_pgtable(struct domain_iommu *hd, struct page_info *p= g) +void iommu_queue_free_pgtable(struct domain *d, struct iommu_context *ctx, + struct page_info *pg) { unsigned int cpu =3D smp_processor_id(); =20 - spin_lock(&hd->arch.pgtables.lock); - page_list_del(pg, &hd->arch.pgtables.list); - spin_unlock(&hd->arch.pgtables.lock); + page_list_del(pg, &ctx->arch.pgtables); =20 page_list_add_tail(pg, &per_cpu(free_pgt_list, cpu)); =20 diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h index b928c67e19..11d23cdafb 100644 --- a/xen/include/xen/iommu.h +++ b/xen/include/xen/iommu.h @@ -343,9 +343,18 @@ extern int iommu_get_extra_reserved_device_memory(iomm= u_grdm_t *func, # define iommu_vcall iommu_call #endif =20 +struct iommu_context { + #ifdef CONFIG_HAS_PASSTHROUGH + u16 id; /* Context id (0 means default context) */ + + struct arch_iommu_context arch; + #endif +}; + struct domain_iommu { #ifdef CONFIG_HAS_PASSTHROUGH struct arch_iommu arch; + struct iommu_context default_ctx; #endif =20 /* iommu_ops */ @@ -380,6 +389,7 @@ struct domain_iommu { #define dom_iommu(d) (&(d)->iommu) #define iommu_set_feature(d, f) set_bit(f, dom_iommu(d)->features) #define iommu_clear_feature(d, f) clear_bit(f, dom_iommu(d)->features) +#define iommu_default_context(d) (&dom_iommu(d)->default_ctx) /* does not = lock ! */ =20 /* Are we using the domain P2M table as its IOMMU pagetable? 
 */
#define iommu_use_hap_pt(d) (IS_ENABLED(CONFIG_HVM) && \
-- 
2.47.2

Teddy Astie | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech

From nobody Sat Nov 1 23:25:40 2025
, List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 83cc45cc-ed18-11ef-9896-31a8f345e629 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandrillapp.com; s=mte1; t=1739787502; x=1740057502; bh=JKPPS3Q607//7imzk+rG+e9sgZlVye70tpXwjS+krKI=; h=From:Subject:To:Cc:Message-Id:In-Reply-To:References:Feedback-ID: Date:MIME-Version:Content-Type:Content-Transfer-Encoding:CC:Date: Subject:From; b=bAQIiWz6v01F5RaPZocRDCKJMvWMTDX71rdUYH4L1bPCNR5TUnWOWi/MhhF5KqJqp 9K+NWAM08/+onhvgFW0y50Ac/i7CpbOW1exz+8kJpwOYsyUEuC5QTF+hWUOm62DuzD IKW7pFNwbL6RVkf4dsIVImu4d/FDAnHfIt0LBen46+bsgviH6Z2y2xoke3415e3Jk5 5B9UyJkKqJ91GzS8qmApV+/FzT5kcmXn89xVfXc4zugJ/WGmAkBapqHyOeRsbdn/A6 uJmVYB3cxtVoiDqa60Qsup+xfkkCoDIpDPmKFpLzGwQLJTLV3rxZLQOCdTbinfcXGB gJeudrVBNSQ0A== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vates.tech; s=mte1; t=1739787502; x=1740048002; i=teddy.astie@vates.tech; bh=JKPPS3Q607//7imzk+rG+e9sgZlVye70tpXwjS+krKI=; h=From:Subject:To:Cc:Message-Id:In-Reply-To:References:Feedback-ID: Date:MIME-Version:Content-Type:Content-Transfer-Encoding:CC:Date: Subject:From; b=eYU+h8+hnuO3ICTeFUBLABVGweIxZwNFQeUUD9l066kubqB5bE6VI5L5FgyP+AeWh iHNPcvjkmSm4+0TXw/9ZMyLGKTGf3/bhqi9RP3s+XMx3q7zRoO+T2z9ZRmLkGQLqp0 Yh3X7kmdCzuqRPzDg9MvaQlassqBYicOS2AfHNiSXOJMyYQfBmLaitJguyOH842yQ7 sr1rQkomAEPNwX6ru+jiK0G1ureeraeJvh0GU08bsBPmYiVi67zNskni6ap62bSlIb pNzeDAxs9LgTiqFlVTRy/oTFNtcum+0BL1QjxPARHVvt3psW5VIj8JKSqlUwpRdCLh 3PXowsgd99XLA== From: "Teddy Astie" Subject: =?utf-8?Q?[XEN=20RFC=20PATCH=20v6=2005/11]=20iommu:=20Simplify=20quarantine=20logic?= X-Mailer: git-send-email 2.47.2 X-Bm-Disclaimer: Yes X-Bm-Milter-Handled: 4ffbd6c1-ee69-4e1b-aabd-f977039bd3e2 X-Bm-Transport-Timestamp: 1739787500867 To: xen-devel@lists.xenproject.org Cc: "Teddy Astie" , "Jan Beulich" , "Andrew Cooper" , "=?utf-8?Q?Roger=20Pau=20Monn=C3=A9?=" Message-Id: <7ccad8409ffdfc026f86303729f3f45efd9bae3e.1739785339.git.teddy.astie@vates.tech> In-Reply-To: References: X-Native-Encoded: 1 X-Report-Abuse: =?UTF-8?Q?Please=20forward=20a=20copy=20of=20this=20message,=20including=20all=20headers,=20to=20abuse@mandrill.com.=20You=20can=20also=20report=20abuse=20here:=20https://mandrillapp.com/contact/abuse=3Fid=3D30504962.86b6fa1377c544e19369cdbc272e29ad?= X-Mandrill-User: md_30504962 Feedback-ID: 30504962:30504962.20250217:md Date: Mon, 17 Feb 2025 10:18:21 +0000 MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ZohoMail-DKIM: pass (identity teddy.astie@vates.tech) (identity @mandrillapp.com) X-ZM-MESSAGEID: 1739787531569019100 Content-Type: text/plain; charset="utf-8" Current quarantine code is very hard to change and is very complicated, remove most bits of it and replace it with direct reassignement to dom_io domain instead. Signed-off-by: Teddy Astie --- A idea would be to rework this feature using the new reworked IOMMU subsystem. 
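For readers skimming the diff below, the net effect of this simplification is that quarantining a device no longer builds per-device scratch page tables and pseudo domain IDs; the device is simply handed over to dom_io's default IOMMU context. The following sketch is illustrative only (quarantine_pdev() is a hypothetical wrapper, not part of the patch; the real entry points are the per-vendor reassign_device hooks touched in the diff):

/*
 * Illustrative sketch only, assuming the reworked flow described above:
 * "quarantining" a PCI device reduces to a reassignment to dom_io, so
 * the per-device quarantine state (pseudo_domid, root_table/pgd_maddr,
 * pgtables_list) removed in this patch is no longer needed.
 */
static int quarantine_pdev(struct domain *source, uint8_t devfn,
                           struct pci_dev *pdev)
{
    /* dom_io plays the role of the quarantine domain. */
    return reassign_device(source, dom_io, devfn, pdev);
}
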
--- xen/arch/x86/include/asm/pci.h | 17 -- xen/drivers/passthrough/amd/iommu_map.c | 129 +--------- xen/drivers/passthrough/amd/pci_amd_iommu.c | 51 +--- xen/drivers/passthrough/pci.c | 7 +- xen/drivers/passthrough/vtd/iommu.c | 253 ++------------------ xen/drivers/passthrough/x86/iommu.c | 1 - 6 files changed, 29 insertions(+), 429 deletions(-) diff --git a/xen/arch/x86/include/asm/pci.h b/xen/arch/x86/include/asm/pci.h index fd5480d67d..214c1a0948 100644 --- a/xen/arch/x86/include/asm/pci.h +++ b/xen/arch/x86/include/asm/pci.h @@ -15,23 +15,6 @@ =20 struct arch_pci_dev { vmask_t used_vectors; - /* - * These fields are (de)initialized under pcidevs-lock. Other uses of - * them don't race (de)initialization and hence don't strictly need any - * locking. - */ - union { - /* Subset of struct arch_iommu's fields, to be used in dom_io. */ - struct { - uint64_t pgd_maddr; - } vtd; - struct { - struct page_info *root_table; - } amd; - }; - domid_t pseudo_domid; - mfn_t leaf_mfn; - struct page_list_head pgtables_list; }; =20 int pci_conf_write_intercept(unsigned int seg, unsigned int bdf, diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthro= ugh/amd/iommu_map.c index 7514384789..91d8c21048 100644 --- a/xen/drivers/passthrough/amd/iommu_map.c +++ b/xen/drivers/passthrough/amd/iommu_map.c @@ -656,9 +656,6 @@ int amd_iommu_reserve_domain_unity_map(struct domain *d= , struct iommu_context *c { int rc; =20 - if ( d =3D=3D dom_io ) - return 0; - for ( rc =3D 0; !rc && map; map =3D map->next ) { p2m_access_t p2ma =3D p2m_access_n; @@ -680,9 +677,6 @@ int amd_iommu_reserve_domain_unity_unmap(struct domain = *d, struct iommu_context { int rc; =20 - if ( d =3D=3D dom_io ) - return 0; - for ( rc =3D 0; map; map =3D map->next ) { int ret =3D iommu_identity_mapping(d, ctx, p2m_access_x, map->addr, @@ -771,134 +765,15 @@ int cf_check amd_iommu_get_reserved_device_memory( return 0; } =20 -static int fill_qpt(union amd_iommu_pte *this, unsigned int level, - struct page_info *pgs[IOMMU_MAX_PT_LEVELS]) -{ - struct domain_iommu *hd =3D dom_iommu(dom_io); - struct iommu_context *ctx =3D iommu_default_context(dom_io); - unsigned int i; - int rc =3D 0; - - for ( i =3D 0; !rc && i < PTE_PER_TABLE_SIZE; ++i ) - { - union amd_iommu_pte *pte =3D &this[i], *next; - - if ( !pte->pr ) - { - if ( !pgs[level] ) - { - /* - * The pgtable allocator is fine for the leaf page, as wel= l as - * page table pages, and the resulting allocations are alw= ays - * zeroed. - */ - pgs[level] =3D iommu_alloc_pgtable(hd, ctx, 0); - if ( !pgs[level] ) - { - rc =3D -ENOMEM; - break; - } - - if ( level ) - { - next =3D __map_domain_page(pgs[level]); - rc =3D fill_qpt(next, level - 1, pgs); - unmap_domain_page(next); - } - } - - /* - * PDEs are essentially a subset of PTEs, so this function - * is fine to use even at the leaf. 
- */ - set_iommu_pde_present(pte, mfn_x(page_to_mfn(pgs[level])), lev= el, - true, true); - } - else if ( level && pte->next_level ) - { - next =3D map_domain_page(_mfn(pte->mfn)); - rc =3D fill_qpt(next, level - 1, pgs); - unmap_domain_page(next); - } - } - - return rc; -} - int cf_check amd_iommu_quarantine_init(struct pci_dev *pdev, bool scratch_= page) { - struct domain_iommu *hd =3D dom_iommu(dom_io); - struct iommu_context *ctx =3D iommu_default_context(dom_io); - unsigned int level =3D ctx->arch.amd.paging_mode; - unsigned int req_id =3D get_dma_requestor_id(pdev->seg, pdev->sbdf.bdf= ); - const struct ivrs_mappings *ivrs_mappings =3D get_ivrs_mappings(pdev->= seg); - int rc; + amd_iommu_quarantine_teardown(pdev); =20 - ASSERT(pcidevs_locked()); - ASSERT(!ctx->arch.amd.root_table); - ASSERT(page_list_empty(&ctx->arch.pgtables)); - - if ( !scratch_page && !ivrs_mappings[req_id].unity_map ) - return 0; - - ASSERT(pdev->arch.pseudo_domid !=3D DOMID_INVALID); - - if ( pdev->arch.amd.root_table ) - { - clear_domain_page(pdev->arch.leaf_mfn); - return 0; - } - - pdev->arch.amd.root_table =3D iommu_alloc_pgtable(hd, ctx, 0); - if ( !pdev->arch.amd.root_table ) - return -ENOMEM; - - /* Transiently install the root into DomIO, for iommu_identity_mapping= (). */ - ctx->arch.amd.root_table =3D pdev->arch.amd.root_table; - - rc =3D amd_iommu_reserve_domain_unity_map(dom_io, ctx, - ivrs_mappings[req_id].unity_ma= p, - 0); - - iommu_identity_map_teardown(dom_io, ctx); - ctx->arch.amd.root_table =3D NULL; - - if ( rc ) - AMD_IOMMU_WARN("%pp: quarantine unity mapping failed\n", &pdev->sb= df); - else if ( scratch_page ) - { - union amd_iommu_pte *root; - struct page_info *pgs[IOMMU_MAX_PT_LEVELS] =3D {}; - - root =3D __map_domain_page(pdev->arch.amd.root_table); - rc =3D fill_qpt(root, level - 1, pgs); - unmap_domain_page(root); - - pdev->arch.leaf_mfn =3D page_to_mfn(pgs[0]); - } - - page_list_move(&pdev->arch.pgtables_list, &ctx->arch.pgtables); - - if ( rc ) - amd_iommu_quarantine_teardown(pdev); - - return rc; + return 0; } =20 void amd_iommu_quarantine_teardown(struct pci_dev *pdev) { - struct iommu_context *ctx =3D iommu_default_context(dom_io); - - ASSERT(pcidevs_locked()); - - if ( !pdev->arch.amd.root_table ) - return; - - ASSERT(page_list_empty(&ctx->arch.pgtables)); - page_list_move(&ctx->arch.pgtables, &pdev->arch.pgtables_list); - while ( iommu_free_pgtables(dom_io, ctx) =3D=3D -ERESTART ) - /* nothing */; - pdev->arch.amd.root_table =3D NULL; } =20 /* diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/pass= through/amd/pci_amd_iommu.c index a3815d71be..0008b35162 100644 --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c @@ -138,9 +138,6 @@ static int __must_check amd_iommu_setup_domain_device( const struct page_info *root_pg; domid_t domid; =20 - if ( QUARANTINE_SKIP(domain, pdev) ) - return 0; - BUG_ON(!ctx->arch.amd.paging_mode || !iommu->dev_table.buffer); =20 rc =3D allocate_domain_resources(domain); @@ -159,16 +156,8 @@ static int __must_check amd_iommu_setup_domain_device( dte =3D &table[req_id]; ivrs_dev =3D &get_ivrs_mappings(iommu->seg)[req_id]; =20 - if ( domain !=3D dom_io ) - { - root_pg =3D ctx->arch.amd.root_table; - domid =3D domain->domain_id; - } - else - { - root_pg =3D pdev->arch.amd.root_table; - domid =3D pdev->arch.pseudo_domid; - } + root_pg =3D ctx->arch.amd.root_table; + domid =3D domain->domain_id; =20 spin_lock_irqsave(&iommu->lock, flags); =20 @@ -414,9 +403,6 @@ static void 
amd_iommu_disable_domain_device(const struc= t domain *domain, int req_id; u8 bus =3D pdev->bus; =20 - if ( QUARANTINE_SKIP(domain, pdev) ) - return; - ASSERT(pcidevs_locked()); =20 if ( pci_ats_device(iommu->seg, bus, pdev->devfn) && @@ -479,14 +465,9 @@ static int cf_check reassign_device( return -ENODEV; } =20 - if ( !QUARANTINE_SKIP(target, pdev) ) - { - rc =3D amd_iommu_setup_domain_device(target, target_ctx, iommu, de= vfn, pdev); - if ( rc ) - return rc; - } - else - amd_iommu_disable_domain_device(source, iommu, devfn, pdev); + rc =3D amd_iommu_setup_domain_device(target, target_ctx, iommu, devfn,= pdev); + if ( rc ) + return rc; =20 if ( devfn =3D=3D pdev->devfn && pdev->domain !=3D target ) { @@ -579,8 +560,6 @@ static int cf_check amd_iommu_add_device(u8 devfn, stru= ct pci_dev *pdev) struct iommu_context *ctx; u16 bdf; struct ivrs_mappings *ivrs_mappings; - bool fresh_domid =3D false; - int ret; =20 if ( !pdev->domain ) return -EINVAL; @@ -649,22 +628,7 @@ static int cf_check amd_iommu_add_device(u8 devfn, str= uct pci_dev *pdev) AMD_IOMMU_WARN("%pd: unity mapping failed for %pp\n", pdev->domain, &PCI_SBDF(pdev->seg, bdf)); =20 - if ( iommu_quarantine && pdev->arch.pseudo_domid =3D=3D DOMID_INVALID ) - { - pdev->arch.pseudo_domid =3D iommu_alloc_domid(iommu->domid_map); - if ( pdev->arch.pseudo_domid =3D=3D DOMID_INVALID ) - return -ENOSPC; - fresh_domid =3D true; - } - - ret =3D amd_iommu_setup_domain_device(pdev->domain, ctx, iommu, devfn,= pdev); - if ( ret && fresh_domid ) - { - iommu_free_domid(pdev->arch.pseudo_domid, iommu->domid_map); - pdev->arch.pseudo_domid =3D DOMID_INVALID; - } - - return ret; + return amd_iommu_setup_domain_device(pdev->domain, ctx, iommu, devfn, = pdev); } =20 static int cf_check amd_iommu_remove_device(u8 devfn, struct pci_dev *pdev) @@ -700,9 +664,6 @@ static int cf_check amd_iommu_remove_device(u8 devfn, s= truct pci_dev *pdev) =20 amd_iommu_quarantine_teardown(pdev); =20 - iommu_free_domid(pdev->arch.pseudo_domid, iommu->domid_map); - pdev->arch.pseudo_domid =3D DOMID_INVALID; - if ( amd_iommu_perdev_intremap && ivrs_mappings[bdf].dte_requestor_id =3D=3D bdf && ivrs_mappings[bdf].intremap_table ) diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 777c6b1a7f..e1ca74b477 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -1370,12 +1370,7 @@ static int cf_check _dump_pci_devices(struct pci_seg= *pseg, void *arg) list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list ) { printk("%pp - ", &pdev->sbdf); -#ifdef CONFIG_X86 - if ( pdev->domain =3D=3D dom_io ) - printk("DomIO:%x", pdev->arch.pseudo_domid); - else -#endif - printk("%pd", pdev->domain); + printk("%pd", pdev->domain); printk(" - node %-3d", (pdev->node !=3D NUMA_NO_NODE) ? pdev->node= : -1); pdev_dump_msi(pdev); printk("\n"); diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/= vtd/iommu.c index f60f39ee1d..55562084fc 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -49,14 +49,6 @@ #define CONTIG_MASK DMA_PTE_CONTIG_MASK #include =20 -/* dom_io is used as a sentinel for quarantined devices */ -#define QUARANTINE_SKIP(d, pgd_maddr) ((d) =3D=3D dom_io && !(pgd_maddr)) -#define DEVICE_DOMID(d, pdev) ((d) !=3D dom_io ? (d)->domain_id \ - : (pdev)->arch.pseudo_domid) -#define DEVICE_PGTABLE(d, pdev) ((d) !=3D dom_io \ - ? 
iommu_default_context(d)->arch.vtd.pgd_= maddr \ - : (pdev)->arch.vtd.pgd_maddr) - bool __read_mostly iommu_igfx =3D true; bool __read_mostly iommu_qinval =3D true; #ifndef iommu_snoop @@ -1494,8 +1486,6 @@ int domain_context_mapping_one( int rc, ret; bool flush_dev_iotlb; =20 - if ( QUARANTINE_SKIP(domain, pgd_maddr) ) - return 0; =20 ASSERT(pcidevs_locked()); spin_lock(&iommu->lock); @@ -1512,8 +1502,6 @@ int domain_context_mapping_one( domid =3D did_to_domain_id(iommu, prev_did); if ( domid < DOMID_FIRST_RESERVED ) prev_dom =3D rcu_lock_domain_by_id(domid); - else if ( pdev ? domid =3D=3D pdev->arch.pseudo_domid : domid > DO= MID_MASK ) - prev_dom =3D rcu_lock_domain(dom_io); if ( !prev_dom ) { spin_unlock(&iommu->lock); @@ -1645,8 +1633,8 @@ int domain_context_mapping_one( ret =3D domain_context_unmap_one(domain, iommu, bus, devfn); else ret =3D domain_context_mapping_one(prev_dom, ctx, iommu, bus, = devfn, pdev, - DEVICE_DOMID(prev_dom, pdev), - DEVICE_PGTABLE(prev_dom, pdev= ), + prev_dom->domain_id, + iommu_default_context(prev_do= m)->arch.vtd.pgd_maddr, (mode & MAP_WITH_RMRR) | MAP_ERROR_RECOVERY) < 0; =20 @@ -1668,8 +1656,8 @@ static int domain_context_mapping(struct domain *doma= in, struct iommu_context *c { const struct acpi_drhd_unit *drhd =3D acpi_find_matched_drhd_unit(pdev= ); const struct acpi_rmrr_unit *rmrr; - paddr_t pgd_maddr =3D DEVICE_PGTABLE(domain, pdev); - domid_t orig_domid =3D pdev->arch.pseudo_domid; + paddr_t pgd_maddr =3D ctx->arch.vtd.pgd_maddr; + domid_t did =3D domain->domain_id; int ret =3D 0; unsigned int i, mode =3D 0; uint16_t seg =3D pdev->seg, bdf; @@ -1722,20 +1710,11 @@ static int domain_context_mapping(struct domain *do= main, struct iommu_context *c if ( !drhd ) return -ENODEV; =20 - if ( iommu_quarantine && orig_domid =3D=3D DOMID_INVALID ) - { - pdev->arch.pseudo_domid =3D - iommu_alloc_domid(drhd->iommu->pseudo_domid_map); - if ( pdev->arch.pseudo_domid =3D=3D DOMID_INVALID ) - return -ENOSPC; - } - if ( iommu_debug ) printk(VTDPREFIX "%pd:PCIe: map %pp\n", domain, &PCI_SBDF(seg, bus, devfn)); ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, bus, = devfn, pdev, - DEVICE_DOMID(domain, pdev), pgd_m= addr, - mode); + did, pgd_maddr, mode); if ( ret > 0 ) ret =3D 0; if ( !ret && devfn =3D=3D pdev->devfn && ats_device(pdev, drhd) > = 0 ) @@ -1747,21 +1726,12 @@ static int domain_context_mapping(struct domain *do= main, struct iommu_context *c if ( !drhd ) return -ENODEV; =20 - if ( iommu_quarantine && orig_domid =3D=3D DOMID_INVALID ) - { - pdev->arch.pseudo_domid =3D - iommu_alloc_domid(drhd->iommu->pseudo_domid_map); - if ( pdev->arch.pseudo_domid =3D=3D DOMID_INVALID ) - return -ENOSPC; - } - if ( iommu_debug ) printk(VTDPREFIX "%pd:PCI: map %pp\n", domain, &PCI_SBDF(seg, bus, devfn)); =20 ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, bus, = devfn, - pdev, DEVICE_DOMID(domain, pdev), - pgd_maddr, mode); + pdev, did, pgd_maddr, mode); if ( ret < 0 ) break; prev_present =3D ret; @@ -1791,8 +1761,7 @@ static int domain_context_mapping(struct domain *doma= in, struct iommu_context *c */ if ( ret >=3D 0 ) ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, b= us, devfn, - NULL, DEVICE_DOMID(domain, pd= ev), - pgd_maddr, mode); + NULL, did, pgd_maddr, mode); =20 /* * Devices behind PCIe-to-PCI/PCIx bridge may generate different @@ -1807,8 +1776,7 @@ static int domain_context_mapping(struct domain *doma= in, struct iommu_context *c if ( !ret && pdev_type(seg, bus, devfn) =3D=3D DEV_TYPE_PCIe2PCI_B= RIDGE && (secbus 
!=3D pdev->bus || pdev->devfn !=3D 0) ) ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, s= ecbus, 0, - NULL, DEVICE_DOMID(domain, pd= ev), - pgd_maddr, mode); + NULL, did, pgd_maddr, mode); =20 if ( ret ) { @@ -1830,13 +1798,6 @@ static int domain_context_mapping(struct domain *dom= ain, struct iommu_context *c if ( !ret && devfn =3D=3D pdev->devfn ) pci_vtd_quirk(pdev); =20 - if ( ret && drhd && orig_domid =3D=3D DOMID_INVALID ) - { - iommu_free_domid(pdev->arch.pseudo_domid, - drhd->iommu->pseudo_domid_map); - pdev->arch.pseudo_domid =3D DOMID_INVALID; - } - return ret; } =20 @@ -1994,10 +1955,6 @@ static const struct acpi_drhd_unit *domain_context_u= nmap( return ERR_PTR(-EINVAL); } =20 - if ( !ret && pdev->devfn =3D=3D devfn && - !QUARANTINE_SKIP(domain, pdev->arch.vtd.pgd_maddr) ) - check_cleanup_domid_map(domain, pdev, iommu); - return drhd; } =20 @@ -2031,21 +1988,6 @@ static void cf_check iommu_domain_teardown(struct do= main *d) static void quarantine_teardown(struct pci_dev *pdev, const struct acpi_drhd_unit *drhd) { - struct iommu_context *ctx =3D iommu_default_context(dom_io); - - ASSERT(pcidevs_locked()); - - if ( !pdev->arch.vtd.pgd_maddr ) - return; - - ASSERT(page_list_empty(&ctx->arch.pgtables)); - page_list_move(&ctx->arch.pgtables, &pdev->arch.pgtables_list); - while ( iommu_free_pgtables(dom_io, ctx) =3D=3D -ERESTART ) - /* nothing */; - pdev->arch.vtd.pgd_maddr =3D 0; - - if ( drhd ) - cleanup_domid_map(pdev->arch.pseudo_domid, drhd->iommu); } =20 static int __must_check cf_check intel_iommu_map_page( @@ -2386,13 +2328,6 @@ static int cf_check intel_iommu_remove_device(u8 dev= fn, struct pci_dev *pdev) =20 quarantine_teardown(pdev, drhd); =20 - if ( drhd ) - { - iommu_free_domid(pdev->arch.pseudo_domid, - drhd->iommu->pseudo_domid_map); - pdev->arch.pseudo_domid =3D DOMID_INVALID; - } - return 0; } =20 @@ -2750,42 +2685,22 @@ static int cf_check reassign_device_ownership( { int ret; =20 - if ( !QUARANTINE_SKIP(target, pdev->arch.vtd.pgd_maddr) ) - { - struct iommu_context *target_ctx =3D iommu_default_context(target); - - if ( !has_arch_pdevs(target) ) - vmx_pi_hooks_assign(target); + if ( !has_arch_pdevs(target) ) + vmx_pi_hooks_assign(target); =20 #ifdef CONFIG_PV - /* - * Devices assigned to untrusted domains (here assumed to be any d= omU) - * can attempt to send arbitrary LAPIC/MSI messages. We are unprot= ected - * by the root complex unless interrupt remapping is enabled. - */ - if ( !iommu_intremap && !is_hardware_domain(target) && - !is_system_domain(target) ) - untrusted_msi =3D true; + /* + * Devices assigned to untrusted domains (here assumed to be any do= mU) + * can attempt to send arbitrary LAPIC/MSI messages. We are unprote= cted + * by the root complex unless interrupt remapping is enabled. + */ + if ( !iommu_intremap && !is_hardware_domain(target) && + !is_system_domain(target) ) + untrusted_msi =3D true; #endif =20 - ret =3D domain_context_mapping(target, target_ctx, devfn, pdev); - - if ( !ret && pdev->devfn =3D=3D devfn && - !QUARANTINE_SKIP(source, pdev->arch.vtd.pgd_maddr) ) - { - const struct acpi_drhd_unit *drhd =3D acpi_find_matched_drhd_u= nit(pdev); + ret =3D domain_context_mapping(target, iommu_default_context(target), = devfn, pdev); =20 - if ( drhd ) - check_cleanup_domid_map(source, pdev, drhd->iommu); - } - } - else - { - const struct acpi_drhd_unit *drhd; - - drhd =3D domain_context_unmap(source, devfn, pdev); - ret =3D IS_ERR(drhd) ? 
PTR_ERR(drhd) : 0; - } if ( ret ) { if ( !has_arch_pdevs(target) ) @@ -2884,9 +2799,6 @@ static int cf_check intel_iommu_assign_device( } } =20 - if ( d =3D=3D dom_io ) - return reassign_device_ownership(s, d, devfn, pdev); - /* Setup rmrr identity mapping */ for_each_rmrr_device( rmrr, bdf, i ) { @@ -3096,135 +3008,10 @@ static void cf_check vtd_dump_page_tables(struct d= omain *d) agaw_to_level(hd->arch.vtd.agaw), 0, 0); } =20 -static int fill_qpt(struct dma_pte *this, unsigned int level, - struct page_info *pgs[6]) -{ - struct domain_iommu *hd =3D dom_iommu(dom_io); - struct iommu_context *ctx =3D iommu_default_context(dom_io); - unsigned int i; - int rc =3D 0; - - for ( i =3D 0; !rc && i < PTE_NUM; ++i ) - { - struct dma_pte *pte =3D &this[i], *next; - - if ( !dma_pte_present(*pte) ) - { - if ( !pgs[level] ) - { - /* - * The pgtable allocator is fine for the leaf page, as wel= l as - * page table pages, and the resulting allocations are alw= ays - * zeroed. - */ - pgs[level] =3D iommu_alloc_pgtable(hd, ctx, 0); - if ( !pgs[level] ) - { - rc =3D -ENOMEM; - break; - } - - if ( level ) - { - next =3D map_vtd_domain_page(page_to_maddr(pgs[level])= ); - rc =3D fill_qpt(next, level - 1, pgs); - unmap_vtd_domain_page(next); - } - } - - dma_set_pte_addr(*pte, page_to_maddr(pgs[level])); - dma_set_pte_readable(*pte); - dma_set_pte_writable(*pte); - } - else if ( level && !dma_pte_superpage(*pte) ) - { - next =3D map_vtd_domain_page(dma_pte_addr(*pte)); - rc =3D fill_qpt(next, level - 1, pgs); - unmap_vtd_domain_page(next); - } - } - - return rc; -} - static int cf_check intel_iommu_quarantine_init(struct pci_dev *pdev, bool scratch_page) { - struct domain_iommu *hd =3D dom_iommu(dom_io); - struct iommu_context *ctx =3D iommu_default_context(dom_io); - struct page_info *pg; - unsigned int agaw =3D hd->arch.vtd.agaw; - unsigned int level =3D agaw_to_level(agaw); - const struct acpi_drhd_unit *drhd; - const struct acpi_rmrr_unit *rmrr; - unsigned int i, bdf; - bool rmrr_found =3D false; - int rc; - - ASSERT(pcidevs_locked()); - ASSERT(!ctx->arch.vtd.pgd_maddr); - ASSERT(page_list_empty(&ctx->arch.pgtables)); - - if ( pdev->arch.vtd.pgd_maddr ) - { - clear_domain_page(pdev->arch.leaf_mfn); - return 0; - } - - drhd =3D acpi_find_matched_drhd_unit(pdev); - if ( !drhd ) - return -ENODEV; - - pg =3D iommu_alloc_pgtable(hd, ctx, 0); - if ( !pg ) - return -ENOMEM; - - rc =3D context_set_domain_id(NULL, pdev->arch.pseudo_domid, drhd->iomm= u); - - /* Transiently install the root into DomIO, for iommu_identity_mapping= (). 
*/ - ctx->arch.vtd.pgd_maddr =3D page_to_maddr(pg); - - for_each_rmrr_device ( rmrr, bdf, i ) - { - if ( rc ) - break; - - if ( rmrr->segment =3D=3D pdev->seg && bdf =3D=3D pdev->sbdf.bdf ) - { - rmrr_found =3D true; - - rc =3D iommu_identity_mapping(dom_io, ctx, p2m_access_rw, - rmrr->base_address, rmrr->end_addr= ess, - 0); - if ( rc ) - printk(XENLOG_ERR VTDPREFIX - "%pp: RMRR quarantine mapping failed\n", - &pdev->sbdf); - } - } - - iommu_identity_map_teardown(dom_io, ctx); - ctx->arch.vtd.pgd_maddr =3D 0; - pdev->arch.vtd.pgd_maddr =3D page_to_maddr(pg); - - if ( !rc && scratch_page ) - { - struct dma_pte *root; - struct page_info *pgs[6] =3D {}; - - root =3D map_vtd_domain_page(pdev->arch.vtd.pgd_maddr); - rc =3D fill_qpt(root, level - 1, pgs); - unmap_vtd_domain_page(root); - - pdev->arch.leaf_mfn =3D page_to_mfn(pgs[0]); - } - - page_list_move(&pdev->arch.pgtables_list, &ctx->arch.pgtables); - - if ( rc || (!scratch_page && !rmrr_found) ) - quarantine_teardown(pdev, drhd); - - return rc; + return 0; } =20 static const struct iommu_ops __initconst_cf_clobber vtd_ops =3D { diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/= x86/iommu.c index 4a3fe059cb..a444e5813e 100644 --- a/xen/drivers/passthrough/x86/iommu.c +++ b/xen/drivers/passthrough/x86/iommu.c @@ -549,7 +549,6 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *= d) =20 void arch_pci_init_pdev(struct pci_dev *pdev) { - pdev->arch.pseudo_domid =3D DOMID_INVALID; } =20 unsigned long *__init iommu_init_domid(domid_t reserve) --=20 2.47.2 Teddy Astie | Vates XCP-ng Developer XCP-ng & Xen Orchestra - Vates solutions web: https://vates.tech From nobody Sat Nov 1 23:25:40 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass header.i=teddy.astie@vates.tech; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=quarantine dis=none) header.from=vates.tech ARC-Seal: i=1; a=rsa-sha256; t=1739787516; cv=none; d=zohomail.com; s=zohoarc; b=F6R97o1GK45uEeo5V5w/eyJSWqywlGoEgSaamjGxhX2DBTilbpKwlvgqckA8QoE0wQqwfzoDXWMiJqMEilN/IUAcjfx6PzB25xNlCrtOYIkfdT3VT+iaGQ+iQJWJ1eV1rtNWUUNpVz9fzokD9r01uH7mofDdp0Gvx8WDRfChnSk= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1739787516; h=Content-Type:Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=Unm7kyf0yewx/61zGSD5OTOtKbpaUHiruf6MwrYYcBM=; b=Uk+rtg9LmE8B5X3MiMYROqGCXCN97Xuh/L5NJfcLPQ/VqHEXssz3qqXM8rfpsnmtdvTuoELSCcUs3TqWl4EmAyIX+QVkxkreDCVDTNDey2YUj79itPzTM/SbGK+3vNEKOStapJeUXpCUS9Dp9yaOYySkUiZ+drhjJibyrlTywOY= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass header.i=teddy.astie@vates.tech; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=quarantine dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 173978751655690.80443461299262; Mon, 17 Feb 2025 02:18:36 -0800 (PST) Received: 
from list by lists.xenproject.org with outflank-mailman.890004.1299049 (Exim 4.92) (envelope-from ) id 1tjyCn-0000Q0-NB; Mon, 17 Feb 2025 10:18:25 +0000 Received: by outflank-mailman (output) from mailman id 890004.1299049; Mon, 17 Feb 2025 10:18:25 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1tjyCn-0000OG-Eq; Mon, 17 Feb 2025 10:18:25 +0000 Received: by outflank-mailman (input) for mailman id 890004; Mon, 17 Feb 2025 10:18:24 +0000 Received: from se1-gles-sth1-in.inumbo.com ([159.253.27.254] helo=se1-gles-sth1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1tjyCm-0008Nl-8a for xen-devel@lists.xenproject.org; Mon, 17 Feb 2025 10:18:24 +0000 Received: from mail178-27.suw51.mandrillapp.com (mail178-27.suw51.mandrillapp.com [198.2.178.27]) by se1-gles-sth1.inumbo.com (Halon) with ESMTPS id 840ba84f-ed18-11ef-9aa6-95dc52dad729; Mon, 17 Feb 2025 11:18:22 +0100 (CET) Received: from pmta13.mandrill.prod.suw01.rsglab.com (localhost [127.0.0.1]) by mail178-27.suw51.mandrillapp.com (Mailchimp) with ESMTP id 4YxJWx75CZz6CPyPD for ; Mon, 17 Feb 2025 10:18:21 +0000 (GMT) Received: from [37.26.189.201] by mandrillapp.com id b78efb45eff9473c8831e3b16f38586e; Mon, 17 Feb 2025 10:18:21 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 840ba84f-ed18-11ef-9aa6-95dc52dad729 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandrillapp.com; s=mte1; t=1739787502; x=1740057502; bh=Unm7kyf0yewx/61zGSD5OTOtKbpaUHiruf6MwrYYcBM=; h=From:Subject:To:Cc:Message-Id:In-Reply-To:References:Feedback-ID: Date:MIME-Version:Content-Type:Content-Transfer-Encoding:CC:Date: Subject:From; b=SG8tm9VogyshjU1b25z0O6UNlPkwoMsobD6Ajus3dNmbDTWak4gOB5gCAISiubeKZ JgBpIjN7WwSs+GPEmd166SfIv8ESxpRh6cH4DYp9W8V9iEd0SofQL1f2HBbRTu4NRg ds1nNz24Hdkyh4GOJBG/0Qy9Ley0gAbZdzQf7R+HSpHHvPlCU6BZ+7pDQDH7tWwEn0 +IUFmNZ/l4CCmX7/iYHonTyiwdyr8SFSJl91LcSsg4S9JWCehOGteZmnhYPEmex13T HleNij7pimLPsv9MZvrCHujogpJMTnlP/RFPDxngpPJ13F972JipWN+qI125PESf7U utBr/LS14Quuw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vates.tech; s=mte1; t=1739787502; x=1740048002; i=teddy.astie@vates.tech; bh=Unm7kyf0yewx/61zGSD5OTOtKbpaUHiruf6MwrYYcBM=; h=From:Subject:To:Cc:Message-Id:In-Reply-To:References:Feedback-ID: Date:MIME-Version:Content-Type:Content-Transfer-Encoding:CC:Date: Subject:From; b=C/GcX6fP7EJhylvOP4pihczVMeMNR1Aj0wrHOFg19Swn/Yyy01V6M8vLPiCy10pe2 yIlt1466utkhspApmhpTzrR1bHHWFCBhIfullnq5CDOMlRKWabCG++tnWWCLS8r5J3 HPq90Oe1MLJZo0AqprqpsF8J2inrzO+AhbiAx6sa5y6V/NtRifhMtfJcVAnPKm4zyM IiNv6spF3XPhzwA2zM+Gfpk9k4KFYKA5x7QtYWetsj1JOB72Kocud4ixmYufO4wZKu 237DiM4qlZAH0OgpQgpf2WE6d+Z1hFl7MAj69mt2mbIcwXxud5JvDZwP0WNwhgzAXF rToR5tAqXgFOA== From: "Teddy Astie" Subject: =?utf-8?Q?[XEN=20RFC=20PATCH=20v6=2006/11]=20vtd:=20Remove=20MAP=5FERROR=5FRECOVERY=20code=20path=20in=20domain=5Fcontext=5Fmapping=5Fone?= X-Mailer: git-send-email 2.47.2 X-Bm-Disclaimer: Yes X-Bm-Milter-Handled: 4ffbd6c1-ee69-4e1b-aabd-f977039bd3e2 X-Bm-Transport-Timestamp: 1739787501062 To: xen-devel@lists.xenproject.org Cc: "Teddy Astie" , "Jan Beulich" , "Andrew Cooper" , "=?utf-8?Q?Roger=20Pau=20Monn=C3=A9?=" Message-Id: In-Reply-To: References: X-Native-Encoded: 1 X-Report-Abuse: 
=?UTF-8?Q?Please=20forward=20a=20copy=20of=20this=20message,=20including=20all=20headers,=20to=20abuse@mandrill.com.=20You=20can=20also=20report=20abuse=20here:=20https://mandrillapp.com/contact/abuse=3Fid=3D30504962.b78efb45eff9473c8831e3b16f38586e?= X-Mandrill-User: md_30504962 Feedback-ID: 30504962:30504962.20250217:md Date: Mon, 17 Feb 2025 10:18:21 +0000 MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ZohoMail-DKIM: pass (identity @mandrillapp.com) (identity teddy.astie@vates.tech) X-ZM-MESSAGEID: 1739787518580019000 Content-Type: text/plain; charset="utf-8" This logic is almost never called as the only possible failures are - no memory to allocate the pagetable (if it isn't already allocated) this is fixed in this patch serie by ensuring that the pagetable is alloc= ated when entering this function - EILSEQ when there is a race condtion with hardware, which should not happ= en under normal circonstances Remove this logic to simplify the error management of the function. Signed-off-by: Teddy Astie --- xen/drivers/passthrough/vtd/iommu.c | 20 -------------------- xen/drivers/passthrough/vtd/vtd.h | 3 +-- 2 files changed, 1 insertion(+), 22 deletions(-) diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/= vtd/iommu.c index 55562084fc..852994cf97 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -1621,26 +1621,6 @@ int domain_context_mapping_one( if ( !seg && !rc ) rc =3D me_wifi_quirk(domain, bus, devfn, domid, pgd_maddr, mode); =20 - if ( rc && !(mode & MAP_ERROR_RECOVERY) ) - { - if ( !prev_dom || - /* - * Unmapping here means DEV_TYPE_PCI devices with RMRRs (if s= uch - * exist) would cause problems if such a region was actually - * accessed. - */ - (prev_dom =3D=3D dom_io && !pdev) ) - ret =3D domain_context_unmap_one(domain, iommu, bus, devfn); - else - ret =3D domain_context_mapping_one(prev_dom, ctx, iommu, bus, = devfn, pdev, - prev_dom->domain_id, - iommu_default_context(prev_do= m)->arch.vtd.pgd_maddr, - (mode & MAP_WITH_RMRR) | - MAP_ERROR_RECOVERY) < 0; - - if ( !ret && pdev && pdev->devfn =3D=3D devfn ) - check_cleanup_domid_map(domain, pdev, iommu); - } =20 if ( prev_dom ) rcu_unlock_domain(prev_dom); diff --git a/xen/drivers/passthrough/vtd/vtd.h b/xen/drivers/passthrough/vt= d/vtd.h index b95124517b..72aa9a70c9 100644 --- a/xen/drivers/passthrough/vtd/vtd.h +++ b/xen/drivers/passthrough/vtd/vtd.h @@ -28,8 +28,7 @@ */ #define MAP_WITH_RMRR (1u << 0) #define MAP_OWNER_DYING (1u << 1) -#define MAP_ERROR_RECOVERY (1u << 2) -#define UNMAP_ME_PHANTOM_FUNC (1u << 3) +#define UNMAP_ME_PHANTOM_FUNC (1u << 2) =20 /* Allow for both IOAPIC and IOSAPIC. 
*/ #define IO_xAPIC_route_entry IO_APIC_route_entry --=20 2.47.2 Teddy Astie | Vates XCP-ng Developer XCP-ng & Xen Orchestra - Vates solutions web: https://vates.tech From nobody Sat Nov 1 23:25:40 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass header.i=teddy.astie@vates.tech; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=quarantine dis=none) header.from=vates.tech ARC-Seal: i=1; a=rsa-sha256; t=1739787531; cv=none; d=zohomail.com; s=zohoarc; b=BGy1E0hB0Jllx0WP2q2j0PaXMkM6urUrLPQcNP91BuEoBSI8uvjdUWPITsM7k2MpE4IMC8GV+hB2O3xU1yAcEtNZDS7fgzeTMHqjQyooYhSt8+ce3SAuZvMnMHSDWkunr/zHSJE/Zgh9uu8hC787CnpuCeX7Z9zNVG2xyGBNIOU= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1739787531; h=Content-Type:Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=7v5xoQ3DFj0kUTvNDQbvt8cOPjYMlC78Cmix+Q2h9Q8=; b=m1IJS1h+v5VIZL+/4ZQqVR/2YHCmJn4LalSEUI7TJC2737RNqiZWvKmiYUp9Ig/O2ackFKnNGhX+WXa39zZFUGYLCjWO0X6R9/8BYSMjUU0ovRHfkeg1EbOGdPvH+9evs2PaMDKtVjHMuuGoP556qttEt6d1aI2mGfSMvvsATo8= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass header.i=teddy.astie@vates.tech; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=quarantine dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1739787531303472.1699132952285; Mon, 17 Feb 2025 02:18:51 -0800 (PST) Received: from list by lists.xenproject.org with outflank-mailman.890006.1299070 (Exim 4.92) (envelope-from ) id 1tjyCp-0000yq-DN; Mon, 17 Feb 2025 10:18:27 +0000 Received: by outflank-mailman (output) from mailman id 890006.1299070; Mon, 17 Feb 2025 10:18:27 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1tjyCp-0000xS-4o; Mon, 17 Feb 2025 10:18:27 +0000 Received: by outflank-mailman (input) for mailman id 890006; Mon, 17 Feb 2025 10:18:25 +0000 Received: from se1-gles-sth1-in.inumbo.com ([159.253.27.254] helo=se1-gles-sth1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1tjyCn-0008Nl-LQ for xen-devel@lists.xenproject.org; Mon, 17 Feb 2025 10:18:25 +0000 Received: from mail178-27.suw51.mandrillapp.com (mail178-27.suw51.mandrillapp.com [198.2.178.27]) by se1-gles-sth1.inumbo.com (Halon) with ESMTPS id 84f67d9f-ed18-11ef-9aa6-95dc52dad729; Mon, 17 Feb 2025 11:18:24 +0100 (CET) Received: from pmta13.mandrill.prod.suw01.rsglab.com (localhost [127.0.0.1]) by mail178-27.suw51.mandrillapp.com (Mailchimp) with ESMTP id 4YxJWy3Z94z6CPyQ6 for ; Mon, 17 Feb 2025 10:18:22 +0000 (GMT) Received: from [37.26.189.201] by mandrillapp.com id 042407c48e3e4b35b078042d33e14209; Mon, 17 Feb 2025 10:18:22 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion 
List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 84f67d9f-ed18-11ef-9aa6-95dc52dad729 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mandrillapp.com; s=mte1; t=1739787502; x=1740057502; bh=7v5xoQ3DFj0kUTvNDQbvt8cOPjYMlC78Cmix+Q2h9Q8=; h=From:Subject:To:Cc:Message-Id:In-Reply-To:References:Feedback-ID: Date:MIME-Version:Content-Type:Content-Transfer-Encoding:CC:Date: Subject:From; b=pP8tmNwoO9QjOGqQP6wLLkpW87FdWqXVPkIm0A6UKd/obodqtNBHc7kV1Rl1Xa+MJ C4J/NqXWCiPzc2J8jyeW8CoMWM8T+dRwipjbi7xS6LDzZOhSWlCFGxIqYwHcqv5D0i Tb0FlmIEmdASuk0Jd/uPJP2zIWqyAV00L3AqlLKIOjnQ2Qsm94JWqtvuhzHTPc7nCi U5FvKSQxaoa1cdJbgQK2BTeq/4J7346JPlWeUoXtv6vWLZ4xutCCItAO+cwQ7DUuZ/ 6w3gJG6nARHvmg91IcK8EdXGPaJim0bkVFlziBULZfDwsMKv701S03+QY8h1QBvXch r/BI2xH15Ql+A== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vates.tech; s=mte1; t=1739787502; x=1740048002; i=teddy.astie@vates.tech; bh=7v5xoQ3DFj0kUTvNDQbvt8cOPjYMlC78Cmix+Q2h9Q8=; h=From:Subject:To:Cc:Message-Id:In-Reply-To:References:Feedback-ID: Date:MIME-Version:Content-Type:Content-Transfer-Encoding:CC:Date: Subject:From; b=CJSuY/ZaxyhhclGF/WSNTEP6Fm1cQP9sPtZf9c+OCQEe6LhxXL4lKrh+HrFwET3k3 o8ccIW88Khmy+l3Uhmhf+pLZot7HjxpQ7ZQXIkFBDNXfIBrnjufbIqLI81PAbodCQ1 aUcbziybd/x2Ed/D0On3AJpOCJZ3BY1ldr68Ivb1wynU4q3OE/fqwHuYIsCEpWCOJW /J2+HXIxhAVJZPvqvXJQ6U4JUNqOZuknhosRZXV6mwkFYhu8QzCeIMmoYsaA8NWR2B y73G7xW0mx4yaQ098yCf/VjmhCoYHi0ZKtW0r1FD6xn97wfTMIYubaSd9BXH0UtGKi lKpC/C0VzMuQA== From: "Teddy Astie" Subject: =?utf-8?Q?[XEN=20RFC=20PATCH=20v6=2007/11]=20iommu:=20Simplify=20hardware=20did=20management?= X-Mailer: git-send-email 2.47.2 X-Bm-Disclaimer: Yes X-Bm-Milter-Handled: 4ffbd6c1-ee69-4e1b-aabd-f977039bd3e2 X-Bm-Transport-Timestamp: 1739787501248 To: xen-devel@lists.xenproject.org Cc: "Teddy Astie" , "Jan Beulich" , "Andrew Cooper" , "=?utf-8?Q?Roger=20Pau=20Monn=C3=A9?=" Message-Id: <56ac13967ba7dfbb229c65450c79f6838a3aee9f.1739785339.git.teddy.astie@vates.tech> In-Reply-To: References: X-Native-Encoded: 1 X-Report-Abuse: =?UTF-8?Q?Please=20forward=20a=20copy=20of=20this=20message,=20including=20all=20headers,=20to=20abuse@mandrill.com.=20You=20can=20also=20report=20abuse=20here:=20https://mandrillapp.com/contact/abuse=3Fid=3D30504962.042407c48e3e4b35b078042d33e14209?= X-Mandrill-User: md_30504962 Feedback-ID: 30504962:30504962.20250217:md Date: Mon, 17 Feb 2025 10:18:22 +0000 MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ZohoMail-DKIM: pass (identity teddy.astie@vates.tech) (identity @mandrillapp.com) X-ZM-MESSAGEID: 1739787532722019000 Content-Type: text/plain; charset="utf-8" Simplify the hardware DID management by allocating a DID per IOMMU context (currently Xen domain) instead of trying to reuse Xen domain DID (which may not be possible depending on hardware constraints like did limits). 
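To make the data-structure change easier to follow before diving into the diff: each IOMMU context now carries its own array of hardware DIDs, one slot per IOMMU unit, allocated from that unit's domid bitmap. The sketch below is condensed from the VT-d side of the patch (the fields mirror struct arch_iommu_context and the loop mirrors intel_iommu_context_init()); error handling and the AMD analogue are omitted for brevity:

/*
 * Condensed sketch of the per-context DID bookkeeping (VT-d flavour).
 * One hardware DID is taken per IOMMU unit from that unit's own bitmap,
 * independent of the Xen domain id.
 */
struct vtd_ctx_dids {
    domid_t *didmap;              /* hardware DID per IOMMU unit, indexed by iommu->index */
    unsigned long *iommu_dev_cnt; /* devices of this context behind each unit */
};

static int alloc_context_dids(struct vtd_ctx_dids *c, unsigned int nr_iommus)
{
    struct acpi_drhd_unit *drhd;

    c->didmap = xzalloc_array(domid_t, nr_iommus);
    c->iommu_dev_cnt = xzalloc_array(unsigned long, nr_iommus);
    if ( !c->didmap || !c->iommu_dev_cnt )
        return -ENOMEM;

    /* One DID per IOMMU unit, allocated from that unit's domid bitmap. */
    for_each_drhd_unit ( drhd )
        c->didmap[drhd->iommu->index] =
            iommu_alloc_domid(drhd->iommu->domid_bitmap);

    return 0;
}
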
Signed-off-by: Teddy Astie --- xen/arch/x86/include/asm/iommu.h | 5 +- xen/drivers/passthrough/amd/iommu.h | 3 + xen/drivers/passthrough/amd/iommu_cmd.c | 4 +- xen/drivers/passthrough/amd/iommu_init.c | 3 +- xen/drivers/passthrough/vtd/extern.h | 2 - xen/drivers/passthrough/vtd/iommu.c | 335 +++++------------------ xen/drivers/passthrough/vtd/iommu.h | 2 - xen/drivers/passthrough/vtd/qinval.c | 2 +- xen/drivers/passthrough/x86/iommu.c | 27 +- 9 files changed, 89 insertions(+), 294 deletions(-) diff --git a/xen/arch/x86/include/asm/iommu.h b/xen/arch/x86/include/asm/io= mmu.h index 94513ba9dc..d20c3cda59 100644 --- a/xen/arch/x86/include/asm/iommu.h +++ b/xen/arch/x86/include/asm/iommu.h @@ -45,12 +45,15 @@ struct arch_iommu_context /* Intel VT-d */ struct { uint64_t pgd_maddr; /* io page directory machine address */ - unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the co= ntext uses */ + domid_t *didmap; /* per-iommu DID (valid only if related iommu= _dev_cnt > 0) */ + unsigned long *iommu_dev_cnt; /* counter of devices per iommu = */ } vtd; /* AMD IOMMU */ struct { unsigned int paging_mode; struct page_info *root_table; + domid_t *didmap; /* per-iommu DID (valid only if related iommu= _dev_cnt > 0) */ + unsigned long *iommu_dev_cnt; /* counter of devices per iommu = */ } amd; }; }; diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/= amd/iommu.h index 6095bc6a21..dbe427ed27 100644 --- a/xen/drivers/passthrough/amd/iommu.h +++ b/xen/drivers/passthrough/amd/iommu.h @@ -35,6 +35,7 @@ =20 #define iommu_found() (!list_empty(&amd_iommu_head)) =20 +extern unsigned int nr_amd_iommus; extern struct list_head amd_iommu_head; =20 typedef struct event_entry @@ -106,6 +107,8 @@ struct amd_iommu { =20 int enabled; =20 + unsigned int index; + struct list_head ats_devices; }; =20 diff --git a/xen/drivers/passthrough/amd/iommu_cmd.c b/xen/drivers/passthro= ugh/amd/iommu_cmd.c index 83c525b84f..e1a252db93 100644 --- a/xen/drivers/passthrough/amd/iommu_cmd.c +++ b/xen/drivers/passthrough/amd/iommu_cmd.c @@ -331,11 +331,13 @@ static void _amd_iommu_flush_pages(struct domain *d, daddr_t daddr, unsigned int order) { struct amd_iommu *iommu; - unsigned int dom_id =3D d->domain_id; + struct iommu_context *ctx =3D iommu_default_context(d); =20 /* send INVALIDATE_IOMMU_PAGES command */ for_each_amd_iommu ( iommu ) { + domid_t dom_id =3D ctx->arch.amd.didmap[iommu->index]; + invalidate_iommu_pages(iommu, daddr, dom_id, order); flush_command_buffer(iommu, 0); } diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthr= ough/amd/iommu_init.c index 41e241ccc8..333d5d5e39 100644 --- a/xen/drivers/passthrough/amd/iommu_init.c +++ b/xen/drivers/passthrough/amd/iommu_init.c @@ -23,7 +23,7 @@ =20 #include "iommu.h" =20 -static int __initdata nr_amd_iommus; +unsigned int nr_amd_iommus =3D 0; static bool __initdata pci_init; =20 static struct tasklet amd_iommu_irq_tasklet; @@ -919,6 +919,7 @@ static void enable_iommu(struct amd_iommu *iommu) set_iommu_translation_control(iommu, IOMMU_CONTROL_ENABLED); =20 iommu->enabled =3D 1; + iommu->index =3D nr_amd_iommus; =20 spin_unlock_irqrestore(&iommu->lock, flags); =20 diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough= /vtd/extern.h index 3dcb77c711..82db8f9435 100644 --- a/xen/drivers/passthrough/vtd/extern.h +++ b/xen/drivers/passthrough/vtd/extern.h @@ -45,8 +45,6 @@ void disable_intremap(struct vtd_iommu *iommu); int iommu_alloc(struct acpi_drhd_unit *drhd); void iommu_free(struct acpi_drhd_unit 
*drhd); =20 -domid_t did_to_domain_id(const struct vtd_iommu *iommu, unsigned int did); - int iommu_flush_iec_global(struct vtd_iommu *iommu); int iommu_flush_iec_index(struct vtd_iommu *iommu, u8 im, u16 iidx); void clear_fault_bits(struct vtd_iommu *iommu); diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/= vtd/iommu.c index 852994cf97..34b2a287f7 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -63,50 +63,6 @@ static struct tasklet vtd_fault_tasklet; static int cf_check setup_hwdom_device(u8 devfn, struct pci_dev *); static void setup_hwdom_rmrr(struct domain *d); =20 -static bool domid_mapping(const struct vtd_iommu *iommu) -{ - return (const void *)iommu->domid_bitmap !=3D (const void *)iommu->dom= id_map; -} - -static domid_t convert_domid(const struct vtd_iommu *iommu, domid_t domid) -{ - /* - * While we need to avoid DID 0 for caching-mode IOMMUs, maintain - * the property of the transformation being the same in either - * direction. By clipping to 16 bits we ensure that the resulting - * DID will fit in the respective context entry field. - */ - BUILD_BUG_ON(DOMID_MASK >=3D 0xffff); - - return !cap_caching_mode(iommu->cap) ? domid : ~domid; -} - -static int get_iommu_did(domid_t domid, const struct vtd_iommu *iommu, - bool warn) -{ - unsigned int nr_dom, i; - - if ( !domid_mapping(iommu) ) - return convert_domid(iommu, domid); - - nr_dom =3D cap_ndoms(iommu->cap); - i =3D find_first_bit(iommu->domid_bitmap, nr_dom); - while ( i < nr_dom ) - { - if ( iommu->domid_map[i] =3D=3D domid ) - return i; - - i =3D find_next_bit(iommu->domid_bitmap, nr_dom, i + 1); - } - - if ( warn ) - dprintk(XENLOG_ERR VTDPREFIX, - "No valid iommu %u domid for Dom%d\n", - iommu->index, domid); - - return -1; -} - #define DID_FIELD_WIDTH 16 #define DID_HIGH_OFFSET 8 =20 @@ -117,127 +73,17 @@ static int get_iommu_did(domid_t domid, const struct = vtd_iommu *iommu, static int context_set_domain_id(struct context_entry *context, domid_t domid, struct vtd_iommu *iommu) { - unsigned int i; - ASSERT(pcidevs_locked()); =20 - if ( domid_mapping(iommu) ) - { - unsigned int nr_dom =3D cap_ndoms(iommu->cap); - - i =3D find_first_bit(iommu->domid_bitmap, nr_dom); - while ( i < nr_dom && iommu->domid_map[i] !=3D domid ) - i =3D find_next_bit(iommu->domid_bitmap, nr_dom, i + 1); - - if ( i >=3D nr_dom ) - { - i =3D find_first_zero_bit(iommu->domid_bitmap, nr_dom); - if ( i >=3D nr_dom ) - { - dprintk(XENLOG_ERR VTDPREFIX, "IOMMU: no free domain id\n"= ); - return -EBUSY; - } - iommu->domid_map[i] =3D domid; - set_bit(i, iommu->domid_bitmap); - } - } - else - i =3D convert_domid(iommu, domid); - if ( context ) { context->hi &=3D ~(((1 << DID_FIELD_WIDTH) - 1) << DID_HIGH_OFFSET= ); - context->hi |=3D (i & ((1 << DID_FIELD_WIDTH) - 1)) << DID_HIGH_OF= FSET; + context->hi |=3D (domid & ((1 << DID_FIELD_WIDTH) - 1)) << DID_HIG= H_OFFSET; } =20 return 0; } =20 -static void cleanup_domid_map(domid_t domid, struct vtd_iommu *iommu) -{ - int iommu_domid; - - if ( !domid_mapping(iommu) ) - return; - - iommu_domid =3D get_iommu_did(domid, iommu, false); - - if ( iommu_domid >=3D 0 ) - { - /* - * Update domid_map[] /before/ domid_bitmap[] to avoid a race with - * context_set_domain_id(), setting the slot to DOMID_INVALID for - * did_to_domain_id() to return a suitable value while the bit is - * still set. 
- */ - iommu->domid_map[iommu_domid] =3D DOMID_INVALID; - clear_bit(iommu_domid, iommu->domid_bitmap); - } -} - -static bool any_pdev_behind_iommu(const struct domain *d, - const struct pci_dev *exclude, - const struct vtd_iommu *iommu) -{ - const struct pci_dev *pdev; - - for_each_pdev ( d, pdev ) - { - const struct acpi_drhd_unit *drhd; - - if ( pdev =3D=3D exclude ) - continue; - - drhd =3D acpi_find_matched_drhd_unit(pdev); - if ( drhd && drhd->iommu =3D=3D iommu ) - return true; - } - - return false; -} - -/* - * If no other devices under the same iommu owned by this domain, - * clear iommu in iommu_bitmap and clear domain_id in domid_bitmap. - */ -static void check_cleanup_domid_map(const struct domain *d, - const struct pci_dev *exclude, - struct vtd_iommu *iommu) -{ - bool found; - - if ( d =3D=3D dom_io ) - return; - - found =3D any_pdev_behind_iommu(d, exclude, iommu); - /* - * Hidden devices are associated with DomXEN but usable by the hardware - * domain. Hence they need considering here as well. - */ - if ( !found && is_hardware_domain(d) ) - found =3D any_pdev_behind_iommu(dom_xen, exclude, iommu); - - if ( !found ) - { - clear_bit(iommu->index, iommu_default_context(d)->arch.vtd.iommu_b= itmap); - cleanup_domid_map(d->domain_id, iommu); - } -} - -domid_t did_to_domain_id(const struct vtd_iommu *iommu, unsigned int did) -{ - if ( did >=3D cap_ndoms(iommu->cap) ) - return DOMID_INVALID; - - if ( !domid_mapping(iommu) ) - return convert_domid(iommu, did); - - if ( !test_bit(did, iommu->domid_bitmap) ) - return DOMID_INVALID; - - return iommu->domid_map[did]; -} - /* Allocate page table, return its machine address */ uint64_t alloc_pgtable_maddr(unsigned long npages, nodeid_t node) { @@ -754,13 +600,11 @@ static int __must_check cf_check iommu_flush_iotlb(st= ruct domain *d, dfn_t dfn, =20 iommu =3D drhd->iommu; =20 - if ( !test_bit(iommu->index, ctx->arch.vtd.iommu_bitmap) ) + if ( !ctx->arch.vtd.iommu_dev_cnt[iommu->index] ) continue; =20 flush_dev_iotlb =3D !!find_ats_dev_drhd(iommu); - iommu_domid =3D get_iommu_did(d->domain_id, iommu, !d->is_dying); - if ( iommu_domid =3D=3D -1 ) - continue; + iommu_domid =3D ctx->arch.vtd.didmap[iommu->index]; =20 if ( !page_count || (page_count & (page_count - 1)) || dfn_eq(dfn, INVALID_DFN) || !IS_ALIGNED(dfn_x(dfn), page_coun= t) ) @@ -1257,7 +1101,6 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd) { struct vtd_iommu *iommu; unsigned int sagaw, agaw =3D 0, nr_dom; - domid_t reserved_domid =3D DOMID_INVALID; int rc; =20 iommu =3D xzalloc(struct vtd_iommu); @@ -1346,43 +1189,16 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd) if ( !ecap_coherent(iommu->ecap) ) iommu_non_coherent =3D true; =20 - if ( nr_dom <=3D DOMID_MASK * 2 + cap_caching_mode(iommu->cap) ) - { - /* Allocate domain id (bit) maps. */ - iommu->domid_bitmap =3D xzalloc_array(unsigned long, - BITS_TO_LONGS(nr_dom)); - iommu->domid_map =3D xzalloc_array(domid_t, nr_dom); - rc =3D -ENOMEM; - if ( !iommu->domid_bitmap || !iommu->domid_map ) - goto free; - - /* - * If Caching mode is set, then invalid translations are tagged - * with domain id 0. Hence reserve bit/slot 0. - */ - if ( cap_caching_mode(iommu->cap) ) - { - iommu->domid_map[0] =3D DOMID_INVALID; - __set_bit(0, iommu->domid_bitmap); - } - } - else - { - /* Don't leave dangling NULL pointers. */ - iommu->domid_bitmap =3D ZERO_BLOCK_PTR; - iommu->domid_map =3D ZERO_BLOCK_PTR; - - /* - * If Caching mode is set, then invalid translations are tagged - * with domain id 0. 
Hence reserve the ID taking up bit/slot 0. - */ - reserved_domid =3D convert_domid(iommu, 0) ?: DOMID_INVALID; - } + /* Allocate domain id (bit) maps. */ + iommu->domid_bitmap =3D xzalloc_array(unsigned long, + BITS_TO_LONGS(nr_dom)); =20 - iommu->pseudo_domid_map =3D iommu_init_domid(reserved_domid); - rc =3D -ENOMEM; - if ( !iommu->pseudo_domid_map ) - goto free; + /* + * If Caching mode is set, then invalid translations are tagged + * with domain id 0. Hence reserve bit/slot 0. + */ + if ( cap_caching_mode(iommu->cap) ) + __set_bit(0, iommu->domid_bitmap); =20 return 0; =20 @@ -1410,8 +1226,6 @@ void __init iommu_free(struct acpi_drhd_unit *drhd) iounmap(iommu->reg); =20 xfree(iommu->domid_bitmap); - xfree(iommu->domid_map); - xfree(iommu->pseudo_domid_map); =20 if ( iommu->msi.irq >=3D 0 ) destroy_irq(iommu->msi.irq); @@ -1425,19 +1239,39 @@ void __init iommu_free(struct acpi_drhd_unit *drhd) agaw =3D 64; \ agaw; }) =20 -static int cf_check intel_iommu_domain_init(struct domain *d) +static int cf_check intel_iommu_context_init(struct domain *d, struct iomm= u_context *ctx) { - struct domain_iommu *hd =3D dom_iommu(d); - struct iommu_context *ctx =3D iommu_default_context(d); + struct acpi_drhd_unit *drhd; =20 - ctx->arch.vtd.iommu_bitmap =3D xzalloc_array(unsigned long, - BITS_TO_LONGS(nr_iommus)); - if ( !ctx->arch.vtd.iommu_bitmap ) + ctx->arch.vtd.didmap =3D xzalloc_array(domid_t, nr_iommus); + if ( !ctx->arch.vtd.didmap ) return -ENOMEM; =20 + ctx->arch.vtd.iommu_dev_cnt =3D xzalloc_array(unsigned long, nr_iommus= ); + if ( !ctx->arch.vtd.iommu_dev_cnt ) + { + xfree(ctx->arch.vtd.didmap); + return -ENOMEM; + } + + // TODO: Allocate IOMMU domid only when attaching devices ? + /* Populate context DID map using pseudo DIDs */ + for_each_drhd_unit(drhd) + { + ctx->arch.vtd.didmap[drhd->iommu->index] =3D + iommu_alloc_domid(drhd->iommu->domid_bitmap); + } + + return arch_iommu_context_init(d, ctx, 0); +} + +static int cf_check intel_iommu_domain_init(struct domain *d) +{ + struct domain_iommu *hd =3D dom_iommu(d); + hd->arch.vtd.agaw =3D width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH); =20 - return 0; + return intel_iommu_context_init(d, iommu_default_context(d)); } =20 static void __hwdom_init cf_check intel_iommu_hwdom_init(struct domain *d) @@ -1481,11 +1315,11 @@ int domain_context_mapping_one( struct context_entry *context, *context_entries, lctxt; __uint128_t res, old; uint64_t maddr; - uint16_t seg =3D iommu->drhd->segment, prev_did =3D 0; - struct domain *prev_dom =3D NULL; + uint16_t seg =3D iommu->drhd->segment, prev_did =3D 0, did; int rc, ret; - bool flush_dev_iotlb; + bool flush_dev_iotlb, overwrite_entry =3D false; =20 + struct iommu_context *prev_ctx =3D pdev->domain ? 
iommu_default_contex= t(pdev->domain) : NULL; =20 ASSERT(pcidevs_locked()); spin_lock(&iommu->lock); @@ -1494,23 +1328,12 @@ int domain_context_mapping_one( context =3D &context_entries[devfn]; old =3D (lctxt =3D *context).full; =20 + did =3D ctx->arch.vtd.didmap[iommu->index]; + if ( context_present(lctxt) ) { - domid_t domid; - prev_did =3D context_domain_id(lctxt); - domid =3D did_to_domain_id(iommu, prev_did); - if ( domid < DOMID_FIRST_RESERVED ) - prev_dom =3D rcu_lock_domain_by_id(domid); - if ( !prev_dom ) - { - spin_unlock(&iommu->lock); - unmap_vtd_domain_page(context_entries); - dprintk(XENLOG_DEBUG VTDPREFIX, - "no domain for did %u (nr_dom %u)\n", - prev_did, cap_ndoms(iommu->cap)); - return -ESRCH; - } + overwrite_entry =3D true; } =20 if ( iommu_hwdom_passthrough && is_hardware_domain(domain) ) @@ -1526,11 +1349,7 @@ int domain_context_mapping_one( root =3D domain_pgd_maddr(domain, ctx, pgd_maddr, iommu->nr_pt_lev= els); if ( !root ) { - spin_unlock(&ctx->arch.mapping_lock); - spin_unlock(&iommu->lock); unmap_vtd_domain_page(context_entries); - if ( prev_dom ) - rcu_unlock_domain(prev_dom); return -ENOMEM; } =20 @@ -1543,35 +1362,13 @@ int domain_context_mapping_one( spin_unlock(&ctx->arch.mapping_lock); } =20 - rc =3D context_set_domain_id(&lctxt, domid, iommu); + rc =3D context_set_domain_id(&lctxt, did, iommu); if ( rc ) - { - unlock: - spin_unlock(&iommu->lock); - unmap_vtd_domain_page(context_entries); - if ( prev_dom ) - rcu_unlock_domain(prev_dom); - return rc; - } - - if ( !prev_dom ) - { - context_set_address_width(lctxt, level_to_agaw(iommu->nr_pt_levels= )); - context_set_fault_enable(lctxt); - context_set_present(lctxt); - } - else if ( prev_dom =3D=3D domain ) - { - ASSERT(lctxt.full =3D=3D context->full); - rc =3D !!pdev; goto unlock; - } - else - { - ASSERT(context_address_width(lctxt) =3D=3D - level_to_agaw(iommu->nr_pt_levels)); - ASSERT(!context_fault_disable(lctxt)); - } + + context_set_address_width(lctxt, level_to_agaw(iommu->nr_pt_levels)); + context_set_fault_enable(lctxt); + context_set_present(lctxt); =20 res =3D cmpxchg16b(context, &old, &lctxt.full); =20 @@ -1581,8 +1378,6 @@ int domain_context_mapping_one( */ if ( res !=3D old ) { - if ( pdev ) - check_cleanup_domid_map(domain, pdev, iommu); printk(XENLOG_ERR "%pp: unexpected context entry %016lx_%016lx (expected %01= 6lx_%016lx)\n", &PCI_SBDF(seg, bus, devfn), @@ -1596,9 +1391,9 @@ int domain_context_mapping_one( spin_unlock(&iommu->lock); =20 rc =3D iommu_flush_context_device(iommu, prev_did, PCI_BDF(bus, devfn), - DMA_CCMD_MASK_NOBIT, !prev_dom); + DMA_CCMD_MASK_NOBIT, !overwrite_entry); flush_dev_iotlb =3D !!find_ats_dev_drhd(iommu); - ret =3D iommu_flush_iotlb_dsi(iommu, prev_did, !prev_dom, flush_dev_io= tlb); + ret =3D iommu_flush_iotlb_dsi(iommu, prev_did, !overwrite_entry, flush= _dev_iotlb); =20 /* * The current logic for returns: @@ -1614,18 +1409,27 @@ int domain_context_mapping_one( if ( rc > 0 ) rc =3D 0; =20 - set_bit(iommu->index, ctx->arch.vtd.iommu_bitmap); + if ( prev_ctx ) + { + /* Don't underflow the counter. 
*/ + BUG_ON(!prev_ctx->arch.vtd.iommu_dev_cnt[iommu->index]); + prev_ctx->arch.vtd.iommu_dev_cnt[iommu->index]--; + } + + ctx->arch.vtd.iommu_dev_cnt[iommu->index]++; =20 unmap_vtd_domain_page(context_entries); + spin_unlock(&iommu->lock); =20 if ( !seg && !rc ) rc =3D me_wifi_quirk(domain, bus, devfn, domid, pgd_maddr, mode); =20 + return rc; =20 - if ( prev_dom ) - rcu_unlock_domain(prev_dom); - - return rc ?: pdev && prev_dom; + unlock: + unmap_vtd_domain_page(context_entries); + spin_unlock(&iommu->lock); + return rc; } =20 static const struct acpi_drhd_unit *domain_context_unmap( @@ -1637,7 +1441,7 @@ static int domain_context_mapping(struct domain *doma= in, struct iommu_context *c const struct acpi_drhd_unit *drhd =3D acpi_find_matched_drhd_unit(pdev= ); const struct acpi_rmrr_unit *rmrr; paddr_t pgd_maddr =3D ctx->arch.vtd.pgd_maddr; - domid_t did =3D domain->domain_id; + domid_t did =3D ctx->arch.vtd.didmap[drhd->iommu->index]; int ret =3D 0; unsigned int i, mode =3D 0; uint16_t seg =3D pdev->seg, bdf; @@ -1960,9 +1764,10 @@ static void cf_check iommu_domain_teardown(struct do= main *d) ASSERT(!ctx->arch.vtd.pgd_maddr); =20 for_each_drhd_unit ( drhd ) - cleanup_domid_map(d->domain_id, drhd->iommu); + iommu_free_domid(d->domain_id, drhd->iommu->domid_bitmap); =20 - XFREE(ctx->arch.vtd.iommu_bitmap); + XFREE(ctx->arch.vtd.iommu_dev_cnt); + XFREE(ctx->arch.vtd.didmap); } =20 static void quarantine_teardown(struct pci_dev *pdev, diff --git a/xen/drivers/passthrough/vtd/iommu.h b/xen/drivers/passthrough/= vtd/iommu.h index 29d350b23d..77edfa3587 100644 --- a/xen/drivers/passthrough/vtd/iommu.h +++ b/xen/drivers/passthrough/vtd/iommu.h @@ -506,9 +506,7 @@ struct vtd_iommu { } flush; =20 struct list_head ats_devices; - unsigned long *pseudo_domid_map; /* "pseudo" domain id bitmap */ unsigned long *domid_bitmap; /* domain id bitmap */ - domid_t *domid_map; /* domain id mapping array */ uint32_t version; }; =20 diff --git a/xen/drivers/passthrough/vtd/qinval.c b/xen/drivers/passthrough= /vtd/qinval.c index 036f3e8505..3f25b6a2e0 100644 --- a/xen/drivers/passthrough/vtd/qinval.c +++ b/xen/drivers/passthrough/vtd/qinval.c @@ -229,7 +229,7 @@ static int __must_check dev_invalidate_sync(struct vtd_= iommu *iommu, rc =3D queue_invalidate_wait(iommu, 0, 1, 1, 1); if ( rc =3D=3D -ETIMEDOUT && !pdev->broken ) { - struct domain *d =3D rcu_lock_domain_by_id(did_to_domain_id(iommu,= did)); + struct domain *d =3D rcu_lock_domain(pdev->domain); =20 /* * In case the domain has been freed or the IOMMU domid bitmap is diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/= x86/iommu.c index a444e5813e..730a75e628 100644 --- a/xen/drivers/passthrough/x86/iommu.c +++ b/xen/drivers/passthrough/x86/iommu.c @@ -555,9 +555,6 @@ unsigned long *__init iommu_init_domid(domid_t reserve) { unsigned long *map; =20 - if ( !iommu_quarantine ) - return ZERO_BLOCK_PTR; - BUILD_BUG_ON(DOMID_MASK * 2U >=3D UINT16_MAX); =20 map =3D xzalloc_array(unsigned long, BITS_TO_LONGS(UINT16_MAX - DOMID_= MASK)); @@ -572,36 +569,24 @@ unsigned long *__init iommu_init_domid(domid_t reserv= e) =20 domid_t iommu_alloc_domid(unsigned long *map) { - /* - * This is used uniformly across all IOMMUs, such that on typical - * systems we wouldn't re-use the same ID very quickly (perhaps never). - */ - static unsigned int start; - unsigned int idx =3D find_next_zero_bit(map, UINT16_MAX - DOMID_MASK, = start); + /* TODO: Consider nr_doms ? 
 */
+    unsigned int idx = find_next_zero_bit(map, UINT16_MAX, 0);
 
-    ASSERT(pcidevs_locked());
-
-    if ( idx >= UINT16_MAX - DOMID_MASK )
-        idx = find_first_zero_bit(map, UINT16_MAX - DOMID_MASK);
-    if ( idx >= UINT16_MAX - DOMID_MASK )
-        return DOMID_INVALID;
+    if ( idx >= UINT16_MAX )
+        return UINT16_MAX;
 
     __set_bit(idx, map);
 
-    start = idx + 1;
-
-    return idx | (DOMID_MASK + 1);
+    return idx;
 }
 
 void iommu_free_domid(domid_t domid, unsigned long *map)
 {
     ASSERT(pcidevs_locked());
 
-    if ( domid == DOMID_INVALID )
+    if ( domid == UINT16_MAX )
         return;
 
-    ASSERT(domid > DOMID_MASK);
-    if ( !__test_and_clear_bit(domid & DOMID_MASK, map) )
         BUG();
 }
-- 
2.47.2



Teddy Astie | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech

From nobody Sat Nov 1 23:25:40 2025
From: "Teddy Astie"
Subject: [XEN RFC PATCH v6 08/11] iommu: Introduce redesigned IOMMU subsystem
To: xen-devel@lists.xenproject.org
Cc: "Teddy Astie" , "Jan Beulich" , "Andrew Cooper" ,
 "Roger Pau Monné" , "Anthony PERARD" , "Michal Orzel" ,
 "Julien Grall" , "Stefano Stabellini"
Message-Id: <4bd97f512f521be425dbbdd6c2e2f9787cbe2a82.1739785339.git.teddy.astie@vates.tech>
Date: Mon, 17 Feb 2025 10:18:24 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Introduce the changes proposed in docs/designs/iommu-context.md.
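The summary above is terse, so here is a purely illustrative sketch (not part of the patch) of how a caller is expected to drive the per-domain IOMMU context API introduced below. The function names, signatures and flags are the ones added by this series; the example_context_usage() wrapper, its flow and its error handling are assumptions made for illustration only.

/*
 * Illustrative only: create a second IOMMU context for a domain, move a
 * device into it, map one page there, then tear the context down again.
 */
static int example_context_usage(struct domain *d, struct pci_dev *pdev,
                                 dfn_t dfn, mfn_t mfn)
{
    unsigned int flush_flags = 0;
    u16 ctx_id;
    int rc;

    /* Allocate a new (non-default) context; ctx_id 0 is the default one. */
    rc = iommu_context_alloc(d, &ctx_id, 0);
    if ( rc )
        return rc;

    /* Move the device from its current context into the new one. */
    rc = iommu_reattach_context(d, d, pci_to_dev(pdev), ctx_id);
    if ( rc )
        goto free_ctx;

    /*
     * Mappings and flushes are now per-context, selected by ctx_id.
     * Without IOMMUF_preempt, iommu_map() returns 0 or a negative error.
     */
    rc = iommu_map(d, dfn, mfn, 1, IOMMUF_readable | IOMMUF_writable,
                   &flush_flags, ctx_id);
    if ( !rc )
        rc = iommu_iotlb_flush(d, dfn, 1, flush_flags, ctx_id);

 free_ctx:
    /* Devices still in the context are pushed back to the default one. */
    iommu_context_free(d, ctx_id, IOMMU_TEARDOWN_REATTACH_DEFAULT);

    return rc;
}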
Signed-off-by: Teddy Astie
---
This patch is still quite large but I am not sure how to split it further.
---
 xen/arch/x86/include/asm/iommu.h            |    8 +-
 xen/arch/x86/mm/p2m-ept.c                   |    2 +-
 xen/arch/x86/pv/dom0_build.c                |    6 +-
 xen/common/memory.c                         |    4 +-
 xen/drivers/passthrough/amd/iommu.h         |   13 +-
 xen/drivers/passthrough/amd/iommu_cmd.c     |   20 +-
 xen/drivers/passthrough/amd/iommu_init.c    |    2 +-
 xen/drivers/passthrough/amd/iommu_map.c     |   52 +-
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  297 +++---
 xen/drivers/passthrough/iommu.c             |  622 ++++++++++-
 xen/drivers/passthrough/pci.c               |  397 +++----
 xen/drivers/passthrough/vtd/extern.h        |   17 +-
 xen/drivers/passthrough/vtd/iommu.c         | 1048 ++++++++-----------
 xen/drivers/passthrough/vtd/quirks.c        |   22 +-
 xen/drivers/passthrough/x86/iommu.c         |  153 ++-
 xen/include/xen/iommu.h                     |   93 +-
 xen/include/xen/pci.h                       |    3 +
 17 files changed, 1538 insertions(+), 1221 deletions(-)

diff --git a/xen/arch/x86/include/asm/iommu.h b/xen/arch/x86/include/asm/iommu.h
index d20c3cda59..654a07b9b2 100644
--- a/xen/arch/x86/include/asm/iommu.h
+++ b/xen/arch/x86/include/asm/iommu.h
@@ -2,10 +2,12 @@
 #ifndef __ARCH_X86_IOMMU_H__
 #define __ARCH_X86_IOMMU_H__
 
+#include
 #include
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -39,18 +41,16 @@ struct arch_iommu_context
     struct list_head identity_maps;
 
 
-    spinlock_t mapping_lock; /* io page table lock */
-
     union {
         /* Intel VT-d */
         struct {
             uint64_t pgd_maddr; /* io page directory machine address */
             domid_t *didmap; /* per-iommu DID (valid only if related iommu_dev_cnt > 0) */
             unsigned long *iommu_dev_cnt; /* counter of devices per iommu */
+            uint32_t superpage_progress; /* superpage progress during teardown */
         } vtd;
         /* AMD IOMMU */
         struct {
-            unsigned int paging_mode;
             struct page_info *root_table;
             domid_t *didmap; /* per-iommu DID (valid only if related iommu_dev_cnt > 0) */
             unsigned long *iommu_dev_cnt; /* counter of devices per iommu */
@@ -72,7 +72,7 @@ struct arch_iommu
     struct {
         unsigned int paging_mode;
         struct guest_iommu *g_iommu;
-    };
+    } amd;
     };
 };
 
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 0cf6818c13..0cf5d3c323 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -978,7 +978,7 @@ out:
         rc = iommu_iotlb_flush(d, _dfn(gfn), 1ul << order,
                                (iommu_flags ? IOMMU_FLUSHF_added : 0) |
                                (vtd_pte_present ? IOMMU_FLUSHF_modified
-                                                : 0));
+                                                : 0), 0);
     else if ( need_iommu_pt_sync(d) )
         rc = iommu_flags ?
             iommu_legacy_map(d, _dfn(gfn), mfn, 1ul << order, iommu_flags) :
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index f54d1da5c6..453fb22252 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -77,7 +77,7 @@ static __init void mark_pv_pt_pages_rdonly(struct domain *d,
          * iommu_memory_setup() ended up mapping them.
          */
         if ( need_iommu_pt_sync(d) &&
-             iommu_unmap(d, _dfn(mfn_x(page_to_mfn(page))), 1, 0, flush_flags) )
+             iommu_unmap(d, _dfn(mfn_x(page_to_mfn(page))), 1, 0, flush_flags, 0) )
             BUG();
 
         /* Read-only mapping + PGC_allocated + page-table page. */
@@ -128,7 +128,7 @@ static void __init iommu_memory_setup(struct domain *d, const char *what,
 
     while ( (rc = iommu_map(d, _dfn(mfn_x(mfn)), mfn, nr,
                             IOMMUF_readable | IOMMUF_writable | IOMMUF_preempt,
-                            flush_flags)) > 0 )
+                            flush_flags, 0)) > 0 )
     {
         mfn = mfn_add(mfn, rc);
         nr -= rc;
@@ -970,7 +970,7 @@ static int __init dom0_construct(struct boot_info *bi, struct domain *d)
     }
 
     /* Use while() to avoid compiler warning.
*/ - while ( iommu_iotlb_flush_all(d, flush_flags) ) + while ( iommu_iotlb_flush_all(d, 0, flush_flags) ) break; =20 if ( initrd_len !=3D 0 ) diff --git a/xen/common/memory.c b/xen/common/memory.c index a6f2f6d1b3..acf305bcd0 100644 --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -926,7 +926,7 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_= add_to_physmap *xatp, this_cpu(iommu_dont_flush_iotlb) =3D 0; =20 ret =3D iommu_iotlb_flush(d, _dfn(xatp->idx - done), done, - IOMMU_FLUSHF_modified); + IOMMU_FLUSHF_modified, 0); if ( unlikely(ret) && rc >=3D 0 ) rc =3D ret; =20 @@ -940,7 +940,7 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_= add_to_physmap *xatp, put_page(pages[i]); =20 ret =3D iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), done, - IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified= ); + IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified= , 0); if ( unlikely(ret) && rc >=3D 0 ) rc =3D ret; } diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/= amd/iommu.h index dbe427ed27..217c1ebc7a 100644 --- a/xen/drivers/passthrough/amd/iommu.h +++ b/xen/drivers/passthrough/amd/iommu.h @@ -198,11 +198,10 @@ void amd_iommu_quarantine_teardown(struct pci_dev *pd= ev); /* mapping functions */ int __must_check cf_check amd_iommu_map_page( struct domain *d, dfn_t dfn, mfn_t mfn, unsigned int flags, - unsigned int *flush_flags); + unsigned int *flush_flags, struct iommu_context *ctx); int __must_check cf_check amd_iommu_unmap_page( struct domain *d, dfn_t dfn, unsigned int order, - unsigned int *flush_flags); -int __must_check amd_iommu_alloc_root(struct domain *d); + unsigned int *flush_flags, struct iommu_context *ctx); int amd_iommu_reserve_domain_unity_map(struct domain *d, struct iommu_cont= ext *ctx, const struct ivrs_unity_map *map, unsigned int flag); @@ -211,7 +210,7 @@ int amd_iommu_reserve_domain_unity_unmap(struct domain = *d, struct iommu_context int cf_check amd_iommu_get_reserved_device_memory( iommu_grdm_t *func, void *ctxt); int __must_check cf_check amd_iommu_flush_iotlb_pages( - struct domain *d, dfn_t dfn, unsigned long page_count, + struct domain *d, struct iommu_context *ctx, dfn_t dfn, unsigned long = page_count, unsigned int flush_flags); void amd_iommu_print_entries(const struct amd_iommu *iommu, unsigned int d= ev_id, dfn_t dfn); @@ -233,9 +232,9 @@ void iommu_dte_add_device_entry(struct amd_iommu_dte *d= te, const struct ivrs_mappings *ivrs_dev); =20 /* send cmd to iommu */ -void amd_iommu_flush_all_pages(struct domain *d); -void amd_iommu_flush_pages(struct domain *d, unsigned long dfn, - unsigned int order); +void amd_iommu_flush_all_pages(struct domain *d, struct iommu_context *ctx= ); +void amd_iommu_flush_pages(struct domain *d, struct iommu_context *ctx, + unsigned long dfn, unsigned int order); void amd_iommu_flush_iotlb(u8 devfn, const struct pci_dev *pdev, daddr_t daddr, unsigned int order); void amd_iommu_flush_device(struct amd_iommu *iommu, uint16_t bdf, diff --git a/xen/drivers/passthrough/amd/iommu_cmd.c b/xen/drivers/passthro= ugh/amd/iommu_cmd.c index e1a252db93..495e6139fd 100644 --- a/xen/drivers/passthrough/amd/iommu_cmd.c +++ b/xen/drivers/passthrough/amd/iommu_cmd.c @@ -327,19 +327,21 @@ static void amd_iommu_flush_all_iotlbs(const struct d= omain *d, daddr_t daddr, } =20 /* Flush iommu cache after p2m changes. 
*/ -static void _amd_iommu_flush_pages(struct domain *d, +static void _amd_iommu_flush_pages(struct domain *d, struct iommu_context = *ctx, daddr_t daddr, unsigned int order) { struct amd_iommu *iommu; - struct iommu_context *ctx =3D iommu_default_context(d); =20 /* send INVALIDATE_IOMMU_PAGES command */ for_each_amd_iommu ( iommu ) { - domid_t dom_id =3D ctx->arch.amd.didmap[iommu->index]; + if ( ctx->arch.amd.iommu_dev_cnt[iommu->index] ) + { + domid_t dom_id =3D ctx->arch.amd.didmap[iommu->index]; =20 - invalidate_iommu_pages(iommu, daddr, dom_id, order); - flush_command_buffer(iommu, 0); + invalidate_iommu_pages(iommu, daddr, dom_id, order); + flush_command_buffer(iommu, 0); + } } =20 if ( ats_enabled ) @@ -355,15 +357,15 @@ static void _amd_iommu_flush_pages(struct domain *d, } } =20 -void amd_iommu_flush_all_pages(struct domain *d) +void amd_iommu_flush_all_pages(struct domain *d, struct iommu_context *ctx) { - _amd_iommu_flush_pages(d, INV_IOMMU_ALL_PAGES_ADDRESS, 0); + _amd_iommu_flush_pages(d, ctx, INV_IOMMU_ALL_PAGES_ADDRESS, 0); } =20 -void amd_iommu_flush_pages(struct domain *d, +void amd_iommu_flush_pages(struct domain *d, struct iommu_context *ctx, unsigned long dfn, unsigned int order) { - _amd_iommu_flush_pages(d, __dfn_to_daddr(dfn), order); + _amd_iommu_flush_pages(d, ctx, __dfn_to_daddr(dfn), order); } =20 void amd_iommu_flush_device(struct amd_iommu *iommu, uint16_t bdf, diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthr= ough/amd/iommu_init.c index 333d5d5e39..67235b4ce4 100644 --- a/xen/drivers/passthrough/amd/iommu_init.c +++ b/xen/drivers/passthrough/amd/iommu_init.c @@ -1538,7 +1538,7 @@ static void invalidate_all_domain_pages(void) =20 for_each_domain( d ) if ( is_iommu_enabled(d) ) - amd_iommu_flush_all_pages(d); + amd_iommu_flush_all_pages(d, iommu_default_context(d)); } =20 static int cf_check _invalidate_all_devices( diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthro= ugh/amd/iommu_map.c index 91d8c21048..6c3ec975ce 100644 --- a/xen/drivers/passthrough/amd/iommu_map.c +++ b/xen/drivers/passthrough/amd/iommu_map.c @@ -276,7 +276,7 @@ static int iommu_pde_from_dfn(struct domain *d, struct = iommu_context *ctx, struct domain_iommu *hd =3D dom_iommu(d); =20 table =3D ctx->arch.amd.root_table; - level =3D ctx->arch.amd.paging_mode; + level =3D hd->arch.amd.paging_mode; =20 if ( !table || target < 1 || level < target || level > 6 ) { @@ -400,21 +400,17 @@ static void queue_free_pt(struct domain *d, struct io= mmu_context *ctx, mfn_t mfn =20 int cf_check amd_iommu_map_page( struct domain *d, dfn_t dfn, mfn_t mfn, unsigned int flags, - unsigned int *flush_flags) + unsigned int *flush_flags, struct iommu_context *ctx) { struct domain_iommu *hd =3D dom_iommu(d); - struct iommu_context *ctx =3D iommu_default_context(d); unsigned int level =3D (IOMMUF_order(flags) / PTE_PER_TABLE_SHIFT) + 1; bool contig; - int rc; unsigned long pt_mfn =3D 0; union amd_iommu_pte old; =20 ASSERT((hd->platform_ops->page_sizes >> IOMMUF_order(flags)) & PAGE_SIZE_4K); =20 - spin_lock(&ctx->arch.mapping_lock); - /* * IOMMU mapping request can be safely ignored when the domain is dyin= g. * @@ -422,25 +418,11 @@ int cf_check amd_iommu_map_page( * before any page tables are freed (see iommu_free_pgtables()). 
*/ if ( d->is_dying ) - { - spin_unlock(&ctx->arch.mapping_lock); return 0; - } - - rc =3D amd_iommu_alloc_root(d); - if ( rc ) - { - spin_unlock(&ctx->arch.mapping_lock); - AMD_IOMMU_ERROR("root table alloc failed, dfn =3D %"PRI_dfn"\n", - dfn_x(dfn)); - domain_crash(d); - return rc; - } =20 if ( iommu_pde_from_dfn(d, ctx, dfn_x(dfn), level, &pt_mfn, flush_flag= s, true) || !pt_mfn ) { - spin_unlock(&ctx->arch.mapping_lock); AMD_IOMMU_ERROR("invalid IO pagetable entry dfn =3D %"PRI_dfn"\n", dfn_x(dfn)); domain_crash(d); @@ -452,7 +434,7 @@ int cf_check amd_iommu_map_page( flags & IOMMUF_writable, flags & IOMMUF_readable, &contig); =20 - while ( unlikely(contig) && ++level < ctx->arch.amd.paging_mode ) + while ( unlikely(contig) && ++level < hd->arch.amd.paging_mode ) { struct page_info *pg =3D mfn_to_page(_mfn(pt_mfn)); unsigned long next_mfn; @@ -471,8 +453,6 @@ int cf_check amd_iommu_map_page( perfc_incr(iommu_pt_coalesces); } =20 - spin_unlock(&ctx->arch.mapping_lock); - *flush_flags |=3D IOMMU_FLUSHF_added; if ( old.pr ) { @@ -486,11 +466,11 @@ int cf_check amd_iommu_map_page( } =20 int cf_check amd_iommu_unmap_page( - struct domain *d, dfn_t dfn, unsigned int order, unsigned int *flush_f= lags) + struct domain *d, dfn_t dfn, unsigned int order, unsigned int *flush_f= lags, + struct iommu_context *ctx) { unsigned long pt_mfn =3D 0; struct domain_iommu *hd =3D dom_iommu(d); - struct iommu_context *ctx =3D iommu_default_context(d); unsigned int level =3D (order / PTE_PER_TABLE_SHIFT) + 1; union amd_iommu_pte old =3D {}; =20 @@ -500,17 +480,11 @@ int cf_check amd_iommu_unmap_page( */ ASSERT((hd->platform_ops->page_sizes >> order) & PAGE_SIZE_4K); =20 - spin_lock(&ctx->arch.mapping_lock); - if ( !ctx->arch.amd.root_table ) - { - spin_unlock(&ctx->arch.mapping_lock); return 0; - } =20 if ( iommu_pde_from_dfn(d, ctx, dfn_x(dfn), level, &pt_mfn, flush_flag= s, false) ) { - spin_unlock(&ctx->arch.mapping_lock); AMD_IOMMU_ERROR("invalid IO pagetable entry dfn =3D %"PRI_dfn"\n", dfn_x(dfn)); domain_crash(d); @@ -524,7 +498,7 @@ int cf_check amd_iommu_unmap_page( /* Mark PTE as 'page not present'. */ old =3D clear_iommu_pte_present(pt_mfn, dfn_x(dfn), level, &free); =20 - while ( unlikely(free) && ++level < ctx->arch.amd.paging_mode ) + while ( unlikely(free) && ++level < hd->arch.amd.paging_mode ) { struct page_info *pg =3D mfn_to_page(_mfn(pt_mfn)); =20 @@ -540,8 +514,6 @@ int cf_check amd_iommu_unmap_page( } } =20 - spin_unlock(&ctx->arch.mapping_lock); - if ( old.pr ) { *flush_flags |=3D IOMMU_FLUSHF_modified; @@ -608,7 +580,7 @@ static unsigned long flush_count(unsigned long dfn, uns= igned long page_count, } =20 int cf_check amd_iommu_flush_iotlb_pages( - struct domain *d, dfn_t dfn, unsigned long page_count, + struct domain *d, struct iommu_context *ctx, dfn_t dfn, unsigned long = page_count, unsigned int flush_flags) { unsigned long dfn_l =3D dfn_x(dfn); @@ -626,7 +598,7 @@ int cf_check amd_iommu_flush_iotlb_pages( /* If so requested or if the range wraps then just flush everything. */ if ( (flush_flags & IOMMU_FLUSHF_all) || dfn_l + page_count < dfn_l ) { - amd_iommu_flush_all_pages(d); + amd_iommu_flush_all_pages(d, ctx); return 0; } =20 @@ -639,13 +611,13 @@ int cf_check amd_iommu_flush_iotlb_pages( * flush code. 
*/ if ( page_count =3D=3D 1 ) /* order 0 flush count */ - amd_iommu_flush_pages(d, dfn_l, 0); + amd_iommu_flush_pages(d, ctx, dfn_l, 0); else if ( flush_count(dfn_l, page_count, 9) =3D=3D 1 ) - amd_iommu_flush_pages(d, dfn_l, 9); + amd_iommu_flush_pages(d, ctx, dfn_l, 9); else if ( flush_count(dfn_l, page_count, 18) =3D=3D 1 ) - amd_iommu_flush_pages(d, dfn_l, 18); + amd_iommu_flush_pages(d, ctx, dfn_l, 18); else - amd_iommu_flush_all_pages(d); + amd_iommu_flush_all_pages(d, ctx); =20 return 0; } diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/pass= through/amd/pci_amd_iommu.c index 0008b35162..366d5eb982 100644 --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c @@ -20,8 +20,11 @@ #include #include #include +#include +#include =20 #include +#include =20 #include "iommu.h" #include "../ats.h" @@ -85,18 +88,6 @@ int get_dma_requestor_id(uint16_t seg, uint16_t bdf) return req_id; } =20 -static int __must_check allocate_domain_resources(struct domain *d) -{ - struct iommu_context *ctx =3D iommu_default_context(d); - int rc; - - spin_lock(&ctx->arch.mapping_lock); - rc =3D amd_iommu_alloc_root(d); - spin_unlock(&ctx->arch.mapping_lock); - - return rc; -} - static bool any_pdev_behind_iommu(const struct domain *d, const struct pci_dev *exclude, const struct amd_iommu *iommu) @@ -127,8 +118,9 @@ static bool use_ats( =20 static int __must_check amd_iommu_setup_domain_device( struct domain *domain, struct iommu_context *ctx, struct amd_iommu *io= mmu, - uint8_t devfn, struct pci_dev *pdev) + uint8_t devfn, struct pci_dev *pdev, struct iommu_context *prev_ctx) { + struct domain_iommu *hd =3D dom_iommu(domain); struct amd_iommu_dte *table, *dte; unsigned long flags; unsigned int req_id, sr_flags; @@ -138,11 +130,7 @@ static int __must_check amd_iommu_setup_domain_device( const struct page_info *root_pg; domid_t domid; =20 - BUG_ON(!ctx->arch.amd.paging_mode || !iommu->dev_table.buffer); - - rc =3D allocate_domain_resources(domain); - if ( rc ) - return rc; + BUG_ON(!hd->arch.amd.paging_mode || !iommu->dev_table.buffer); =20 req_id =3D get_dma_requestor_id(iommu->seg, pdev->sbdf.bdf); ivrs_dev =3D &get_ivrs_mappings(iommu->seg)[req_id]; @@ -157,7 +145,7 @@ static int __must_check amd_iommu_setup_domain_device( ivrs_dev =3D &get_ivrs_mappings(iommu->seg)[req_id]; =20 root_pg =3D ctx->arch.amd.root_table; - domid =3D domain->domain_id; + domid =3D ctx->arch.amd.didmap[iommu->index]; =20 spin_lock_irqsave(&iommu->lock, flags); =20 @@ -166,7 +154,7 @@ static int __must_check amd_iommu_setup_domain_device( /* bind DTE to domain page-tables */ rc =3D amd_iommu_set_root_page_table( dte, page_to_maddr(root_pg), domid, - ctx->arch.amd.paging_mode, sr_flags); + hd->arch.amd.paging_mode, sr_flags); if ( rc ) { ASSERT(rc < 0); @@ -208,7 +196,7 @@ static int __must_check amd_iommu_setup_domain_device( else rc =3D amd_iommu_set_root_page_table( dte, page_to_maddr(root_pg), domid, - ctx->arch.amd.paging_mode, sr_flags); + hd->arch.amd.paging_mode, sr_flags); if ( rc < 0 ) { spin_unlock_irqrestore(&iommu->lock, flags); @@ -259,7 +247,7 @@ static int __must_check amd_iommu_setup_domain_device( "root table =3D %#"PRIx64", " "domain =3D %d, paging mode =3D %d\n", req_id, pdev->type, page_to_maddr(root_pg), - domid, ctx->arch.amd.paging_mode); + domid, hd->arch.amd.paging_mode); =20 ASSERT(pcidevs_locked()); =20 @@ -272,6 +260,15 @@ static int __must_check amd_iommu_setup_domain_device( amd_iommu_flush_iotlb(devfn, pdev, INV_IOMMU_ALL_PAGES_ADDRESS, 
0); } =20 + if ( prev_ctx ) + { + /* Don't underflow the counter. */ + BUG_ON(!prev_ctx->arch.amd.iommu_dev_cnt[iommu->index]); + prev_ctx->arch.vtd.iommu_dev_cnt[iommu->index]--; + } + + ctx->arch.amd.iommu_dev_cnt[iommu->index]++; + return 0; } =20 @@ -338,27 +335,12 @@ static int cf_check iov_enable_xt(void) return 0; } =20 -int amd_iommu_alloc_root(struct domain *d) -{ - struct domain_iommu *hd =3D dom_iommu(d); - struct iommu_context *ctx =3D iommu_default_context(d); - - if ( unlikely(!ctx->arch.amd.root_table) && d !=3D dom_io ) - { - ctx->arch.amd.root_table =3D iommu_alloc_pgtable(hd, ctx, 0); - if ( !ctx->arch.amd.root_table ) - return -ENOMEM; - } - - return 0; -} - unsigned int __read_mostly amd_iommu_max_paging_mode =3D IOMMU_MAX_PT_LEVE= LS; int __read_mostly amd_iommu_min_paging_mode =3D 1; =20 static int cf_check amd_iommu_domain_init(struct domain *d) { - struct iommu_context *ctx =3D iommu_default_context(d); + struct domain_iommu *hd =3D dom_iommu(d); int pglvl =3D amd_iommu_get_paging_mode( 1UL << (domain_max_paddr_bits(d) - PAGE_SHIFT)); =20 @@ -369,7 +351,7 @@ static int cf_check amd_iommu_domain_init(struct domain= *d) * Choose the number of levels for the IOMMU page tables, taking into * account unity maps. */ - ctx->arch.amd.paging_mode =3D max(pglvl, amd_iommu_min_paging_mode); + hd->arch.amd.paging_mode =3D max(pglvl, amd_iommu_min_paging_mode); =20 return 0; } @@ -380,9 +362,6 @@ static void __hwdom_init cf_check amd_iommu_hwdom_init(= struct domain *d) { const struct amd_iommu *iommu; =20 - if ( allocate_domain_resources(d) ) - BUG(); - for_each_amd_iommu ( iommu ) if ( iomem_deny_access(d, PFN_DOWN(iommu->mmio_base_phys), PFN_DOWN(iommu->mmio_base_phys + @@ -394,8 +373,11 @@ static void __hwdom_init cf_check amd_iommu_hwdom_init= (struct domain *d) setup_hwdom_pci_devices(d, amd_iommu_add_device); } =20 + + static void amd_iommu_disable_domain_device(const struct domain *domain, struct amd_iommu *iommu, + struct iommu_context *prev_ctx, uint8_t devfn, struct pci_dev = *pdev) { struct amd_iommu_dte *table, *dte; @@ -442,108 +424,141 @@ static void amd_iommu_disable_domain_device(const s= truct domain *domain, AMD_IOMMU_DEBUG("Disable: device id =3D %#x, " "domain =3D %d, paging mode =3D %d\n", req_id, dte->domain_id, - iommu_default_context(domain)->arch.amd.paging_mod= e); + dom_iommu(domain)->arch.amd.paging_mode); } else spin_unlock_irqrestore(&iommu->lock, flags); + + BUG_ON(!prev_ctx->arch.amd.iommu_dev_cnt[iommu->index]); + prev_ctx->arch.amd.iommu_dev_cnt[iommu->index]--; } =20 -static int cf_check reassign_device( - struct domain *source, struct domain *target, u8 devfn, - struct pci_dev *pdev) +static int cf_check amd_iommu_context_init(struct domain *d, struct iommu_= context *ctx, + u32 flags) { struct amd_iommu *iommu; - struct iommu_context *target_ctx =3D iommu_default_context(target); - struct iommu_context *source_ctx =3D iommu_default_context(source); - int rc; + struct domain_iommu *hd =3D dom_iommu(d); =20 - iommu =3D find_iommu_for_device(pdev->seg, pdev->sbdf.bdf); - if ( !iommu ) + ctx->arch.amd.didmap =3D xzalloc_array(domid_t, nr_amd_iommus); + if ( !ctx->arch.amd.didmap ) + return -ENOMEM; + + ctx->arch.amd.iommu_dev_cnt =3D xzalloc_array(unsigned long, nr_amd_io= mmus); + if ( !ctx->arch.amd.iommu_dev_cnt ) { - AMD_IOMMU_WARN("failed to find IOMMU: %pp cannot be assigned to %p= d\n", - &PCI_SBDF(pdev->seg, pdev->bus, devfn), target); - return -ENODEV; + xfree(ctx->arch.amd.didmap); + return -ENOMEM; } =20 - rc =3D 
amd_iommu_setup_domain_device(target, target_ctx, iommu, devfn,= pdev); - if ( rc ) - return rc; + // TODO: Allocate IOMMU domid only when attaching devices ? + /* Populate context DID map using pseudo DIDs */ + for_each_amd_iommu(iommu) + { + ctx->arch.amd.didmap[iommu->index] =3D + iommu_alloc_domid(iommu->domid_map); + } =20 - if ( devfn =3D=3D pdev->devfn && pdev->domain !=3D target ) + if ( !ctx->opaque ) { - write_lock(&source->pci_lock); - list_del(&pdev->domain_list); - write_unlock(&source->pci_lock); + /* Create initial context page */ + ctx->arch.amd.root_table =3D iommu_alloc_pgtable(hd, ctx, 0); + } =20 - pdev->domain =3D target; + return arch_iommu_context_init(d, ctx, flags); =20 - write_lock(&target->pci_lock); - list_add(&pdev->domain_list, &target->pdev_list); - write_unlock(&target->pci_lock); - } +} =20 - /* - * If the device belongs to the hardware domain, and it has a unity ma= pping, - * don't remove it from the hardware domain, because BIOS may referenc= e that - * mapping. - */ - if ( !is_hardware_domain(source) ) - { - const struct ivrs_mappings *ivrs_mappings =3D get_ivrs_mappings(pd= ev->seg); - unsigned int bdf =3D PCI_BDF(pdev->bus, devfn); +static int cf_check amd_iommu_context_teardown(struct domain *d, + struct iommu_context *ctx, u32 fla= gs) +{ + struct amd_iommu *iommu; + pcidevs_lock(); =20 - rc =3D amd_iommu_reserve_domain_unity_unmap( - source, source_ctx, - ivrs_mappings[get_dma_requestor_id(pdev->seg, bdf)].unity= _map); - if ( rc ) - return rc; + // TODO: Cleanup mappings + ASSERT(ctx->arch.amd.didmap); + + for_each_amd_iommu(iommu) + { + iommu_free_domid(ctx->arch.amd.didmap[iommu->index], iommu->domid_= map); } =20 - AMD_IOMMU_DEBUG("Re-assign %pp from %pd to %pd\n", - &PCI_SBDF(pdev->seg, pdev->bus, devfn), source, target= ); + xfree(ctx->arch.amd.didmap); =20 - return 0; + pcidevs_unlock(); + return arch_iommu_context_teardown(d, ctx, flags); } =20 -static int cf_check amd_iommu_assign_device( - struct domain *d, u8 devfn, struct pci_dev *pdev, u32 flag) +static int cf_check amd_iommu_attach( + struct domain *d, struct pci_dev *pdev, struct iommu_context *ctx) { + int ret; struct ivrs_mappings *ivrs_mappings =3D get_ivrs_mappings(pdev->seg); - unsigned int bdf =3D PCI_BDF(pdev->bus, devfn); - int req_id =3D get_dma_requestor_id(pdev->seg, bdf); - int rc =3D amd_iommu_reserve_domain_unity_map( - d, iommu_default_context(d), - ivrs_mappings[req_id].unity_map, flag); + int req_id =3D get_dma_requestor_id(pdev->seg, pdev->sbdf.bdf); + struct ivrs_unity_map *map =3D ivrs_mappings[req_id].unity_map; + struct amd_iommu *iommu =3D find_iommu_for_device(pdev->seg, pdev->sbd= f.bdf); =20 - if ( !rc ) - rc =3D reassign_device(pdev->domain, d, devfn, pdev); + ret =3D amd_iommu_reserve_domain_unity_map(d, ctx, map, 0); + if ( !ret ) + return ret; =20 - if ( rc && !is_hardware_domain(d) ) - { - int ret =3D amd_iommu_reserve_domain_unity_unmap( - d, iommu_default_context(d), - ivrs_mappings[req_id].unity_map); + return amd_iommu_setup_domain_device(d, ctx, iommu, pdev->devfn, pdev,= NULL); +} =20 - if ( ret ) - { - printk(XENLOG_ERR "AMD-Vi: " - "unity-unmap for %pd/%04x:%02x:%02x.%u failed (%d)\n", - d, pdev->seg, pdev->bus, - PCI_SLOT(devfn), PCI_FUNC(devfn), ret); - domain_crash(d); - } - } +static int cf_check amd_iommu_detach(struct domain *d, struct pci_dev *pde= v, + struct iommu_context *prev_ctx) +{ + struct ivrs_mappings *ivrs_mappings =3D get_ivrs_mappings(pdev->seg); + int req_id =3D get_dma_requestor_id(pdev->seg, pdev->sbdf.bdf); + struct 
amd_iommu *iommu =3D find_iommu_for_device(pdev->seg, pdev->sbd= f.bdf); + + amd_iommu_disable_domain_device(d, iommu, prev_ctx, pdev->devfn, pdev); =20 - return rc; + return amd_iommu_reserve_domain_unity_unmap(d, prev_ctx, ivrs_mappings= [req_id].unity_map); } =20 -static void cf_check amd_iommu_clear_root_pgtable(struct domain *d) +static int cf_check amd_iommu_add_devfn(struct domain *d, struct pci_dev *= pdev, + u16 devfn, struct iommu_context *c= tx) { - struct iommu_context *ctx =3D iommu_default_context(d); + struct amd_iommu *iommu =3D find_iommu_for_device(pdev->seg, pdev->sbd= f.bdf); + + return amd_iommu_setup_domain_device(d, ctx, iommu, pdev->devfn, pdev,= NULL); +} + +static int cf_check amd_iommu_remove_devfn(struct domain *d, struct pci_de= v *pdev, + u16 devfn) +{ + struct amd_iommu *iommu =3D find_iommu_for_device(pdev->seg, pdev->sbd= f.bdf); + + amd_iommu_disable_domain_device(d, iommu, NULL, pdev->devfn, pdev); + + return 0; +} + +static int cf_check amd_iommu_reattach(struct domain *d, + struct pci_dev *pdev, + struct iommu_context *prev_ctx, + struct iommu_context *ctx) +{ + int ret; + struct ivrs_mappings *ivrs_mappings =3D get_ivrs_mappings(pdev->seg); + int req_id =3D get_dma_requestor_id(pdev->seg, pdev->sbdf.bdf); + struct ivrs_unity_map *map =3D ivrs_mappings[req_id].unity_map; + + ret =3D amd_iommu_reserve_domain_unity_map(d, ctx, map, 0); + if ( !ret ) + return ret; + + ret =3D amd_iommu_setup_domain_device(d, ctx, ivrs_mappings->iommu, pd= ev->devfn, + pdev, prev_ctx); + if ( !ret ) + return ret; =20 - spin_lock(&ctx->arch.mapping_lock); + return amd_iommu_reserve_domain_unity_unmap(d, prev_ctx, map); +} + +static void cf_check amd_iommu_clear_root_pgtable(struct domain *d, struct= iommu_context *ctx) +{ ctx->arch.amd.root_table =3D NULL; - spin_unlock(&ctx->arch.mapping_lock); } =20 static void cf_check amd_iommu_domain_destroy(struct domain *d) @@ -628,48 +643,7 @@ static int cf_check amd_iommu_add_device(u8 devfn, str= uct pci_dev *pdev) AMD_IOMMU_WARN("%pd: unity mapping failed for %pp\n", pdev->domain, &PCI_SBDF(pdev->seg, bdf)); =20 - return amd_iommu_setup_domain_device(pdev->domain, ctx, iommu, devfn, = pdev); -} - -static int cf_check amd_iommu_remove_device(u8 devfn, struct pci_dev *pdev) -{ - struct amd_iommu *iommu; - struct iommu_context *ctx; - u16 bdf; - struct ivrs_mappings *ivrs_mappings; - - if ( !pdev->domain ) - return -EINVAL; - - ctx =3D iommu_default_context(pdev->domain); - - iommu =3D find_iommu_for_device(pdev->seg, pdev->sbdf.bdf); - if ( !iommu ) - { - AMD_IOMMU_WARN("failed to find IOMMU: %pp cannot be removed from %= pd\n", - &PCI_SBDF(pdev->seg, pdev->bus, devfn), pdev->doma= in); - return -ENODEV; - } - - amd_iommu_disable_domain_device(pdev->domain, iommu, devfn, pdev); - - ivrs_mappings =3D get_ivrs_mappings(pdev->seg); - bdf =3D PCI_BDF(pdev->bus, devfn); - - if ( amd_iommu_reserve_domain_unity_unmap( - pdev->domain, ctx, - ivrs_mappings[ivrs_mappings[bdf].dte_requestor_id].unity_map)= ) - AMD_IOMMU_WARN("%pd: unity unmapping failed for %pp\n", - pdev->domain, &PCI_SBDF(pdev->seg, bdf)); - - amd_iommu_quarantine_teardown(pdev); - - if ( amd_iommu_perdev_intremap && - ivrs_mappings[bdf].dte_requestor_id =3D=3D bdf && - ivrs_mappings[bdf].intremap_table ) - amd_iommu_free_intremap_table(iommu, &ivrs_mappings[bdf], bdf); - - return 0; + return amd_iommu_setup_domain_device(pdev->domain, ctx, iommu, devfn, = pdev, NULL); } =20 static int cf_check amd_iommu_group_id(u16 seg, u8 bus, u8 devfn) @@ -729,30 +703,33 @@ static void 
amd_dump_page_table_level(struct page_inf= o *pg, int level, =20 static void cf_check amd_dump_page_tables(struct domain *d) { + struct domain_iommu *hd =3D dom_iommu(d); struct iommu_context *ctx =3D iommu_default_context(d); =20 if ( !ctx->arch.amd.root_table ) return; =20 - printk("AMD IOMMU %pd table has %u levels\n", d, ctx->arch.amd.paging_= mode); + printk("AMD IOMMU %pd table has %u levels\n", d, hd->arch.amd.paging_m= ode); amd_dump_page_table_level(ctx->arch.amd.root_table, - ctx->arch.amd.paging_mode, 0, 0); + hd->arch.amd.paging_mode, 0, 0); } =20 static const struct iommu_ops __initconst_cf_clobber _iommu_ops =3D { .page_sizes =3D PAGE_SIZE_4K | PAGE_SIZE_2M | PAGE_SIZE_1G, .init =3D amd_iommu_domain_init, .hwdom_init =3D amd_iommu_hwdom_init, - .quarantine_init =3D amd_iommu_quarantine_init, - .add_device =3D amd_iommu_add_device, - .remove_device =3D amd_iommu_remove_device, - .assign_device =3D amd_iommu_assign_device, + .context_init =3D amd_iommu_context_init, + .context_teardown =3D amd_iommu_context_teardown, + .attach =3D amd_iommu_attach, + .detach =3D amd_iommu_detach, + .reattach =3D amd_iommu_reattach, + .add_devfn =3D amd_iommu_add_devfn, + .remove_devfn =3D amd_iommu_remove_devfn, .teardown =3D amd_iommu_domain_destroy, .clear_root_pgtable =3D amd_iommu_clear_root_pgtable, .map_page =3D amd_iommu_map_page, .unmap_page =3D amd_iommu_unmap_page, .iotlb_flush =3D amd_iommu_flush_iotlb_pages, - .reassign_device =3D reassign_device, .get_device_group_id =3D amd_iommu_group_id, .enable_x2apic =3D iov_enable_xt, .update_ire_from_apic =3D amd_iommu_ioapic_update_ire, diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iomm= u.c index 662da49766..f92835a2ed 100644 --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -208,13 +208,15 @@ int iommu_domain_init(struct domain *d, unsigned int = opts) hd->node =3D NUMA_NO_NODE; #endif =20 + rspin_lock_init(&hd->default_ctx.lock); + ret =3D arch_iommu_domain_init(d); if ( ret ) return ret; =20 hd->platform_ops =3D iommu_get_ops(); ret =3D iommu_call(hd->platform_ops, init, d); - if ( ret || is_system_domain(d) ) + if ( ret || (is_system_domain(d) && d !=3D dom_io) ) return ret; =20 /* @@ -236,7 +238,17 @@ int iommu_domain_init(struct domain *d, unsigned int o= pts) =20 ASSERT(!(hd->need_sync && hd->hap_pt_share)); =20 - return 0; + rspin_lock(&hd->default_ctx.lock); + ret =3D iommu_context_init(d, &hd->default_ctx, 0, IOMMU_CONTEXT_INIT_= default); + rspin_unlock(&hd->default_ctx.lock); + + rwlock_init(&hd->other_contexts.lock); + hd->other_contexts.initialized =3D (atomic_t)ATOMIC_INIT(0); + hd->other_contexts.count =3D 0; + hd->other_contexts.bitmap =3D NULL; + hd->other_contexts.map =3D NULL; + + return ret; } =20 static void cf_check iommu_dump_page_tables(unsigned char key) @@ -249,14 +261,11 @@ static void cf_check iommu_dump_page_tables(unsigned = char key) =20 for_each_domain(d) { - if ( is_hardware_domain(d) || !is_iommu_enabled(d) ) + if ( !is_iommu_enabled(d) ) continue; =20 if ( iommu_use_hap_pt(d) ) - { printk("%pd sharing page tables\n", d); - continue; - } =20 iommu_vcall(dom_iommu(d)->platform_ops, dump_page_tables, d); } @@ -274,9 +283,13 @@ void __hwdom_init iommu_hwdom_init(struct domain *d) iommu_vcall(hd->platform_ops, hwdom_init, d); } =20 -static void iommu_teardown(struct domain *d) +void cf_check iommu_domain_destroy(struct domain *d) { struct domain_iommu *hd =3D dom_iommu(d); + struct pci_dev *pdev; + + if ( !is_iommu_enabled(d) ) + return; =20 /* * During 
early domain creation failure, we may reach here with the @@ -285,17 +298,65 @@ static void iommu_teardown(struct domain *d) if ( !hd->platform_ops ) return; =20 + /* Move all devices back to quarantine */ + /* TODO: Is it needed ? */ + for_each_pdev(d, pdev) + { + int rc =3D iommu_reattach_context(d, dom_io, pdev, 0); + + if ( rc ) + { + printk(XENLOG_WARNING "Unable to quarantine device %pp (%d)\n"= , &pdev->sbdf, rc); + pdev->broken =3D true; + } + else + pdev->domain =3D dom_io; + } + iommu_vcall(hd->platform_ops, teardown, d); + + arch_iommu_domain_destroy(d); } =20 -void iommu_domain_destroy(struct domain *d) -{ - if ( !is_iommu_enabled(d) ) - return; +bool cf_check iommu_check_context(struct domain *d, u16 ctx_id) { + struct domain_iommu *hd =3D dom_iommu(d); =20 - iommu_teardown(d); + if (ctx_id =3D=3D 0) + return 1; /* Default context always exist. */ =20 - arch_iommu_domain_destroy(d); + if ((ctx_id - 1) >=3D hd->other_contexts.count) + return 0; /* out of bounds */ + + return test_bit(ctx_id - 1, hd->other_contexts.bitmap); +} + +struct iommu_context * cf_check iommu_get_context(struct domain *d, u16 ct= x_id) { + struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx; + + if ( !iommu_check_context(d, ctx_id) ) + return NULL; + + if (ctx_id =3D=3D 0) + ctx =3D &hd->default_ctx; + else + ctx =3D &hd->other_contexts.map[ctx_id - 1]; + + rspin_lock(&ctx->lock); + /* Check if the context is still valid at this point */ + if ( unlikely(!iommu_check_context(d, ctx_id)) ) + { + /* Context has been destroyed in between */ + rspin_unlock(&ctx->lock); + return NULL; + } + + return ctx; +} + +void cf_check iommu_put_context(struct iommu_context *ctx) +{ + rspin_unlock(&ctx->lock); } =20 static unsigned int mapping_order(const struct domain_iommu *hd, @@ -323,11 +384,11 @@ static unsigned int mapping_order(const struct domain= _iommu *hd, return order; } =20 -long iommu_map(struct domain *d, dfn_t dfn0, mfn_t mfn0, - unsigned long page_count, unsigned int flags, - unsigned int *flush_flags) +static long _iommu_map(struct domain *d, dfn_t dfn0, mfn_t mfn0, + unsigned long page_count, unsigned int flags, + unsigned int *flush_flags, struct iommu_context *ct= x) { - const struct domain_iommu *hd =3D dom_iommu(d); + struct domain_iommu *hd =3D dom_iommu(d); unsigned long i; unsigned int order, j =3D 0; int rc =3D 0; @@ -350,7 +411,7 @@ long iommu_map(struct domain *d, dfn_t dfn0, mfn_t mfn0, return i; =20 rc =3D iommu_call(hd->platform_ops, map_page, d, dfn, mfn, - flags | IOMMUF_order(order), flush_flags); + flags | IOMMUF_order(order), flush_flags, ctx); =20 if ( likely(!rc) ) continue; @@ -361,10 +422,10 @@ long iommu_map(struct domain *d, dfn_t dfn0, mfn_t mf= n0, d->domain_id, dfn_x(dfn), mfn_x(mfn), rc); =20 /* while statement to satisfy __must_check */ - while ( iommu_unmap(d, dfn0, i, 0, flush_flags) ) + while ( iommu_unmap(d, dfn0, i, 0, flush_flags, ctx->id) ) break; =20 - if ( !is_hardware_domain(d) ) + if ( !ctx->id && !is_hardware_domain(d) ) domain_crash(d); =20 break; @@ -375,43 +436,67 @@ long iommu_map(struct domain *d, dfn_t dfn0, mfn_t mf= n0, * page, flush everything and clear flush flags. 
*/ if ( page_count > 1 && unlikely(rc) && - !iommu_iotlb_flush_all(d, *flush_flags) ) + !iommu_iotlb_flush_all(d, ctx->id, *flush_flags) ) *flush_flags =3D 0; =20 return rc; } =20 +long iommu_map(struct domain *d, dfn_t dfn0, mfn_t mfn0, + unsigned long page_count, unsigned int flags, + unsigned int *flush_flags, u16 ctx_id) +{ + struct iommu_context *ctx; + long ret; + + if ( !(ctx =3D iommu_get_context(d, ctx_id)) ) + return -ENOENT; + + ret =3D _iommu_map(d, dfn0, mfn0, page_count, flags, flush_flags, ctx); + + iommu_put_context(ctx); + + return ret; +} + int iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn, unsigned long page_count, unsigned int flags) { + struct iommu_context *ctx; unsigned int flush_flags =3D 0; - int rc; + int rc =3D 0; =20 ASSERT(!(flags & IOMMUF_preempt)); - rc =3D iommu_map(d, dfn, mfn, page_count, flags, &flush_flags); =20 - if ( !this_cpu(iommu_dont_flush_iotlb) && !rc ) - rc =3D iommu_iotlb_flush(d, dfn, page_count, flush_flags); + ctx =3D iommu_get_context(d, 0); + + if ( !ctx->opaque ) + { + rc =3D iommu_map(d, dfn, mfn, page_count, flags, &flush_flags, 0); + + if ( !this_cpu(iommu_dont_flush_iotlb) && !rc ) + rc =3D iommu_iotlb_flush(d, dfn, page_count, flush_flags, 0); + } + + iommu_put_context(ctx); =20 return rc; } =20 -long iommu_unmap(struct domain *d, dfn_t dfn0, unsigned long page_count, - unsigned int flags, unsigned int *flush_flags) +static long _iommu_unmap(struct domain *d, dfn_t dfn0, unsigned long page_= count, + unsigned int flags, unsigned int *flush_flags, + struct iommu_context *ctx) { - const struct domain_iommu *hd =3D dom_iommu(d); + struct domain_iommu *hd =3D dom_iommu(d); unsigned long i; unsigned int order, j =3D 0; int rc =3D 0; - struct iommu_context *ctx; =20 if ( !is_iommu_enabled(d) ) return 0; =20 ASSERT(!(flags & ~IOMMUF_preempt)); =20 - ctx =3D iommu_default_context(d); - for ( i =3D 0; i < page_count; i +=3D 1UL << order ) { dfn_t dfn =3D dfn_add(dfn0, i); @@ -425,7 +510,8 @@ long iommu_unmap(struct domain *d, dfn_t dfn0, unsigned= long page_count, return i; =20 err =3D iommu_call(hd->platform_ops, unmap_page, d, dfn, - flags | IOMMUF_order(order), flush_flags); + flags | IOMMUF_order(order), flush_flags, + ctx); =20 if ( likely(!err) ) continue; @@ -438,7 +524,7 @@ long iommu_unmap(struct domain *d, dfn_t dfn0, unsigned= long page_count, if ( !rc ) rc =3D err; =20 - if ( !is_hardware_domain(d) ) + if ( !ctx->id && !is_hardware_domain(d) ) { domain_crash(d); break; @@ -450,41 +536,74 @@ long iommu_unmap(struct domain *d, dfn_t dfn0, unsign= ed long page_count, * page, flush everything and clear flush flags. 
*/ if ( page_count > 1 && unlikely(rc) && - !iommu_iotlb_flush_all(d, *flush_flags) ) + !iommu_iotlb_flush_all(d, ctx->id, *flush_flags) ) *flush_flags =3D 0; =20 return rc; } =20 +long iommu_unmap(struct domain *d, dfn_t dfn0, unsigned long page_count, + unsigned int flags, unsigned int *flush_flags, + u16 ctx_id) +{ + struct iommu_context *ctx; + long ret; + + if ( !(ctx =3D iommu_get_context(d, ctx_id)) ) + return -ENOENT; + + ret =3D _iommu_unmap(d, dfn0, page_count, flags, flush_flags, ctx); + + iommu_put_context(ctx); + + return ret; +} + int iommu_legacy_unmap(struct domain *d, dfn_t dfn, unsigned long page_cou= nt) { unsigned int flush_flags =3D 0; - int rc =3D iommu_unmap(d, dfn, page_count, 0, &flush_flags); + struct iommu_context *ctx; + int rc =3D 0; + + ctx =3D iommu_get_context(d, 0); =20 - if ( !this_cpu(iommu_dont_flush_iotlb) && !rc ) - rc =3D iommu_iotlb_flush(d, dfn, page_count, flush_flags); + if ( !ctx->opaque ) + { + rc =3D iommu_unmap(d, dfn, page_count, 0, &flush_flags, 0); + + if ( !this_cpu(iommu_dont_flush_iotlb) && !rc ) + rc =3D iommu_iotlb_flush(d, dfn, page_count, flush_flags, 0); + } + + iommu_put_context(ctx); =20 return rc; } =20 int iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn, - unsigned int *flags) + unsigned int *flags, u16 ctx_id) { - const struct domain_iommu *hd =3D dom_iommu(d); + struct domain_iommu *hd =3D dom_iommu(d); struct iommu_context *ctx; + int ret; =20 if ( !is_iommu_enabled(d) || !hd->platform_ops->lookup_page ) return -EOPNOTSUPP; =20 - ctx =3D iommu_default_context(d); + if ( !(ctx =3D iommu_get_context(d, ctx_id)) ) + return -ENOENT; + + ret =3D iommu_call(hd->platform_ops, lookup_page, d, dfn, mfn, flags, = ctx); =20 - return iommu_call(hd->platform_ops, lookup_page, d, dfn, mfn, flags); + iommu_put_context(ctx); + return ret; } =20 int iommu_iotlb_flush(struct domain *d, dfn_t dfn, unsigned long page_coun= t, - unsigned int flush_flags) + unsigned int flush_flags, u16 ctx_id) { - const struct domain_iommu *hd =3D dom_iommu(d); + struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx; int rc; =20 if ( !is_iommu_enabled(d) || !hd->platform_ops->iotlb_flush || @@ -494,7 +613,10 @@ int iommu_iotlb_flush(struct domain *d, dfn_t dfn, uns= igned long page_count, if ( dfn_eq(dfn, INVALID_DFN) ) return -EINVAL; =20 - rc =3D iommu_call(hd->platform_ops, iotlb_flush, d, dfn, page_count, + if ( !(ctx =3D iommu_get_context(d, ctx_id)) ) + return -ENOENT; + + rc =3D iommu_call(hd->platform_ops, iotlb_flush, d, ctx, dfn, page_cou= nt, flush_flags); if ( unlikely(rc) ) { @@ -503,23 +625,29 @@ int iommu_iotlb_flush(struct domain *d, dfn_t dfn, un= signed long page_count, "d%d: IOMMU IOTLB flush failed: %d, dfn %"PRI_dfn", pag= e count %lu flags %x\n", d->domain_id, rc, dfn_x(dfn), page_count, flush_flags); =20 - if ( !is_hardware_domain(d) ) + if ( !ctx->id && !is_hardware_domain(d) ) domain_crash(d); } =20 + iommu_put_context(ctx); + return rc; } =20 -int iommu_iotlb_flush_all(struct domain *d, unsigned int flush_flags) +int iommu_iotlb_flush_all(struct domain *d, u16 ctx_id, unsigned int flush= _flags) { - const struct domain_iommu *hd =3D dom_iommu(d); + struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx; int rc; =20 if ( !is_iommu_enabled(d) || !hd->platform_ops->iotlb_flush || !flush_flags ) return 0; =20 - rc =3D iommu_call(hd->platform_ops, iotlb_flush, d, INVALID_DFN, 0, + if ( !(ctx =3D iommu_get_context(d, ctx_id)) ) + return -ENOENT; + + rc =3D iommu_call(hd->platform_ops, iotlb_flush, d, 
ctx, _dfn(0), 0, flush_flags | IOMMU_FLUSHF_all); if ( unlikely(rc) ) { @@ -532,21 +660,409 @@ int iommu_iotlb_flush_all(struct domain *d, unsigned= int flush_flags) domain_crash(d); } =20 + iommu_put_context(ctx); return rc; } =20 +int cf_check iommu_context_init(struct domain *d, struct iommu_context *ct= x, u16 ctx_id, + u32 flags) +{ + if ( !dom_iommu(d)->platform_ops->context_init ) + return -ENOSYS; + + INIT_LIST_HEAD(&ctx->devices); + ctx->id =3D ctx_id; + ctx->dying =3D false; + ctx->opaque =3D false; /* assume non-opaque by default */ + + return iommu_call(dom_iommu(d)->platform_ops, context_init, d, ctx, fl= ags); +} + +int iommu_context_alloc(struct domain *d, u16 *ctx_id, u32 flags) +{ + unsigned int i; + int ret; + struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx; + + do { + i =3D find_first_zero_bit(hd->other_contexts.bitmap, hd->other_con= texts.count); + + if ( i >=3D hd->other_contexts.count ) + return -ENOSPC; + + ctx =3D &hd->other_contexts.map[i]; + + /* Try to lock the mutex, can fail on concurrent accesses */ + if ( !rspin_trylock(&ctx->lock) ) + continue; + + /* We can now set it as used, we keep the lock for initialization.= */ + set_bit(i, hd->other_contexts.bitmap); + } while (0); + + *ctx_id =3D i + 1; + + ret =3D iommu_context_init(d, ctx, *ctx_id, flags); + + if ( ret ) + clear_bit(*ctx_id, hd->other_contexts.bitmap); + + iommu_put_context(ctx); + return ret; +} + +/** + * Attach dev phantom functions to ctx, override any existing + * mapped context. + */ +static int cf_check iommu_reattach_phantom(struct domain *d, device_t *dev, + struct iommu_context *ctx) +{ + int ret =3D 0; + uint8_t devfn =3D dev->devfn; + struct domain_iommu *hd =3D dom_iommu(d); + + while ( dev->phantom_stride ) + { + devfn +=3D dev->phantom_stride; + + if ( PCI_SLOT(devfn) !=3D PCI_SLOT(dev->devfn) ) + break; + + ret =3D iommu_call(hd->platform_ops, add_devfn, d, dev, devfn, ctx= ); + + if ( ret ) + break; + } + + return ret; +} + +/** + * Detach all device phantom functions. 
+ */ +static int cf_check iommu_detach_phantom(struct domain *d, device_t *dev) +{ + int ret =3D 0; + uint8_t devfn =3D dev->devfn; + struct domain_iommu *hd =3D dom_iommu(d); + + while ( dev->phantom_stride ) + { + devfn +=3D dev->phantom_stride; + + if ( PCI_SLOT(devfn) !=3D PCI_SLOT(dev->devfn) ) + break; + + ret =3D iommu_call(hd->platform_ops, remove_devfn, d, dev, devfn); + + if ( ret ) + break; + } + + return ret; +} + +int cf_check iommu_attach_context(struct domain *d, device_t *dev, u16 ctx= _id) +{ + struct iommu_context *ctx =3D NULL; + int ret, rc; + + if ( !(ctx =3D iommu_get_context(d, ctx_id)) ) + { + ret =3D -ENOENT; + goto unlock; + } + + pcidevs_lock(); + + if ( ctx->dying ) + { + ret =3D -EINVAL; + goto unlock; + } + + ret =3D iommu_call(dom_iommu(d)->platform_ops, attach, d, dev, ctx); + + if ( ret ) + goto unlock; + + /* See iommu_reattach_context() */ + rc =3D iommu_reattach_phantom(d, dev, ctx); + + if ( rc ) + { + printk(XENLOG_ERR "IOMMU: Unable to attach %pp phantom functions\n= ", + &dev->sbdf); + + if( iommu_call(dom_iommu(d)->platform_ops, detach, d, dev, ctx) + || iommu_detach_phantom(d, dev) ) + { + printk(XENLOG_ERR "IOMMU: Improperly detached %pp\n", &dev->sb= df); + WARN(); + } + + ret =3D -EIO; + goto unlock; + } + + dev->context =3D ctx_id; + list_add(&dev->context_list, &ctx->devices); + +unlock: + pcidevs_unlock(); + + if ( ctx ) + iommu_put_context(ctx); + + return ret; +} + +int cf_check iommu_detach_context(struct domain *d, device_t *dev) +{ + struct iommu_context *ctx; + int ret, rc; + + if ( !dev->domain ) + { + printk(XENLOG_WARNING "IOMMU: Trying to detach a non-attached devi= ce\n"); + WARN(); + return 0; + } + + /* Make sure device is actually in the domain. */ + ASSERT(d =3D=3D dev->domain); + + pcidevs_lock(); + + ctx =3D iommu_get_context(d, dev->context); + ASSERT(ctx); /* device is using an invalid context ? + dev->context invalid ? */ + + ret =3D iommu_call(dom_iommu(d)->platform_ops, detach, d, dev, ctx); + + if ( ret ) + goto unlock; + + rc =3D iommu_detach_phantom(d, dev); + + if ( rc ) + printk(XENLOG_WARNING "IOMMU: " + "Improperly detached device functions (%d)\n", rc); + + list_del(&dev->context_list); + +unlock: + pcidevs_unlock(); + iommu_put_context(ctx); + return ret; +} + +int cf_check iommu_reattach_context(struct domain *prev_dom, struct domain= *next_dom, + device_t *dev, u16 ctx_id) +{ + u16 prev_ctx_id; + device_t *ctx_dev; + struct domain_iommu *prev_hd, *next_hd; + struct iommu_context *prev_ctx =3D NULL, *next_ctx =3D NULL; + int ret, rc; + bool same_domain; + + /* Make sure we actually are doing something meaningful */ + BUG_ON(!prev_dom && !next_dom); + + /* Device domain must be coherent with prev_dom. */ + ASSERT(!prev_dom || dev->domain =3D=3D prev_dom); + + /// TODO: Do such cases exists ? 
+ // /* Platform ops must match */ + // if (dom_iommu(prev_dom)->platform_ops !=3D dom_iommu(next_dom)->pla= tform_ops) + // return -EINVAL; + + if ( !prev_dom ) + return iommu_attach_context(next_dom, dev, ctx_id); + + if ( !next_dom ) + return iommu_detach_context(prev_dom, dev); + + prev_hd =3D dom_iommu(prev_dom); + next_hd =3D dom_iommu(next_dom); + + pcidevs_lock(); + + same_domain =3D prev_dom =3D=3D next_dom; + + prev_ctx_id =3D dev->context; + + if ( same_domain && (ctx_id =3D=3D prev_ctx_id) ) + { + printk(XENLOG_DEBUG + "IOMMU: Reattaching %pp to same IOMMU context c%hu\n", + &dev->sbdf, ctx_id); + ret =3D 0; + goto unlock; + } + + if ( !(prev_ctx =3D iommu_get_context(prev_dom, prev_ctx_id)) ) + { + ret =3D -ENOENT; + goto unlock; + } + + if ( !(next_ctx =3D iommu_get_context(next_dom, ctx_id)) ) + { + ret =3D -ENOENT; + goto unlock; + } + + if ( next_ctx->dying ) + { + ret =3D -EINVAL; + goto unlock; + } + + ret =3D iommu_call(prev_hd->platform_ops, reattach, next_dom, dev, pre= v_ctx, + next_ctx); + + if ( ret ) + goto unlock; + + /* + * We need to do special handling for phantom devices as they + * also use some other PCI functions behind the scenes. + */ + rc =3D iommu_reattach_phantom(next_dom, dev, next_ctx); + + if ( rc ) + { + /** + * Device is being partially reattached (we have primary function = and + * maybe some phantom functions attached to next_ctx, some others = to prev_ctx), + * some functions of the device will be attached to next_ctx. + */ + printk(XENLOG_WARNING "IOMMU: " + "Device %pp improperly reattached due to phantom function" + " reattach failure between %dd%dc and %dd%dc (%d)\n", dev, + prev_dom->domain_id, prev_ctx->id, next_dom->domain_id, + next_dom->domain_id, rc); + + /* Try reattaching to previous context, reverting into a consisten= t state. */ + if ( iommu_call(prev_hd->platform_ops, reattach, prev_dom, dev, ne= xt_ctx, + prev_ctx) || iommu_reattach_phantom(prev_dom, dev,= prev_ctx) ) + { + printk(XENLOG_ERR "Unable to reattach %pp back to %dd%dc\n", + &dev->sbdf, prev_dom->domain_id, prev_ctx->id); + + if ( !is_hardware_domain(prev_dom) ) + domain_crash(prev_dom); + + if ( prev_dom !=3D next_dom && !is_hardware_domain(next_dom) ) + domain_crash(next_dom); + + rc =3D -EIO; + } + + ret =3D rc; + goto unlock; + } + + /* Remove device from previous context, and add it to new one. 
*/ + list_for_each_entry(ctx_dev, &prev_ctx->devices, context_list) + { + if ( ctx_dev =3D=3D dev ) + { + list_del(&ctx_dev->context_list); + list_add(&ctx_dev->context_list, &next_ctx->devices); + break; + } + } + + if (!ret) + dev->context =3D ctx_id; /* update device context*/ + +unlock: + pcidevs_unlock(); + + if ( prev_ctx ) + iommu_put_context(prev_ctx); + + if ( next_ctx ) + iommu_put_context(next_ctx); + + return ret; +} + +int cf_check iommu_context_teardown(struct domain *d, struct iommu_context= *ctx, u32 flags) +{ + struct domain_iommu *hd =3D dom_iommu(d); + + if ( !hd->platform_ops->context_teardown ) + return -ENOSYS; + + ctx->dying =3D true; + + /* first reattach devices back to default context if needed */ + if ( flags & IOMMU_TEARDOWN_REATTACH_DEFAULT ) + { + struct pci_dev *device; + list_for_each_entry(device, &ctx->devices, context_list) + iommu_reattach_context(d, d, device, 0); + } + else if (!list_empty(&ctx->devices)) + return -EBUSY; /* there is a device in context */ + + return iommu_call(hd->platform_ops, context_teardown, d, ctx, flags); +} + +int cf_check iommu_context_free(struct domain *d, u16 ctx_id, u32 flags) +{ + int ret; + struct domain_iommu *hd =3D dom_iommu(d); + struct iommu_context *ctx; + + if ( ctx_id =3D=3D 0 ) + return -EINVAL; + + if ( !(ctx =3D iommu_get_context(d, ctx_id)) ) + return -ENOENT; + + ret =3D iommu_context_teardown(d, ctx, flags); + + if ( !ret ) + clear_bit(ctx_id - 1, hd->other_contexts.bitmap); + + iommu_put_context(ctx); + return ret; +} + int iommu_quarantine_dev_init(device_t *dev) { - const struct domain_iommu *hd =3D dom_iommu(dom_io); + int ret; + u16 ctx_id; =20 - if ( !iommu_quarantine || !hd->platform_ops->quarantine_init ) + if ( !iommu_quarantine ) return 0; =20 - return iommu_call(hd->platform_ops, quarantine_init, - dev, iommu_quarantine =3D=3D IOMMU_quarantine_scratc= h_page); + ret =3D iommu_context_alloc(dom_io, &ctx_id, IOMMU_CONTEXT_INIT_quaran= tine); + + if ( ret ) + return ret; + + /** TODO: Setup scratch page, mappings... */ + + ret =3D iommu_reattach_context(dev->domain, dom_io, dev, ctx_id); + + if ( ret ) + { + ASSERT(!iommu_context_free(dom_io, ctx_id, 0)); + return ret; + } + + return ret; } =20 -static int __init iommu_quarantine_init(void) +int __init iommu_quarantine_init(void) { dom_io->options |=3D XEN_DOMCTL_CDF_iommu; =20 diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index e1ca74b477..56f65090fc 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -654,6 +654,101 @@ unsigned int pci_size_mem_bar(pci_sbdf_t sbdf, unsign= ed int pos, return is64bits ? 2 : 1; } =20 +static int device_assigned(struct pci_dev *pdev) +{ + int rc =3D 0; + + /* + * If the device exists and it is not owned by either the hardware + * domain or dom_io then it must be assigned to a guest, or be + * hidden (owned by dom_xen). + */ + if ( pdev->domain !=3D hardware_domain && pdev->domain !=3D dom_io ) + rc =3D -EBUSY; + + return rc; +} + +/* Caller should hold the pcidevs_lock */ +static int pci_reassign_device(struct domain *prev_dom, struct domain *nex= t_dom, + struct pci_dev *pdev, u32 flag) +{ + int rc =3D 0; + ASSERT(prev_dom || next_dom); + + if ( !is_iommu_enabled(next_dom) ) + return -EINVAL; + + if ( !arch_iommu_use_permitted(next_dom) ) + return -EXDEV; + + /* Do not allow broken devices to be assigned to guests. 
*/ + if ( pdev->broken && next_dom !=3D hardware_domain && next_dom !=3D do= m_io ) + return -EBADF; + + if ( prev_dom ) + { + write_lock(&prev_dom->pci_lock); + vpci_deassign_device(pdev); + write_unlock(&prev_dom->pci_lock); + } + + rc =3D pdev_msix_assign(next_dom, pdev); + if ( rc ) + goto done; + + pdev->fault.count =3D 0; + + if ( prev_dom && next_dom ) + { + printk(XENLOG_INFO "PCI: Reassigning PCI device from %dd to %dd\n", + prev_dom->domain_id, next_dom->domain_id); + } + else if ( prev_dom ) + { + printk(XENLOG_INFO "PCI: Assigning PCI device to %dd\n", prev_dom-= >domain_id); + } + else if ( next_dom ) + { + printk(XENLOG_INFO "PCI: Remove PCI device of %dd\n", next_dom->do= main_id); + } + else + { + ASSERT_UNREACHABLE(); + } + + rc =3D iommu_reattach_context(prev_dom, next_dom, pci_to_dev(pdev), 0); + + if ( rc ) + goto done; + + if ( prev_dom ) + { + write_lock(&prev_dom->pci_lock); + list_del(&pdev->domain_list); + write_unlock(&prev_dom->pci_lock); + } + + pdev->domain =3D next_dom; + + if ( next_dom ) + { + write_lock(&next_dom->pci_lock); + list_add(&pdev->domain_list, &next_dom->pdev_list); + + rc =3D vpci_assign_device(pdev); + write_unlock(&next_dom->pci_lock); + } + + done: + + /* The device is assigned to dom_io so mark it as quarantined */ + if ( !rc && next_dom =3D=3D dom_io ) + pdev->quarantine =3D true; + + return rc; +} + int pci_add_device(u16 seg, u8 bus, u8 devfn, const struct pci_dev_info *info, nodeid_t node) { @@ -699,13 +794,30 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, =20 if ( !pf_pdev ) { - printk(XENLOG_WARNING - "Attempted to add SR-IOV VF %pp without PF %pp\n", - &pdev->sbdf, - &PCI_SBDF(seg, info->physfn.bus, info->physfn.devfn= )); - free_pdev(pseg, pdev); - ret =3D -ENODEV; - goto out; + ret =3D pci_add_device(seg, info->physfn.bus, info->physfn= .devfn, + NULL, node); + if ( ret ) + { + printk(XENLOG_WARNING + "Failed to add SR-IOV device PF %pp for VF %pp\= n", + &PCI_SBDF(seg, info->physfn.bus, info->physfn.d= evfn), + &pdev->sbdf); + free_pdev(pseg, pdev); + goto out; + } + pf_pdev =3D pci_get_pdev(NULL, PCI_SBDF(seg, info->physfn.= bus, + info->physfn.devfn)); + if ( !pf_pdev ) + { + printk(XENLOG_ERR + "Inconsistent PCI state: failed to find newly a= dded PF %pp for VF %pp\n", + &PCI_SBDF(seg, info->physfn.bus, info->physfn.d= evfn), + &pdev->sbdf); + ASSERT_UNREACHABLE(); + free_pdev(pseg, pdev); + ret =3D -EILSEQ; + goto out; + } } =20 if ( !pdev->pf_pdev ) @@ -877,74 +989,6 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn) return ret; } =20 -/* Caller should hold the pcidevs_lock */ -static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus, - uint8_t devfn) -{ - const struct domain_iommu *hd =3D dom_iommu(d); - struct pci_dev *pdev; - struct domain *target; - int ret =3D 0; - - if ( !is_iommu_enabled(d) ) - return -EINVAL; - - ASSERT(pcidevs_locked()); - pdev =3D pci_get_pdev(d, PCI_SBDF(seg, bus, devfn)); - if ( !pdev ) - return -ENODEV; - - /* De-assignment from dom_io should de-quarantine the device */ - if ( (pdev->quarantine || iommu_quarantine) && pdev->domain !=3D dom_i= o ) - { - ret =3D iommu_quarantine_dev_init(pci_to_dev(pdev)); - if ( ret ) - return ret; - - target =3D dom_io; - } - else - target =3D hardware_domain; - - while ( pdev->phantom_stride ) - { - devfn +=3D pdev->phantom_stride; - if ( PCI_SLOT(devfn) !=3D PCI_SLOT(pdev->devfn) ) - break; - ret =3D iommu_call(hd->platform_ops, reassign_device, d, target, d= evfn, - pci_to_dev(pdev)); - if ( ret ) - goto out; - } - - write_lock(&d->pci_lock); 
- vpci_deassign_device(pdev); - write_unlock(&d->pci_lock); - - devfn =3D pdev->devfn; - ret =3D iommu_call(hd->platform_ops, reassign_device, d, target, devfn, - pci_to_dev(pdev)); - if ( ret ) - goto out; - - if ( pdev->domain =3D=3D hardware_domain ) - pdev->quarantine =3D false; - - pdev->fault.count =3D 0; - - write_lock(&target->pci_lock); - /* Re-assign back to hardware_domain */ - ret =3D vpci_assign_device(pdev); - write_unlock(&target->pci_lock); - - out: - if ( ret ) - printk(XENLOG_G_ERR "%pd: deassign (%pp) failed (%d)\n", - d, &PCI_SBDF(seg, bus, devfn), ret); - - return ret; -} - int pci_release_devices(struct domain *d) { int combined_ret; @@ -966,13 +1010,10 @@ int pci_release_devices(struct domain *d) struct pci_dev *pdev =3D list_first_entry(&d->pdev_list, struct pci_dev, domain_list); - uint16_t seg =3D pdev->seg; - uint8_t bus =3D pdev->bus; - uint8_t devfn =3D pdev->devfn; int ret; =20 write_unlock(&d->pci_lock); - ret =3D deassign_device(d, seg, bus, devfn); + ret =3D pci_reassign_device(d, dom_io, pdev, 0); write_lock(&d->pci_lock); if ( ret ) { @@ -1180,25 +1221,18 @@ struct setup_hwdom { static void __hwdom_init setup_one_hwdom_device(const struct setup_hwdom *= ctxt, struct pci_dev *pdev) { - u8 devfn =3D pdev->devfn; int err; =20 - do { - err =3D ctxt->handler(devfn, pdev); - if ( err ) - { - printk(XENLOG_ERR "setup %pp for d%d failed (%d)\n", - &pdev->sbdf, ctxt->d->domain_id, err); - if ( devfn =3D=3D pdev->devfn ) - return; - } - devfn +=3D pdev->phantom_stride; - } while ( devfn !=3D pdev->devfn && - PCI_SLOT(devfn) =3D=3D PCI_SLOT(pdev->devfn) ); + err =3D ctxt->handler(pdev->devfn, pdev); + + if ( err ) + goto done; =20 write_lock(&ctxt->d->pci_lock); err =3D vpci_assign_device(pdev); write_unlock(&ctxt->d->pci_lock); + +done: if ( err ) printk(XENLOG_ERR "setup of vPCI for d%d failed: %d\n", ctxt->d->domain_id, err); @@ -1397,8 +1431,6 @@ __initcall(setup_dump_pcidevs); static int iommu_add_device(struct pci_dev *pdev) { const struct domain_iommu *hd; - int rc; - unsigned int devfn =3D pdev->devfn; =20 if ( !pdev->domain ) return -EINVAL; @@ -1409,20 +1441,7 @@ static int iommu_add_device(struct pci_dev *pdev) if ( !is_iommu_enabled(pdev->domain) ) return 0; =20 - rc =3D iommu_call(hd->platform_ops, add_device, devfn, pci_to_dev(pdev= )); - if ( rc || !pdev->phantom_stride ) - return rc; - - for ( ; ; ) - { - devfn +=3D pdev->phantom_stride; - if ( PCI_SLOT(devfn) !=3D PCI_SLOT(pdev->devfn) ) - return 0; - rc =3D iommu_call(hd->platform_ops, add_device, devfn, pci_to_dev(= pdev)); - if ( rc ) - printk(XENLOG_WARNING "IOMMU: add %pp failed (%d)\n", - &PCI_SBDF(pdev->seg, pdev->bus, devfn), rc); - } + return iommu_attach_context(pdev->domain, pci_to_dev(pdev), 0); } =20 static int iommu_enable_device(struct pci_dev *pdev) @@ -1444,145 +1463,13 @@ static int iommu_enable_device(struct pci_dev *pde= v) =20 static int iommu_remove_device(struct pci_dev *pdev) { - const struct domain_iommu *hd; - u8 devfn; - if ( !pdev->domain ) return -EINVAL; =20 - hd =3D dom_iommu(pdev->domain); if ( !is_iommu_enabled(pdev->domain) ) return 0; =20 - for ( devfn =3D pdev->devfn ; pdev->phantom_stride; ) - { - int rc; - - devfn +=3D pdev->phantom_stride; - if ( PCI_SLOT(devfn) !=3D PCI_SLOT(pdev->devfn) ) - break; - rc =3D iommu_call(hd->platform_ops, remove_device, devfn, - pci_to_dev(pdev)); - if ( !rc ) - continue; - - printk(XENLOG_ERR "IOMMU: remove %pp failed (%d)\n", - &PCI_SBDF(pdev->seg, pdev->bus, devfn), rc); - return rc; - } - - devfn =3D pdev->devfn; - - return 
iommu_call(hd->platform_ops, remove_device, devfn, pci_to_dev(p= dev)); -} - -static int device_assigned(u16 seg, u8 bus, u8 devfn) -{ - struct pci_dev *pdev; - int rc =3D 0; - - ASSERT(pcidevs_locked()); - pdev =3D pci_get_pdev(NULL, PCI_SBDF(seg, bus, devfn)); - - if ( !pdev ) - rc =3D -ENODEV; - /* - * If the device exists and it is not owned by either the hardware - * domain or dom_io then it must be assigned to a guest, or be - * hidden (owned by dom_xen). - */ - else if ( pdev->domain !=3D hardware_domain && - pdev->domain !=3D dom_io ) - rc =3D -EBUSY; - - return rc; -} - -/* Caller should hold the pcidevs_lock */ -static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 = flag) -{ - const struct domain_iommu *hd =3D dom_iommu(d); - struct pci_dev *pdev; - int rc =3D 0; - - if ( !is_iommu_enabled(d) ) - return 0; - - if ( !arch_iommu_use_permitted(d) ) - return -EXDEV; - - /* device_assigned() should already have cleared the device for assign= ment */ - ASSERT(pcidevs_locked()); - pdev =3D pci_get_pdev(NULL, PCI_SBDF(seg, bus, devfn)); - ASSERT(pdev && (pdev->domain =3D=3D hardware_domain || - pdev->domain =3D=3D dom_io)); - - /* Do not allow broken devices to be assigned to guests. */ - rc =3D -EBADF; - if ( pdev->broken && d !=3D hardware_domain && d !=3D dom_io ) - goto done; - - write_lock(&pdev->domain->pci_lock); - vpci_deassign_device(pdev); - write_unlock(&pdev->domain->pci_lock); - - rc =3D pdev_msix_assign(d, pdev); - if ( rc ) - goto done; - - if ( pdev->domain !=3D dom_io ) - { - rc =3D iommu_quarantine_dev_init(pci_to_dev(pdev)); - if ( rc ) - goto done; - } - - pdev->fault.count =3D 0; - - rc =3D iommu_call(hd->platform_ops, assign_device, d, devfn, pci_to_de= v(pdev), - flag); - - while ( pdev->phantom_stride && !rc ) - { - devfn +=3D pdev->phantom_stride; - if ( PCI_SLOT(devfn) !=3D PCI_SLOT(pdev->devfn) ) - break; - rc =3D iommu_call(hd->platform_ops, assign_device, d, devfn, - pci_to_dev(pdev), flag); - } - - if ( rc ) - goto done; - - write_lock(&d->pci_lock); - rc =3D vpci_assign_device(pdev); - write_unlock(&d->pci_lock); - - done: - if ( rc ) - { - printk(XENLOG_G_WARNING "%pd: assign %s(%pp) failed (%d)\n", - d, devfn !=3D pdev->devfn ? "phantom function " : "", - &PCI_SBDF(seg, bus, devfn), rc); - - if ( devfn !=3D pdev->devfn && deassign_device(d, seg, bus, pdev->= devfn) ) - { - /* - * Device with phantom functions that failed to both assign and - * rollback. Mark the device as broken and crash the target d= omain, - * as the state of the functions at this point is unknown and = Xen - * has no way to assert consistent context assignment among th= em. 
- */ - pdev->broken =3D true; - if ( !is_hardware_domain(d) && d !=3D dom_io ) - domain_crash(d); - } - } - /* The device is assigned to dom_io so mark it as quarantined */ - else if ( d =3D=3D dom_io ) - pdev->quarantine =3D true; - - return rc; + return iommu_detach_context(pdev->domain, pdev); } =20 static int iommu_get_device_group( @@ -1672,6 +1559,7 @@ int iommu_do_pci_domctl( u8 bus, devfn; int ret =3D 0; uint32_t machine_sbdf; + struct pci_dev *pdev; =20 switch ( domctl->cmd ) { @@ -1741,7 +1629,15 @@ int iommu_do_pci_domctl( devfn =3D PCI_DEVFN(machine_sbdf); =20 pcidevs_lock(); - ret =3D device_assigned(seg, bus, devfn); + pdev =3D pci_get_pdev(NULL, PCI_SBDF(seg, bus, devfn)); + + if ( !pdev ) + { + printk(XENLOG_G_INFO "%pp doesn't exist", &PCI_SBDF(seg, bus, = devfn)); + break; + } + + ret =3D device_assigned(pdev); if ( domctl->cmd =3D=3D XEN_DOMCTL_test_assign_device ) { if ( ret ) @@ -1752,7 +1648,7 @@ int iommu_do_pci_domctl( } } else if ( !ret ) - ret =3D assign_device(d, seg, bus, devfn, flags); + ret =3D pci_reassign_device(pdev->domain, d, pdev, flags); pcidevs_unlock(); if ( ret =3D=3D -ERESTART ) ret =3D hypercall_create_continuation(__HYPERVISOR_domctl, @@ -1786,7 +1682,20 @@ int iommu_do_pci_domctl( devfn =3D PCI_DEVFN(machine_sbdf); =20 pcidevs_lock(); - ret =3D deassign_device(d, seg, bus, devfn); + pdev =3D pci_get_pdev(d, PCI_SBDF(seg, bus, devfn)); + + if ( pdev ) + { + struct domain *target =3D hardware_domain; + + if ( (pdev->quarantine || iommu_quarantine) && pdev->domain != =3D dom_io ) + target =3D dom_io; + + ret =3D pci_reassign_device(d, target, pdev, 0); + } + else + ret =3D -ENODEV; + pcidevs_unlock(); break; =20 diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough= /vtd/extern.h index 82db8f9435..a980be3646 100644 --- a/xen/drivers/passthrough/vtd/extern.h +++ b/xen/drivers/passthrough/vtd/extern.h @@ -78,12 +78,12 @@ uint64_t alloc_pgtable_maddr(unsigned long npages, node= id_t node); void free_pgtable_maddr(u64 maddr); void *map_vtd_domain_page(u64 maddr); void unmap_vtd_domain_page(const void *va); -int domain_context_mapping_one(struct domain *domain, struct iommu_context= *ctx, - struct vtd_iommu *iommu, uint8_t bus, uint8= _t devfn, - const struct pci_dev *pdev, domid_t domid, - paddr_t pgd_maddr, unsigned int mode); -int domain_context_unmap_one(struct domain *domain, struct vtd_iommu *iomm= u, - uint8_t bus, uint8_t devfn); +int apply_context_single(struct domain *domain, struct iommu_context *ctx, + struct vtd_iommu *iommu, uint8_t bus, uint8_t dev= fn, + struct iommu_context *prev_ctx); +int unapply_context_single(struct domain *domain, struct vtd_iommu *iommu, + struct iommu_context *prev_ctx, uint8_t bus, + uint8_t devfn); int cf_check intel_iommu_get_reserved_device_memory( iommu_grdm_t *func, void *ctxt); =20 @@ -104,8 +104,9 @@ void platform_quirks_init(void); void vtd_ops_preamble_quirk(struct vtd_iommu *iommu); void vtd_ops_postamble_quirk(struct vtd_iommu *iommu); int __must_check me_wifi_quirk(struct domain *domain, uint8_t bus, - uint8_t devfn, domid_t domid, paddr_t pgd_m= addr, - unsigned int mode); + uint8_t devfn, domid_t domid, + unsigned int mode, struct iommu_context *ct= x, + struct iommu_context *prev_ctx); void pci_vtd_quirk(const struct pci_dev *); void quirk_iommu_caps(struct vtd_iommu *iommu); =20 diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/= vtd/iommu.c index 34b2a287f7..bb53cff158 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ 
b/xen/drivers/passthrough/vtd/iommu.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include @@ -61,7 +62,6 @@ static unsigned int __ro_after_init min_pt_levels =3D UIN= T_MAX; static struct tasklet vtd_fault_tasklet; =20 static int cf_check setup_hwdom_device(u8 devfn, struct pci_dev *); -static void setup_hwdom_rmrr(struct domain *d); =20 #define DID_FIELD_WIDTH 16 #define DID_HIGH_OFFSET 8 @@ -165,7 +165,7 @@ static uint64_t addr_to_dma_page_maddr(struct domain *d= omain, u64 pte_maddr =3D 0; =20 addr &=3D (((u64)1) << addr_width) - 1; - ASSERT(spin_is_locked(&ctx->arch.mapping_lock)); + ASSERT(rspin_is_locked(&ctx->lock)); ASSERT(target || !alloc); =20 if ( !ctx->arch.vtd.pgd_maddr ) @@ -270,36 +270,22 @@ static uint64_t addr_to_dma_page_maddr(struct domain = *domain, return pte_maddr; } =20 -static paddr_t domain_pgd_maddr(struct domain *d, struct iommu_context *ct= x, - paddr_t pgd_maddr, unsigned int nr_pt_leve= ls) +static paddr_t get_context_pgd(struct domain *d, struct iommu_context *ctx, + unsigned int nr_pt_levels) { unsigned int agaw; + paddr_t pgd_maddr =3D ctx->arch.vtd.pgd_maddr; =20 - ASSERT(spin_is_locked(&ctx->arch.mapping_lock)); - - if ( pgd_maddr ) - /* nothing */; - else if ( iommu_use_hap_pt(d) ) + if ( !ctx->arch.vtd.pgd_maddr ) { - pagetable_t pgt =3D p2m_get_pagetable(p2m_get_hostp2m(d)); + /* + * Ensure we have pagetables allocated down to the smallest + * level the loop below may need to run to. + */ + addr_to_dma_page_maddr(d, ctx, 0, min_pt_levels, NULL, true); =20 - pgd_maddr =3D pagetable_get_paddr(pgt); - } - else - { if ( !ctx->arch.vtd.pgd_maddr ) - { - /* - * Ensure we have pagetables allocated down to the smallest - * level the loop below may need to run to. - */ - addr_to_dma_page_maddr(d, ctx, 0, min_pt_levels, NULL, true); - - if ( !ctx->arch.vtd.pgd_maddr ) - return 0; - } - - pgd_maddr =3D ctx->arch.vtd.pgd_maddr; + return 0; } =20 /* Skip top level(s) of page tables for less-than-maximum level DRHDs.= */ @@ -568,17 +554,20 @@ static int __must_check iommu_flush_all(void) return rc; } =20 -static int __must_check cf_check iommu_flush_iotlb(struct domain *d, dfn_t= dfn, +static int __must_check cf_check iommu_flush_iotlb(struct domain *d, + struct iommu_context *c= tx, + dfn_t dfn, unsigned long page_coun= t, unsigned int flush_flag= s) { - struct iommu_context *ctx =3D iommu_default_context(d); struct acpi_drhd_unit *drhd; struct vtd_iommu *iommu; bool flush_dev_iotlb; int iommu_domid; int ret =3D 0; =20 + ASSERT(ctx); + if ( flush_flags & IOMMU_FLUSHF_all ) { dfn =3D INVALID_DFN; @@ -1239,7 +1228,8 @@ void __init iommu_free(struct acpi_drhd_unit *drhd) agaw =3D 64; \ agaw; }) =20 -static int cf_check intel_iommu_context_init(struct domain *d, struct iomm= u_context *ctx) +static int cf_check intel_iommu_context_init(struct domain *d, struct iomm= u_context *ctx, + u32 flags) { struct acpi_drhd_unit *drhd; =20 @@ -1254,6 +1244,27 @@ static int cf_check intel_iommu_context_init(struct = domain *d, struct iommu_cont return -ENOMEM; } =20 + ctx->arch.vtd.superpage_progress =3D 0; + + if ( flags & IOMMU_CONTEXT_INIT_default ) + { + ctx->arch.vtd.pgd_maddr =3D 0; + + /* + * Context is considered "opaque" (non-managed) in these cases : + * - HAP is enabled, in this case, the pagetable is not managed b= y the + * IOMMU code, thus opaque + * - IOMMU is in passthrough which means that there is no actual = pagetable + */ + if ( iommu_use_hap_pt(d) ) + { + pagetable_t pgt =3D p2m_get_pagetable(p2m_get_hostp2m(d)); + 
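+            /*
+             * The default context reuses the CPU P2M root as its IOMMU page
+             * table (shared EPT), so the generic map/unmap helpers below
+             * leave such "opaque" contexts alone.
+             */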
ctx->arch.vtd.pgd_maddr =3D pagetable_get_paddr(pgt); + + ctx->opaque =3D true; + } + } + // TODO: Allocate IOMMU domid only when attaching devices ? /* Populate context DID map using pseudo DIDs */ for_each_drhd_unit(drhd) @@ -1262,7 +1273,11 @@ static int cf_check intel_iommu_context_init(struct = domain *d, struct iommu_cont iommu_alloc_domid(drhd->iommu->domid_bitmap); } =20 - return arch_iommu_context_init(d, ctx, 0); + if ( !ctx->opaque ) + /* Create initial context page */ + addr_to_dma_page_maddr(d, ctx, 0, min_pt_levels, NULL, true); + + return arch_iommu_context_init(d, ctx, flags); } =20 static int cf_check intel_iommu_domain_init(struct domain *d) @@ -1271,7 +1286,7 @@ static int cf_check intel_iommu_domain_init(struct do= main *d) =20 hd->arch.vtd.agaw =3D width_to_agaw(DEFAULT_DOMAIN_ADDRESS_WIDTH); =20 - return intel_iommu_context_init(d, iommu_default_context(d)); + return 0; } =20 static void __hwdom_init cf_check intel_iommu_hwdom_init(struct domain *d) @@ -1279,7 +1294,7 @@ static void __hwdom_init cf_check intel_iommu_hwdom_i= nit(struct domain *d) struct acpi_drhd_unit *drhd; =20 setup_hwdom_pci_devices(d, setup_hwdom_device); - setup_hwdom_rmrr(d); + /* Make sure workarounds are applied before enabling the IOMMU(s). */ arch_iommu_hwdom_init(d); =20 @@ -1296,21 +1311,17 @@ static void __hwdom_init cf_check intel_iommu_hwdom= _init(struct domain *d) } } =20 -/* - * This function returns - * - a negative errno value upon error, - * - zero upon success when previously the entry was non-present, or this = isn't - * the "main" request for a device (pdev =3D=3D NULL), or for no-op quar= antining - * assignments, - * - positive (one) upon success when previously the entry was present and= this - * is the "main" request for a device (pdev !=3D NULL). +/** + * Apply a context on a device. + * @param domain Domain of the context + * @param ctx IOMMU context to apply + * @param iommu IOMMU hardware to use (must match device iommu) + * @param bus PCI device bus + * @param devfn PCI device function */ -int domain_context_mapping_one( - struct domain *domain, - struct iommu_context *ctx, - struct vtd_iommu *iommu, - uint8_t bus, uint8_t devfn, const struct pci_dev *pdev, - domid_t domid, paddr_t pgd_maddr, unsigned int mode) +int apply_context_single(struct domain *domain, struct iommu_context *ctx, + struct vtd_iommu *iommu, uint8_t bus, uint8_t dev= fn, + struct iommu_context *prev_ctx) { struct context_entry *context, *context_entries, lctxt; __uint128_t res, old; @@ -1319,8 +1330,6 @@ int domain_context_mapping_one( int rc, ret; bool flush_dev_iotlb, overwrite_entry =3D false; =20 - struct iommu_context *prev_ctx =3D pdev->domain ? 
iommu_default_contex= t(pdev->domain) : NULL; - ASSERT(pcidevs_locked()); spin_lock(&iommu->lock); maddr =3D bus_to_context_maddr(iommu, bus); @@ -1336,7 +1345,7 @@ int domain_context_mapping_one( overwrite_entry =3D true; } =20 - if ( iommu_hwdom_passthrough && is_hardware_domain(domain) ) + if ( iommu_hwdom_passthrough && is_hardware_domain(domain) && !ctx->id= ) { context_set_translation_type(lctxt, CONTEXT_TT_PASS_THRU); } @@ -1344,9 +1353,7 @@ int domain_context_mapping_one( { paddr_t root; =20 - spin_lock(&ctx->arch.mapping_lock); - - root =3D domain_pgd_maddr(domain, ctx, pgd_maddr, iommu->nr_pt_lev= els); + root =3D get_context_pgd(domain, ctx, iommu->nr_pt_levels); if ( !root ) { unmap_vtd_domain_page(context_entries); @@ -1358,8 +1365,6 @@ int domain_context_mapping_one( context_set_translation_type(lctxt, CONTEXT_TT_DEV_IOTLB); else context_set_translation_type(lctxt, CONTEXT_TT_MULTI_LEVEL); - - spin_unlock(&ctx->arch.mapping_lock); } =20 rc =3D context_set_domain_id(&lctxt, did, iommu); @@ -1388,7 +1393,6 @@ int domain_context_mapping_one( } =20 iommu_sync_cache(context, sizeof(struct context_entry)); - spin_unlock(&iommu->lock); =20 rc =3D iommu_flush_context_device(iommu, prev_did, PCI_BDF(bus, devfn), DMA_CCMD_MASK_NOBIT, !overwrite_entry); @@ -1422,7 +1426,7 @@ int domain_context_mapping_one( spin_unlock(&iommu->lock); =20 if ( !seg && !rc ) - rc =3D me_wifi_quirk(domain, bus, devfn, domid, pgd_maddr, mode); + rc =3D me_wifi_quirk(domain, bus, devfn, did, 0, ctx, prev_ctx); =20 return rc; =20 @@ -1432,152 +1436,32 @@ int domain_context_mapping_one( return rc; } =20 -static const struct acpi_drhd_unit *domain_context_unmap( - struct domain *d, uint8_t devfn, struct pci_dev *pdev); - -static int domain_context_mapping(struct domain *domain, struct iommu_cont= ext *ctx, - u8 devfn, struct pci_dev *pdev) +int apply_context(struct domain *d, struct iommu_context *ctx, + struct pci_dev *pdev, u8 devfn, + struct iommu_context *prev_ctx) { - const struct acpi_drhd_unit *drhd =3D acpi_find_matched_drhd_unit(pdev= ); - const struct acpi_rmrr_unit *rmrr; - paddr_t pgd_maddr =3D ctx->arch.vtd.pgd_maddr; - domid_t did =3D ctx->arch.vtd.didmap[drhd->iommu->index]; + struct acpi_drhd_unit *drhd =3D acpi_find_matched_drhd_unit(pdev); + struct vtd_iommu *iommu =3D drhd->iommu; int ret =3D 0; - unsigned int i, mode =3D 0; - uint16_t seg =3D pdev->seg, bdf; - uint8_t bus =3D pdev->bus, secbus; - - /* - * Generally we assume only devices from one node to get assigned to a - * given guest. But even if not, by replacing the prior value here we - * guarantee that at least some basic allocations for the device being - * added will get done against its node. Any further allocations for - * this or other devices may be penalized then, but some would also be - * if we left other than NUMA_NO_NODE untouched here. 
- */ - if ( drhd && drhd->iommu->node !=3D NUMA_NO_NODE ) - dom_iommu(domain)->node =3D drhd->iommu->node; =20 - ASSERT(pcidevs_locked()); + if ( !drhd ) + return -EINVAL; =20 - for_each_rmrr_device( rmrr, bdf, i ) + if ( pdev->type =3D=3D DEV_TYPE_PCI_HOST_BRIDGE || + pdev->type =3D=3D DEV_TYPE_PCIe_BRIDGE || + pdev->type =3D=3D DEV_TYPE_PCIe2PCI_BRIDGE || + pdev->type =3D=3D DEV_TYPE_LEGACY_PCI_BRIDGE ) { - if ( rmrr->segment !=3D pdev->seg || bdf !=3D pdev->sbdf.bdf ) - continue; - - mode |=3D MAP_WITH_RMRR; - break; + printk(XENLOG_WARNING VTDPREFIX " Ignoring apply_context on PCI br= idge\n"); + return 0; } =20 - if ( domain !=3D pdev->domain && pdev->domain !=3D dom_io && - pdev->domain->is_dying ) - mode |=3D MAP_OWNER_DYING; - - switch ( pdev->type ) - { - bool prev_present; - - case DEV_TYPE_PCI_HOST_BRIDGE: - if ( iommu_debug ) - printk(VTDPREFIX "%pd:Hostbridge: skip %pp map\n", - domain, &PCI_SBDF(seg, bus, devfn)); - if ( !is_hardware_domain(domain) ) - return -EPERM; - break; - - case DEV_TYPE_PCIe_BRIDGE: - case DEV_TYPE_PCIe2PCI_BRIDGE: - case DEV_TYPE_LEGACY_PCI_BRIDGE: - break; - - case DEV_TYPE_PCIe_ENDPOINT: - if ( !drhd ) - return -ENODEV; - - if ( iommu_debug ) - printk(VTDPREFIX "%pd:PCIe: map %pp\n", - domain, &PCI_SBDF(seg, bus, devfn)); - ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, bus, = devfn, pdev, - did, pgd_maddr, mode); - if ( ret > 0 ) - ret =3D 0; - if ( !ret && devfn =3D=3D pdev->devfn && ats_device(pdev, drhd) > = 0 ) - enable_ats_device(pdev, &drhd->iommu->ats_devices); - - break; - - case DEV_TYPE_PCI: - if ( !drhd ) - return -ENODEV; - - if ( iommu_debug ) - printk(VTDPREFIX "%pd:PCI: map %pp\n", - domain, &PCI_SBDF(seg, bus, devfn)); - - ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, bus, = devfn, - pdev, did, pgd_maddr, mode); - if ( ret < 0 ) - break; - prev_present =3D ret; - - if ( (ret =3D find_upstream_bridge(seg, &bus, &devfn, &secbus)) < = 1 ) - { - if ( !ret ) - break; - ret =3D -ENXIO; - } - /* - * Strictly speaking if the device is the only one behind this bri= dge - * and the only one with this (secbus,0,0) tuple, it could be allo= wed - * to be re-assigned regardless of RMRR presence. But let's deal = with - * that case only if it is actually found in the wild. Note that - * dealing with this just here would still not render the operation - * secure. - */ - else if ( prev_present && (mode & MAP_WITH_RMRR) && - domain !=3D pdev->domain ) - ret =3D -EOPNOTSUPP; - - /* - * Mapping a bridge should, if anything, pass the struct pci_dev of - * that bridge. Since bridges don't normally get assigned to guest= s, - * their owner would be the wrong one. Pass NULL instead. - */ - if ( ret >=3D 0 ) - ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, b= us, devfn, - NULL, did, pgd_maddr, mode); - - /* - * Devices behind PCIe-to-PCI/PCIx bridge may generate different - * requester-id. It may originate from devfn=3D0 on the secondary = bus - * behind the bridge. Map that id as well if we didn't already. - * - * Somewhat similar as for bridges, we don't want to pass a struct - * pci_dev here - there may not even exist one for this (secbus,0,= 0) - * tuple. If there is one, without properly working device groups = it - * may again not have the correct owner. 
- */ - if ( !ret && pdev_type(seg, bus, devfn) =3D=3D DEV_TYPE_PCIe2PCI_B= RIDGE && - (secbus !=3D pdev->bus || pdev->devfn !=3D 0) ) - ret =3D domain_context_mapping_one(domain, ctx, drhd->iommu, s= ecbus, 0, - NULL, did, pgd_maddr, mode); - - if ( ret ) - { - if ( !prev_present ) - domain_context_unmap(domain, devfn, pdev); - else if ( pdev->domain !=3D domain ) /* Avoid infinite recursi= on. */ - domain_context_mapping(pdev->domain, ctx, devfn, pdev); - } + ASSERT(pcidevs_locked()); =20 - break; + ret =3D apply_context_single(d, ctx, iommu, pdev->bus, pdev->devfn, pr= ev_ctx); =20 - default: - dprintk(XENLOG_ERR VTDPREFIX, "%pd:unknown(%u): %pp\n", - domain, pdev->type, &PCI_SBDF(seg, bus, devfn)); - ret =3D -EINVAL; - break; - } + if ( !ret && ats_device(pdev, drhd) > 0 ) + enable_ats_device(pdev, &iommu->ats_devices); =20 if ( !ret && devfn =3D=3D pdev->devfn ) pci_vtd_quirk(pdev); @@ -1585,10 +1469,8 @@ static int domain_context_mapping(struct domain *dom= ain, struct iommu_context *c return ret; } =20 -int domain_context_unmap_one( - struct domain *domain, - struct vtd_iommu *iommu, - uint8_t bus, uint8_t devfn) +int unapply_context_single(struct domain *domain, struct vtd_iommu *iommu, + struct iommu_context *prev_ctx, uint8_t bus, ui= nt8_t devfn) { struct context_entry *context, *context_entries; u64 maddr; @@ -1636,12 +1518,18 @@ int domain_context_unmap_one( if ( rc > 0 ) rc =3D 0; =20 + if ( !rc ) + { + BUG_ON(!prev_ctx->arch.vtd.iommu_dev_cnt[iommu->index]); + prev_ctx->arch.vtd.iommu_dev_cnt[iommu->index]--; + } + spin_unlock(&iommu->lock); unmap_vtd_domain_page(context_entries); =20 if ( !iommu->drhd->segment && !rc ) - rc =3D me_wifi_quirk(domain, bus, devfn, DOMID_INVALID, 0, - UNMAP_ME_PHANTOM_FUNC); + rc =3D me_wifi_quirk(domain, bus, devfn, DOMID_INVALID, UNMAP_ME_P= HANTOM_FUNC, + NULL, prev_ctx); =20 if ( rc && !is_hardware_domain(domain) && domain !=3D dom_io ) { @@ -1659,128 +1547,27 @@ int domain_context_unmap_one( return rc; } =20 -static const struct acpi_drhd_unit *domain_context_unmap( - struct domain *domain, - uint8_t devfn, - struct pci_dev *pdev) +static void cf_check iommu_clear_root_pgtable(struct domain *d, + struct iommu_context *ctx) { - const struct acpi_drhd_unit *drhd =3D acpi_find_matched_drhd_unit(pdev= ); - struct vtd_iommu *iommu =3D drhd ? drhd->iommu : NULL; - int ret; - uint16_t seg =3D pdev->seg; - uint8_t bus =3D pdev->bus, tmp_bus, tmp_devfn, secbus; - - switch ( pdev->type ) - { - case DEV_TYPE_PCI_HOST_BRIDGE: - if ( iommu_debug ) - printk(VTDPREFIX "%pd:Hostbridge: skip %pp unmap\n", - domain, &PCI_SBDF(seg, bus, devfn)); - return ERR_PTR(is_hardware_domain(domain) ? 
0 : -EPERM); - - case DEV_TYPE_PCIe_BRIDGE: - case DEV_TYPE_PCIe2PCI_BRIDGE: - case DEV_TYPE_LEGACY_PCI_BRIDGE: - return ERR_PTR(0); - - case DEV_TYPE_PCIe_ENDPOINT: - if ( !iommu ) - return ERR_PTR(-ENODEV); - - if ( iommu_debug ) - printk(VTDPREFIX "%pd:PCIe: unmap %pp\n", - domain, &PCI_SBDF(seg, bus, devfn)); - ret =3D domain_context_unmap_one(domain, iommu, bus, devfn); - if ( !ret && devfn =3D=3D pdev->devfn && ats_device(pdev, drhd) > = 0 ) - disable_ats_device(pdev); - - break; - - case DEV_TYPE_PCI: - if ( !iommu ) - return ERR_PTR(-ENODEV); - - if ( iommu_debug ) - printk(VTDPREFIX "%pd:PCI: unmap %pp\n", - domain, &PCI_SBDF(seg, bus, devfn)); - ret =3D domain_context_unmap_one(domain, iommu, bus, devfn); - if ( ret ) - break; - - tmp_bus =3D bus; - tmp_devfn =3D devfn; - if ( (ret =3D find_upstream_bridge(seg, &tmp_bus, &tmp_devfn, - &secbus)) < 1 ) - { - if ( ret ) - { - ret =3D -ENXIO; - if ( !domain->is_dying && - !is_hardware_domain(domain) && domain !=3D dom_io ) - { - domain_crash(domain); - /* Make upper layers continue in a best effort manner.= */ - ret =3D 0; - } - } - break; - } - - ret =3D domain_context_unmap_one(domain, iommu, tmp_bus, tmp_devfn= ); - /* PCIe to PCI/PCIx bridge */ - if ( !ret && pdev_type(seg, tmp_bus, tmp_devfn) =3D=3D DEV_TYPE_PC= Ie2PCI_BRIDGE ) - ret =3D domain_context_unmap_one(domain, iommu, secbus, 0); - - break; - - default: - dprintk(XENLOG_ERR VTDPREFIX, "%pd:unknown(%u): %pp\n", - domain, pdev->type, &PCI_SBDF(seg, bus, devfn)); - return ERR_PTR(-EINVAL); - } - - return drhd; -} - -static void cf_check iommu_clear_root_pgtable(struct domain *d) -{ - struct iommu_context *ctx =3D iommu_default_context(d); - - spin_lock(&ctx->arch.mapping_lock); ctx->arch.vtd.pgd_maddr =3D 0; - spin_unlock(&ctx->arch.mapping_lock); } =20 static void cf_check iommu_domain_teardown(struct domain *d) { struct iommu_context *ctx =3D iommu_default_context(d); - const struct acpi_drhd_unit *drhd; =20 if ( list_empty(&acpi_drhd_units) ) return; =20 - iommu_identity_map_teardown(d, ctx); - ASSERT(!ctx->arch.vtd.pgd_maddr); - - for_each_drhd_unit ( drhd ) - iommu_free_domid(d->domain_id, drhd->iommu->domid_bitmap); - - XFREE(ctx->arch.vtd.iommu_dev_cnt); - XFREE(ctx->arch.vtd.didmap); -} - -static void quarantine_teardown(struct pci_dev *pdev, - const struct acpi_drhd_unit *drhd) -{ } =20 static int __must_check cf_check intel_iommu_map_page( struct domain *d, dfn_t dfn, mfn_t mfn, unsigned int flags, - unsigned int *flush_flags) + unsigned int *flush_flags, struct iommu_context *ctx) { struct domain_iommu *hd =3D dom_iommu(d); - struct iommu_context *ctx =3D iommu_default_context(d); struct dma_pte *page, *pte, old, new =3D {}; u64 pg_maddr; unsigned int level =3D (IOMMUF_order(flags) / LEVEL_STRIDE) + 1; @@ -1789,35 +1576,22 @@ static int __must_check cf_check intel_iommu_map_pa= ge( ASSERT((hd->platform_ops->page_sizes >> IOMMUF_order(flags)) & PAGE_SIZE_4K); =20 - /* Do nothing if VT-d shares EPT page table */ - if ( iommu_use_hap_pt(d) ) + if ( ctx->opaque ) return 0; =20 - /* Do nothing if hardware domain and iommu supports pass thru. */ - if ( iommu_hwdom_passthrough && is_hardware_domain(d) ) - return 0; - - spin_lock(&ctx->arch.mapping_lock); - /* * IOMMU mapping request can be safely ignored when the domain is dyin= g. 
* - * hd->arch.mapping_lock guarantees that d->is_dying will be observed + * hd->lock guarantees that d->is_dying will be observed * before any page tables are freed (see iommu_free_pgtables()) */ if ( d->is_dying ) - { - spin_unlock(&ctx->arch.mapping_lock); return 0; - } =20 pg_maddr =3D addr_to_dma_page_maddr(d, ctx, dfn_to_daddr(dfn), level, = flush_flags, true); if ( pg_maddr < PAGE_SIZE ) - { - spin_unlock(&ctx->arch.mapping_lock); return -ENOMEM; - } =20 page =3D (struct dma_pte *)map_vtd_domain_page(pg_maddr); pte =3D &page[address_level_offset(dfn_to_daddr(dfn), level)]; @@ -1836,7 +1610,6 @@ static int __must_check cf_check intel_iommu_map_page( =20 if ( !((old.val ^ new.val) & ~DMA_PTE_CONTIG_MASK) ) { - spin_unlock(&ctx->arch.mapping_lock); unmap_vtd_domain_page(page); return 0; } @@ -1879,7 +1652,6 @@ static int __must_check cf_check intel_iommu_map_page( perfc_incr(iommu_pt_coalesces); } =20 - spin_unlock(&ctx->arch.mapping_lock); unmap_vtd_domain_page(page); =20 *flush_flags |=3D IOMMU_FLUSHF_added; @@ -1896,10 +1668,10 @@ static int __must_check cf_check intel_iommu_map_pa= ge( } =20 static int __must_check cf_check intel_iommu_unmap_page( - struct domain *d, dfn_t dfn, unsigned int order, unsigned int *flush_f= lags) + struct domain *d, dfn_t dfn, unsigned int order, unsigned int *flush_f= lags, + struct iommu_context *ctx) { struct domain_iommu *hd =3D dom_iommu(d); - struct iommu_context *ctx =3D iommu_default_context(d); daddr_t addr =3D dfn_to_daddr(dfn); struct dma_pte *page =3D NULL, *pte =3D NULL, old; uint64_t pg_maddr; @@ -1911,20 +1683,13 @@ static int __must_check cf_check intel_iommu_unmap_= page( */ ASSERT((hd->platform_ops->page_sizes >> order) & PAGE_SIZE_4K); =20 - /* Do nothing if VT-d shares EPT page table */ - if ( iommu_use_hap_pt(d) ) + if ( ctx->opaque ) return 0; =20 - /* Do nothing if hardware domain and iommu supports pass thru. */ - if ( iommu_hwdom_passthrough && is_hardware_domain(d) ) - return 0; - - spin_lock(&ctx->arch.mapping_lock); /* get target level pte */ pg_maddr =3D addr_to_dma_page_maddr(d, ctx, addr, level, flush_flags, = false); if ( pg_maddr < PAGE_SIZE ) { - spin_unlock(&ctx->arch.mapping_lock); return pg_maddr ? -ENOMEM : 0; } =20 @@ -1933,7 +1698,6 @@ static int __must_check cf_check intel_iommu_unmap_pa= ge( =20 if ( !dma_pte_present(*pte) ) { - spin_unlock(&ctx->arch.mapping_lock); unmap_vtd_domain_page(page); return 0; } @@ -1964,8 +1728,6 @@ static int __must_check cf_check intel_iommu_unmap_pa= ge( perfc_incr(iommu_pt_coalesces); } =20 - spin_unlock(&ctx->arch.mapping_lock); - unmap_vtd_domain_page(page); =20 *flush_flags |=3D IOMMU_FLUSHF_modified; @@ -1978,25 +1740,16 @@ static int __must_check cf_check intel_iommu_unmap_= page( } =20 static int cf_check intel_iommu_lookup_page( - struct domain *d, dfn_t dfn, mfn_t *mfn, unsigned int *flags) + struct domain *d, dfn_t dfn, mfn_t *mfn, unsigned int *flags, + struct iommu_context *ctx) { - struct iommu_context *ctx =3D iommu_default_context(d); uint64_t val; =20 - /* - * If VT-d shares EPT page table or if the domain is the hardware - * domain and iommu_passthrough is set then pass back the dfn. 
- */ - if ( iommu_use_hap_pt(d) || - (iommu_hwdom_passthrough && is_hardware_domain(d)) ) + if ( ctx->opaque ) return -EOPNOTSUPP; =20 - spin_lock(&ctx->arch.mapping_lock); - val =3D addr_to_dma_page_maddr(d, ctx, dfn_to_daddr(dfn), 0, NULL, fal= se); =20 - spin_unlock(&ctx->arch.mapping_lock); - if ( val < PAGE_SIZE ) return -ENOENT; =20 @@ -2025,47 +1778,6 @@ static bool __init vtd_ept_page_compatible(const str= uct vtd_iommu *iommu) (cap_sps_1gb(vtd_cap) && iommu_superpages); } =20 -static int cf_check intel_iommu_add_device(u8 devfn, struct pci_dev *pdev) -{ - struct acpi_rmrr_unit *rmrr; - struct iommu_context *ctx; - u16 bdf; - int ret, i; - - ASSERT(pcidevs_locked()); - - if ( !pdev->domain ) - return -EINVAL; - - ctx =3D iommu_default_context(pdev->domain); - - for_each_rmrr_device ( rmrr, bdf, i ) - { - if ( rmrr->segment =3D=3D pdev->seg && bdf =3D=3D PCI_BDF(pdev->bu= s, devfn) ) - { - /* - * iommu_add_device() is only called for the hardware - * domain (see xen/drivers/passthrough/pci.c:pci_add_device()). - * Since RMRRs are always reserved in the e820 map for the har= dware - * domain, there shouldn't be a conflict. - */ - ret =3D iommu_identity_mapping(pdev->domain, ctx, p2m_access_r= w, - rmrr->base_address, rmrr->end_add= ress, - 0); - if ( ret ) - dprintk(XENLOG_ERR VTDPREFIX, "%pd: RMRR mapping failed\n", - pdev->domain); - } - } - - ret =3D domain_context_mapping(pdev->domain, ctx, devfn, pdev); - if ( ret ) - dprintk(XENLOG_ERR VTDPREFIX, "%pd: context mapping failed\n", - pdev->domain); - - return ret; -} - static int cf_check intel_iommu_enable_device(struct pci_dev *pdev) { struct acpi_drhd_unit *drhd =3D acpi_find_matched_drhd_unit(pdev); @@ -2081,47 +1793,16 @@ static int cf_check intel_iommu_enable_device(struc= t pci_dev *pdev) return ret >=3D 0 ? 0 : ret; } =20 -static int cf_check intel_iommu_remove_device(u8 devfn, struct pci_dev *pd= ev) -{ - const struct acpi_drhd_unit *drhd; - struct acpi_rmrr_unit *rmrr; - u16 bdf; - unsigned int i; - struct iommu_context *ctx; - - if ( !pdev->domain ) - return -EINVAL; - - ctx =3D iommu_default_context(pdev->domain); - - drhd =3D domain_context_unmap(pdev->domain, devfn, pdev); - if ( IS_ERR(drhd) ) - return PTR_ERR(drhd); - - for_each_rmrr_device ( rmrr, bdf, i ) - { - if ( rmrr->segment !=3D pdev->seg || bdf !=3D PCI_BDF(pdev->bus, d= evfn) ) - continue; - - /* - * Any flag is nothing to clear these mappings but here - * its always safe and strict to set 0. 
- */ - iommu_identity_mapping(pdev->domain, ctx, p2m_access_x, rmrr->base= _address, - rmrr->end_address, 0); - } - - quarantine_teardown(pdev, drhd); - - return 0; -} - static int __hwdom_init cf_check setup_hwdom_device( u8 devfn, struct pci_dev *pdev) { - struct iommu_context *ctx =3D iommu_default_context(pdev->domain); + if (pdev->type =3D=3D DEV_TYPE_PCI_HOST_BRIDGE || + pdev->type =3D=3D DEV_TYPE_PCIe_BRIDGE || + pdev->type =3D=3D DEV_TYPE_PCIe2PCI_BRIDGE || + pdev->type =3D=3D DEV_TYPE_LEGACY_PCI_BRIDGE) + return 0; =20 - return domain_context_mapping(pdev->domain, ctx, devfn, pdev); + return iommu_attach_context(hardware_domain, pdev, 0); } =20 void clear_fault_bits(struct vtd_iommu *iommu) @@ -2291,35 +1972,53 @@ static int __must_check init_vtd_hw(bool resume) return iommu_flush_all(); } =20 -static void __hwdom_init setup_hwdom_rmrr(struct domain *d) +static void cf_check arch_iommu_dump_domain_contexts(struct domain *d) { - struct iommu_context *ctx =3D iommu_default_context(d); - struct acpi_rmrr_unit *rmrr; - u16 bdf; - int ret, i; + unsigned int i, iommu_no; + struct pci_dev *pdev; + struct iommu_context *ctx; + struct domain_iommu *hd =3D dom_iommu(d); =20 - pcidevs_lock(); - for_each_rmrr_device ( rmrr, bdf, i ) + if (d =3D=3D dom_io) + printk("d[IO] contexts\n"); + else + printk("d%hu contexts\n", d->domain_id); + + for (i =3D 0; i < (1 + hd->other_contexts.count); ++i) { - /* - * Here means we're add a device to the hardware domain. - * Since RMRRs are always reserved in the e820 map for the hardware - * domain, there shouldn't be a conflict. So its always safe and - * strict to set 0. - */ - ret =3D iommu_identity_mapping(d, ctx, p2m_access_rw, rmrr->base_a= ddress, - rmrr->end_address, 0); - if ( ret ) - dprintk(XENLOG_ERR VTDPREFIX, - "IOMMU: mapping reserved region failed\n"); + if ( (ctx =3D iommu_get_context(d, i)) ) + { + printk(" Context %d (%"PRIx64")\n", i, ctx->arch.vtd.pgd_maddr= ); + + for (iommu_no =3D 0; iommu_no < nr_iommus; iommu_no++) + printk(" IOMMU %u (used=3D%lu; did=3D%hu)\n", iommu_no, + ctx->arch.vtd.iommu_dev_cnt[iommu_no], + ctx->arch.vtd.didmap[iommu_no]); + + list_for_each_entry(pdev, &ctx->devices, context_list) + { + printk(" - %pp\n", &pdev->sbdf); + } + + iommu_put_context(ctx); + } } - pcidevs_unlock(); } =20 static struct iommu_state { uint32_t fectl; } *__read_mostly iommu_state; =20 +static void cf_check arch_iommu_dump_contexts(unsigned char key) +{ + struct domain *d; + + for_each_domain(d) + if (is_iommu_enabled(d)) + arch_iommu_dump_domain_contexts(d); + + arch_iommu_dump_domain_contexts(dom_io); +} static int __init cf_check vtd_setup(void) { struct acpi_drhd_unit *drhd; @@ -2449,6 +2148,7 @@ static int __init cf_check vtd_setup(void) iommu_ops.page_sizes |=3D large_sizes; =20 register_keyhandler('V', vtd_dump_iommu_info, "dump iommu info", 1); + register_keyhandler('X', arch_iommu_dump_contexts, "dump iommu context= s", 1); =20 return 0; =20 @@ -2463,173 +2163,6 @@ static int __init cf_check vtd_setup(void) return ret; } =20 -static int cf_check reassign_device_ownership( - struct domain *source, - struct domain *target, - u8 devfn, struct pci_dev *pdev) -{ - int ret; - - if ( !has_arch_pdevs(target) ) - vmx_pi_hooks_assign(target); - -#ifdef CONFIG_PV - /* - * Devices assigned to untrusted domains (here assumed to be any do= mU) - * can attempt to send arbitrary LAPIC/MSI messages. We are unprote= cted - * by the root complex unless interrupt remapping is enabled. 
- */ - if ( !iommu_intremap && !is_hardware_domain(target) && - !is_system_domain(target) ) - untrusted_msi =3D true; -#endif - - ret =3D domain_context_mapping(target, iommu_default_context(target), = devfn, pdev); - - if ( ret ) - { - if ( !has_arch_pdevs(target) ) - vmx_pi_hooks_deassign(target); - return ret; - } - - if ( devfn =3D=3D pdev->devfn && pdev->domain !=3D target ) - { - write_lock(&source->pci_lock); - list_del(&pdev->domain_list); - write_unlock(&source->pci_lock); - - pdev->domain =3D target; - - write_lock(&target->pci_lock); - list_add(&pdev->domain_list, &target->pdev_list); - write_unlock(&target->pci_lock); - } - - if ( !has_arch_pdevs(source) ) - vmx_pi_hooks_deassign(source); - - /* - * If the device belongs to the hardware domain, and it has RMRR, don't - * remove it from the hardware domain, because BIOS may use RMRR at - * booting time. - */ - if ( !is_hardware_domain(source) ) - { - const struct acpi_rmrr_unit *rmrr; - struct iommu_context *ctx =3D iommu_default_context(source); - u16 bdf; - unsigned int i; - - for_each_rmrr_device( rmrr, bdf, i ) - if ( rmrr->segment =3D=3D pdev->seg && - bdf =3D=3D PCI_BDF(pdev->bus, devfn) ) - { - /* - * Any RMRR flag is always ignored when remove a device, - * but its always safe and strict to set 0. - */ - ret =3D iommu_identity_mapping(source, ctx, p2m_access_x, - rmrr->base_address, - rmrr->end_address, 0); - if ( ret && ret !=3D -ENOENT ) - return ret; - } - } - - return 0; -} - -static int cf_check intel_iommu_assign_device( - struct domain *d, u8 devfn, struct pci_dev *pdev, u32 flag) -{ - struct domain *s =3D pdev->domain; - struct iommu_context *ctx =3D iommu_default_context(d); - struct acpi_rmrr_unit *rmrr; - int ret =3D 0, i; - u16 bdf, seg; - u8 bus; - - if ( list_empty(&acpi_drhd_units) ) - return -ENODEV; - - seg =3D pdev->seg; - bus =3D pdev->bus; - /* - * In rare cases one given rmrr is shared by multiple devices but - * obviously this would put the security of a system at risk. So - * we would prevent from this sort of device assignment. But this - * can be permitted if user set - * "pci =3D [ 'sbdf, rdm_policy=3Drelaxed' ]" - * - * TODO: in the future we can introduce group device assignment - * interface to make sure devices sharing RMRR are assigned to the - * same domain together. - */ - for_each_rmrr_device( rmrr, bdf, i ) - { - if ( rmrr->segment =3D=3D seg && bdf =3D=3D PCI_BDF(bus, devfn) && - rmrr->scope.devices_cnt > 1 ) - { - bool relaxed =3D flag & XEN_DOMCTL_DEV_RDM_RELAXED; - - printk(XENLOG_GUEST "%s" VTDPREFIX - " It's %s to assign %pp" - " with shared RMRR at %"PRIx64" for %pd.\n", - relaxed ? XENLOG_WARNING : XENLOG_ERR, - relaxed ? "risky" : "disallowed", - &PCI_SBDF(seg, bus, devfn), rmrr->base_address, d); - if ( !relaxed ) - return -EPERM; - } - } - - /* Setup rmrr identity mapping */ - for_each_rmrr_device( rmrr, bdf, i ) - { - if ( rmrr->segment =3D=3D seg && bdf =3D=3D PCI_BDF(bus, devfn) ) - { - ret =3D iommu_identity_mapping(d, ctx, p2m_access_rw, rmrr->ba= se_address, - rmrr->end_address, flag); - if ( ret ) - { - printk(XENLOG_G_ERR VTDPREFIX - "%pd: cannot map reserved region [%"PRIx64",%"PRIx6= 4"]: %d\n", - d, rmrr->base_address, rmrr->end_address, ret); - break; - } - } - } - - if ( !ret ) - ret =3D reassign_device_ownership(s, d, devfn, pdev); - - /* See reassign_device_ownership() for the hwdom aspect. 
*/ - if ( !ret || is_hardware_domain(d) ) - return ret; - - for_each_rmrr_device( rmrr, bdf, i ) - { - if ( rmrr->segment =3D=3D seg && bdf =3D=3D PCI_BDF(bus, devfn) ) - { - int rc =3D iommu_identity_mapping(d, ctx, p2m_access_x, - rmrr->base_address, - rmrr->end_address, 0); - - if ( rc && rc !=3D -ENOENT ) - { - printk(XENLOG_ERR VTDPREFIX - "%pd: cannot unmap reserved region [%"PRIx64",%"PRI= x64"]: %d\n", - d, rmrr->base_address, rmrr->end_address, rc); - domain_crash(d); - break; - } - } - } - - return ret; -} - static int cf_check intel_iommu_group_id(u16 seg, u8 bus, u8 devfn) { u8 secbus; @@ -2754,6 +2287,11 @@ static void vtd_dump_page_table_level(paddr_t pt_mad= dr, int level, paddr_t gpa, if ( level < 1 ) return; =20 + if (pt_maddr =3D=3D 0) { + printk(" (empty)\n"); + return; + } + pt_vaddr =3D map_vtd_domain_page(pt_maddr); =20 next_level =3D level - 1; @@ -2785,35 +2323,305 @@ static void vtd_dump_page_table_level(paddr_t pt_m= addr, int level, paddr_t gpa, static void cf_check vtd_dump_page_tables(struct domain *d) { const struct domain_iommu *hd =3D dom_iommu(d); - struct iommu_context *ctx =3D iommu_default_context(d); + unsigned int i; =20 - printk(VTDPREFIX" %pd table has %d levels\n", d, + printk(VTDPREFIX " %pd table has %d levels\n", d, agaw_to_level(hd->arch.vtd.agaw)); - vtd_dump_page_table_level(ctx->arch.vtd.pgd_maddr, - agaw_to_level(hd->arch.vtd.agaw), 0, 0); + + for (i =3D 1; i < (1 + hd->other_contexts.count); ++i) + { + struct iommu_context *ctx =3D iommu_get_context(d, i); + + printk(VTDPREFIX " %pd context %d: %s\n", d, i, + ctx ? "allocated" : "non-allocated"); + + if (ctx) + { + vtd_dump_page_table_level(ctx->arch.vtd.pgd_maddr, + agaw_to_level(hd->arch.vtd.agaw), 0,= 0); + iommu_put_context(ctx); + } + } +} + +static int intel_iommu_cleanup_pte(uint64_t pte_maddr, bool preempt) +{ + size_t i; + struct dma_pte *pte =3D map_vtd_domain_page(pte_maddr); + + for (i =3D 0; i < (1 << PAGETABLE_ORDER); ++i) + if ( dma_pte_present(pte[i]) ) + { + /* Remove the reference of the target mapping (if needed) */ + mfn_t mfn =3D maddr_to_mfn(dma_pte_addr(pte[i])); + + if ( mfn_valid(mfn) ) + put_page(mfn_to_page(mfn)); + + if ( preempt ) + dma_clear_pte(pte[i]); + } + + unmap_vtd_domain_page(pte); + + return 0; +} + +/** + * Cleanup logic : + * Walk through the entire page table, progressively removing mappings if = preempt. + * + * Return values : + * - Report preemption with -ERESTART. + * - Report empty pte/pgd with 0. + * + * When preempted during superpage operation, store state in vtd.superpage= _progress. 
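+ *
+ * A caller tearing a context down with preemption enabled must be ready to
+ * see -ERESTART and retry; a purely illustrative retry loop (not part of
+ * this patch) could look like:
+ *
+ *     while ( (rc = iommu_context_teardown(d, ctx,
+ *                                          flags | IOMMUF_preempt)) == -ERESTART )
+ *         process_pending_softirqs();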
+ */ + +static int intel_iommu_cleanup_superpage(struct iommu_context *ctx, + unsigned int page_order, uint64_= t pte_maddr, + bool preempt) +{ + size_t i =3D 0, page_count =3D 1 << page_order; + struct page_info *page =3D maddr_to_page(pte_maddr); + + if ( preempt ) + i =3D ctx->arch.vtd.superpage_progress; + + for (; i < page_count; page++) + { + put_page(page); + + if ( preempt && (i & 0xff) && general_preempt_check() ) + { + ctx->arch.vtd.superpage_progress =3D i + 1; + return -ERESTART; + } + } + + if ( preempt ) + ctx->arch.vtd.superpage_progress =3D 0; + + return 0; +} + +static int intel_iommu_cleanup_mappings(struct iommu_context *ctx, + unsigned int nr_pt_levels, uint64= _t pgd_maddr, + bool preempt) +{ + size_t i; + int rc; + struct dma_pte *pgd; + + if ( ctx->opaque ) + /* don't touch opaque contexts */ + return 0; + + pgd =3D map_vtd_domain_page(pgd_maddr); + + for (i =3D 0; i < (1 << PAGETABLE_ORDER); ++i) + { + if ( dma_pte_present(pgd[i]) ) + { + uint64_t pte_maddr =3D dma_pte_addr(pgd[i]); + + if ( dma_pte_superpage(pgd[i]) ) + rc =3D intel_iommu_cleanup_superpage(ctx, nr_pt_levels * S= UPERPAGE_ORDER, + pte_maddr, preempt); + else if ( nr_pt_levels > 2 ) + /* Next level is not PTE */ + rc =3D intel_iommu_cleanup_mappings(ctx, nr_pt_levels - 1, + pte_maddr, preempt); + else + rc =3D intel_iommu_cleanup_pte(pte_maddr, preempt); + + if ( preempt && !rc ) + /* Fold pgd (no more mappings in it) */ + dma_clear_pte(pgd[i]); + else if ( preempt && (rc =3D=3D -ERESTART || general_preempt_c= heck()) ) + { + unmap_vtd_domain_page(pgd); + return -ERESTART; + } + } + } + + unmap_vtd_domain_page(pgd); + + return 0; } =20 -static int cf_check intel_iommu_quarantine_init(struct pci_dev *pdev, - bool scratch_page) +static int cf_check intel_iommu_context_teardown(struct domain *d, + struct iommu_context *ctx, u32 fla= gs) { + struct acpi_drhd_unit *drhd; + pcidevs_lock(); + + // Cleanup mappings + if ( intel_iommu_cleanup_mappings(ctx, agaw_to_level(d->iommu.arch.vtd= .agaw), + ctx->arch.vtd.pgd_maddr, + flags & IOMMUF_preempt) < 0 ) + { + pcidevs_unlock(); + return -ERESTART; + } + + ASSERT(ctx->arch.vtd.didmap); + + for_each_drhd_unit(drhd) + { + unsigned long index =3D drhd->iommu->index; + + iommu_free_domid(ctx->arch.vtd.didmap[index], drhd->iommu->domid_b= itmap); + } + + xfree(ctx->arch.vtd.didmap); + + pcidevs_unlock(); + return arch_iommu_context_teardown(d, ctx, flags); +} + +static int intel_iommu_dev_rmrr(struct domain *d, struct pci_dev *pdev, + struct iommu_context *ctx, bool unmap) +{ + struct acpi_rmrr_unit *rmrr; + u16 bdf; + int ret, i; + + for_each_rmrr_device(rmrr, bdf, i) + { + if ( PCI_SBDF(rmrr->segment, bdf).sbdf =3D=3D pdev->sbdf.sbdf ) + { + ret =3D iommu_identity_mapping(d, ctx, + unmap ? 
p2m_access_x : p2m_access= _rw, + rmrr->base_address, rmrr->end_add= ress, + 0); + + if ( ret < 0 ) + return ret; + } + } + return 0; } =20 +static int cf_check intel_iommu_attach(struct domain *d, struct pci_dev *p= dev, + struct iommu_context *ctx) +{ + int ret; + const struct acpi_drhd_unit *drhd =3D acpi_find_matched_drhd_unit(pdev= ); + + if ( !pdev || !drhd ) + return -EINVAL; + + ret =3D intel_iommu_dev_rmrr(d, pdev, ctx, false); + + if ( ret ) + return ret; + + ret =3D apply_context(d, ctx, pdev, pdev->devfn, NULL); + + if ( ret ) + return ret; + + pci_vtd_quirk(pdev); + + return ret; +} + +static int cf_check intel_iommu_detach(struct domain *d, struct pci_dev *p= dev, + struct iommu_context *prev_ctx) +{ + int ret, rc; + const struct acpi_drhd_unit *drhd =3D acpi_find_matched_drhd_unit(pdev= ); + + if (!pdev || !drhd) + return -EINVAL; + + ret =3D unapply_context_single(d, drhd->iommu, prev_ctx, pdev->bus, pd= ev->devfn); + + if ( ret ) + return ret; + + if ( (rc =3D intel_iommu_dev_rmrr(d, pdev, prev_ctx, true)) ) + printk(XENLOG_WARNING VTDPREFIX + " Unable to unmap RMRR from d%dc%d for %pp (%d)\n", + d->domain_id, prev_ctx->id, &pdev->sbdf, rc); + + return ret; +} + +static int cf_check intel_iommu_reattach(struct domain *d, + struct pci_dev *pdev, + struct iommu_context *prev_ctx, + struct iommu_context *ctx) +{ + int ret, rc; + const struct acpi_drhd_unit *drhd =3D acpi_find_matched_drhd_unit(pdev= ); + + if (!pdev || !drhd) + return -EINVAL; + + ret =3D intel_iommu_dev_rmrr(d, pdev, ctx, false); + + if ( ret ) + return ret; + + ret =3D apply_context(d, ctx, pdev, pdev->devfn, prev_ctx); + + if ( ret ) + return ret; + + if ( (rc =3D intel_iommu_dev_rmrr(d, pdev, prev_ctx, true)) ) + printk(XENLOG_WARNING VTDPREFIX + " Unable to unmap RMRR from d%dc%d for %pp (%d)\n", + d->domain_id, prev_ctx->id, &pdev->sbdf, rc); + + pci_vtd_quirk(pdev); + + return ret; +} + +static int cf_check intel_iommu_add_devfn(struct domain *d, + struct pci_dev *pdev, u16 devfn, + struct iommu_context *ctx) +{ + const struct acpi_drhd_unit *drhd =3D acpi_find_matched_drhd_unit(pdev= ); + + if ( !pdev || !drhd ) + return -EINVAL; + + return apply_context(d, ctx, pdev, devfn, NULL); +} + +static int cf_check intel_iommu_remove_devfn(struct domain *d, struct pci_= dev *pdev, + u16 devfn) +{ + const struct acpi_drhd_unit *drhd =3D acpi_find_matched_drhd_unit(pdev= ); + + if ( !pdev || !drhd ) + return -EINVAL; + + return unapply_context_single(d, drhd->iommu, NULL, pdev->bus, devfn); +} + static const struct iommu_ops __initconst_cf_clobber vtd_ops =3D { .page_sizes =3D PAGE_SIZE_4K, .init =3D intel_iommu_domain_init, .hwdom_init =3D intel_iommu_hwdom_init, - .quarantine_init =3D intel_iommu_quarantine_init, - .add_device =3D intel_iommu_add_device, + .context_init =3D intel_iommu_context_init, + .context_teardown =3D intel_iommu_context_teardown, + .attach =3D intel_iommu_attach, + .detach =3D intel_iommu_detach, + .reattach =3D intel_iommu_reattach, + .add_devfn =3D intel_iommu_add_devfn, + .remove_devfn =3D intel_iommu_remove_devfn, .enable_device =3D intel_iommu_enable_device, - .remove_device =3D intel_iommu_remove_device, - .assign_device =3D intel_iommu_assign_device, .teardown =3D iommu_domain_teardown, .clear_root_pgtable =3D iommu_clear_root_pgtable, .map_page =3D intel_iommu_map_page, .unmap_page =3D intel_iommu_unmap_page, .lookup_page =3D intel_iommu_lookup_page, - .reassign_device =3D reassign_device_ownership, .get_device_group_id =3D intel_iommu_group_id, .enable_x2apic =3D 
intel_iommu_enable_eim, .disable_x2apic =3D intel_iommu_disable_eim, diff --git a/xen/drivers/passthrough/vtd/quirks.c b/xen/drivers/passthrough= /vtd/quirks.c index 7937eb8c2b..0c8a6d73dd 100644 --- a/xen/drivers/passthrough/vtd/quirks.c +++ b/xen/drivers/passthrough/vtd/quirks.c @@ -408,9 +408,9 @@ void __init platform_quirks_init(void) =20 static int __must_check map_me_phantom_function(struct domain *domain, unsigned int dev, - domid_t domid, - paddr_t pgd_maddr, - unsigned int mode) + unsigned int mode, + struct iommu_context *ctx, + struct iommu_context *prev= _ctx) { struct acpi_drhd_unit *drhd; struct pci_dev *pdev; @@ -422,19 +422,17 @@ static int __must_check map_me_phantom_function(struc= t domain *domain, =20 /* map or unmap ME phantom function */ if ( !(mode & UNMAP_ME_PHANTOM_FUNC) ) - rc =3D domain_context_mapping_one(domain, iommu_default_context(do= main), - drhd->iommu, 0, - PCI_DEVFN(dev, 7), NULL, - domid, pgd_maddr, mode); + rc =3D apply_context_single(domain, ctx, drhd->iommu, 0, + PCI_DEVFN(dev, 7), prev_ctx); else - rc =3D domain_context_unmap_one(domain, drhd->iommu, 0, - PCI_DEVFN(dev, 7)); + rc =3D unapply_context_single(domain, drhd->iommu, prev_ctx, 0, PC= I_DEVFN(dev, 7)); =20 return rc; } =20 int me_wifi_quirk(struct domain *domain, uint8_t bus, uint8_t devfn, - domid_t domid, paddr_t pgd_maddr, unsigned int mode) + domid_t domid, unsigned int mode, + struct iommu_context *ctx, struct iommu_context *prev_ct= x) { u32 id; int rc =3D 0; @@ -458,7 +456,7 @@ int me_wifi_quirk(struct domain *domain, uint8_t bus, u= int8_t devfn, case 0x423b8086: case 0x423c8086: case 0x423d8086: - rc =3D map_me_phantom_function(domain, 3, domid, pgd_maddr= , mode); + rc =3D map_me_phantom_function(domain, 3, mode, ctx, prev_= ctx); break; default: break; @@ -484,7 +482,7 @@ int me_wifi_quirk(struct domain *domain, uint8_t bus, u= int8_t devfn, case 0x42388086: /* Puma Peak */ case 0x422b8086: case 0x422c8086: - rc =3D map_me_phantom_function(domain, 22, domid, pgd_madd= r, mode); + rc =3D map_me_phantom_function(domain, 22, mode, ctx, prev= _ctx); break; default: break; diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/= x86/iommu.c index 730a75e628..7b7fac0db8 100644 --- a/xen/drivers/passthrough/x86/iommu.c +++ b/xen/drivers/passthrough/x86/iommu.c @@ -12,6 +12,12 @@ * this program; If not, see . 
*/ =20 +#include +#include +#include +#include +#include +#include #include #include #include @@ -19,7 +25,6 @@ #include #include #include -#include #include #include #include @@ -29,6 +34,9 @@ #include #include #include +#include +#include +#include =20 const struct iommu_init_ops *__initdata iommu_init_ops; struct iommu_ops __ro_after_init iommu_ops; @@ -192,8 +200,6 @@ int arch_iommu_domain_init(struct domain *d) =20 int arch_iommu_context_init(struct domain *d, struct iommu_context *ctx, u= 32 flags) { - spin_lock_init(&ctx->arch.mapping_lock); - INIT_PAGE_LIST_HEAD(&ctx->arch.pgtables); INIT_LIST_HEAD(&ctx->arch.identity_maps); =20 @@ -220,6 +226,95 @@ struct identity_map { unsigned int count; }; =20 +static int unmap_identity_region(struct domain *d, struct iommu_context *c= tx, + unsigned int base_pfn, unsigned int end_p= fn) +{ + int ret =3D 0; + + if ( ctx->opaque && !ctx->id ) + { + #ifdef CONFIG_HVM + this_cpu(iommu_dont_flush_iotlb) =3D true; + while ( base_pfn < end_pfn ) + { + if ( p2m_remove_identity_entry(d, base_pfn) ) + ret =3D -ENXIO; + + base_pfn++; + } + this_cpu(iommu_dont_flush_iotlb) =3D false; + #else + ASSERT_UNREACHABLE(); + #endif + } + else + { + size_t page_count =3D end_pfn - base_pfn + 1; + unsigned int flush_flags; + + ret =3D iommu_unmap(d, _dfn(base_pfn), page_count, 0, &flush_flags, + ctx->id); + + if ( ret ) + return ret; + + ret =3D iommu_iotlb_flush(d, _dfn(base_pfn), page_count, + flush_flags, ctx->id); + } + + return ret; +} + +static int map_identity_region(struct domain *d, struct iommu_context *ctx, + unsigned int base_pfn, unsigned int end_pfn, + p2m_access_t p2ma, unsigned int flag) +{ + int ret =3D 0; + unsigned int flush_flags =3D 0; + size_t page_count =3D end_pfn - base_pfn + 1; + + if ( ctx->opaque && !ctx->id ) + { + #ifdef CONFIG_HVM + int i; + this_cpu(iommu_dont_flush_iotlb) =3D true; + + for (i =3D 0; i < page_count; i++) + { + ret =3D p2m_add_identity_entry(d, base_pfn + i, p2ma, flag); + + if ( ret ) + break; + + base_pfn++; + } + this_cpu(iommu_dont_flush_iotlb) =3D false; + #else + ASSERT_UNREACHABLE(); + #endif + } + else + { + int i; + + for (i =3D 0; i < page_count; i++) + { + ret =3D iommu_map(d, _dfn(base_pfn + i), _mfn(base_pfn + i), 1, + p2m_access_to_iommu_flags(p2ma), &flush_flags, + ctx->id); + + if ( ret ) + break; + } + } + + ret =3D iommu_iotlb_flush(d, _dfn(base_pfn), page_count, flush_flags, + ctx->id); + + return ret; +} + +/* p2m_access_x removes the mapping */ int iommu_identity_mapping(struct domain *d, struct iommu_context *ctx, p2m_access_t p2ma, paddr_t base, paddr_t end, unsigned int flag) @@ -227,24 +322,20 @@ int iommu_identity_mapping(struct domain *d, struct i= ommu_context *ctx, unsigned long base_pfn =3D base >> PAGE_SHIFT_4K; unsigned long end_pfn =3D PAGE_ALIGN_4K(end) >> PAGE_SHIFT_4K; struct identity_map *map; + int ret =3D 0; =20 ASSERT(pcidevs_locked()); ASSERT(base < end); =20 - /* - * No need to acquire hd->arch.mapping_lock: Both insertion and removal - * get done while holding pcidevs_lock. 
- */ list_for_each_entry( map, &ctx->arch.identity_maps, list ) { if ( map->base =3D=3D base && map->end =3D=3D end ) { - int ret =3D 0; - if ( p2ma !=3D p2m_access_x ) { if ( map->access !=3D p2ma ) return -EADDRINUSE; + ++map->count; return 0; } @@ -252,12 +343,9 @@ int iommu_identity_mapping(struct domain *d, struct io= mmu_context *ctx, if ( --map->count ) return 0; =20 - while ( base_pfn < end_pfn ) - { - if ( clear_identity_p2m_entry(d, base_pfn) ) - ret =3D -ENXIO; - base_pfn++; - } + printk("Unmapping [%"PRI_mfn"x:%"PRI_mfn"] for d%dc%d\n", base= _pfn, end_pfn, + d->domain_id, ctx->id); + ret =3D unmap_identity_region(d, ctx, base_pfn, end_pfn); =20 list_del(&map->list); xfree(map); @@ -281,27 +369,17 @@ int iommu_identity_mapping(struct domain *d, struct i= ommu_context *ctx, map->access =3D p2ma; map->count =3D 1; =20 - /* - * Insert into list ahead of mapping, so the range can be found when - * trying to clean up. - */ - list_add_tail(&map->list, &ctx->arch.identity_maps); + printk("Mapping [%"PRI_mfn"x:%"PRI_mfn"] for d%dc%d\n", base_pfn, end_= pfn, + d->domain_id, ctx->id); + ret =3D map_identity_region(d, ctx, base_pfn, end_pfn, p2ma, flag); =20 - for ( ; base_pfn < end_pfn; ++base_pfn ) + if ( ret ) { - int err =3D set_identity_p2m_entry(d, base_pfn, p2ma, flag); - - if ( !err ) - continue; - - if ( (map->base >> PAGE_SHIFT_4K) =3D=3D base_pfn ) - { - list_del(&map->list); - xfree(map); - } - return err; + xfree(map); + return ret; } =20 + list_add(&map->list, &ctx->arch.identity_maps); return 0; } =20 @@ -373,7 +451,7 @@ static int __hwdom_init cf_check identity_map(unsigned = long s, unsigned long e, if ( iomem_access_permitted(d, s, s) ) { rc =3D iommu_map(d, _dfn(s), _mfn(s), 1, perms, - &info->flush_flags); + &info->flush_flags, 0); if ( rc < 0 ) break; /* Must map a frame at least, which is what we request for= . */ @@ -383,7 +461,7 @@ static int __hwdom_init cf_check identity_map(unsigned = long s, unsigned long e, s++; } while ( (rc =3D iommu_map(d, _dfn(s), _mfn(s), e - s + 1, - perms, &info->flush_flags)) > 0 ) + perms, &info->flush_flags, 0)) > 0 ) { s +=3D rc; process_pending_softirqs(); @@ -543,7 +621,7 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *= d) map_data.mmio_ro ? "read-only " : "", rc); =20 /* Use if to avoid compiler warning */ - if ( iommu_iotlb_flush_all(d, map_data.flush_flags) ) + if ( iommu_iotlb_flush_all(d, 0, map_data.flush_flags) ) return; } =20 @@ -600,14 +678,11 @@ int iommu_free_pgtables(struct domain *d, struct iomm= u_context *ctx) if ( !is_iommu_enabled(d) ) return 0; =20 - /* After this barrier, no new IOMMU mappings can be inserted. */ - spin_barrier(&ctx->arch.mapping_lock); - /* * Pages will be moved to the free list below. So we want to * clear the root page-table to avoid any potential use after-free. 
*/ - iommu_vcall(hd->platform_ops, clear_root_pgtable, d); + iommu_vcall(hd->platform_ops, clear_root_pgtable, d, ctx); =20 while ( (pg =3D page_list_remove_head(&ctx->arch.pgtables)) ) { diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h index 11d23cdafb..15250da119 100644 --- a/xen/include/xen/iommu.h +++ b/xen/include/xen/iommu.h @@ -161,11 +161,10 @@ enum */ long __must_check iommu_map(struct domain *d, dfn_t dfn0, mfn_t mfn0, unsigned long page_count, unsigned int flags, - unsigned int *flush_flags); + unsigned int *flush_flags, u16 ctx_id); long __must_check iommu_unmap(struct domain *d, dfn_t dfn0, unsigned long page_count, unsigned int flags, - unsigned int *flush_flags); - + unsigned int *flush_flags, u16 ctx_id); int __must_check iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn, unsigned long page_count, unsigned int flags); @@ -173,12 +172,13 @@ int __must_check iommu_legacy_unmap(struct domain *d,= dfn_t dfn, unsigned long page_count); =20 int __must_check iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn, - unsigned int *flags); + unsigned int *flags, u16 ctx_id); =20 int __must_check iommu_iotlb_flush(struct domain *d, dfn_t dfn, unsigned long page_count, - unsigned int flush_flags); -int __must_check iommu_iotlb_flush_all(struct domain *d, + unsigned int flush_flags, + u16 ctx_id); +int __must_check iommu_iotlb_flush_all(struct domain *d, u16 ctx_id, unsigned int flush_flags); =20 enum iommu_feature @@ -250,20 +250,30 @@ struct page_info; */ typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, u32 id, void *ct= xt); =20 +struct iommu_context; + struct iommu_ops { unsigned long page_sizes; int (*init)(struct domain *d); void (*hwdom_init)(struct domain *d); - int (*quarantine_init)(device_t *dev, bool scratch_page); - int (*add_device)(uint8_t devfn, device_t *dev); + int (*context_init)(struct domain *d, struct iommu_context *ctx, + u32 flags); + int (*context_teardown)(struct domain *d, struct iommu_context *ctx, + u32 flags); + int (*attach)(struct domain *d, device_t *dev, + struct iommu_context *ctx); + int (*detach)(struct domain *d, device_t *dev, + struct iommu_context *prev_ctx); + int (*reattach)(struct domain *d, device_t *dev, + struct iommu_context *prev_ctx, + struct iommu_context *ctx); + int (*enable_device)(device_t *dev); - int (*remove_device)(uint8_t devfn, device_t *dev); - int (*assign_device)(struct domain *d, uint8_t devfn, device_t *dev, - uint32_t flag); - int (*reassign_device)(struct domain *s, struct domain *t, - uint8_t devfn, device_t *dev); #ifdef CONFIG_HAS_PCI int (*get_device_group_id)(uint16_t seg, uint8_t bus, uint8_t devfn); + int (*add_devfn)(struct domain *d, struct pci_dev *pdev, u16 devfn, + struct iommu_context *ctx); + int (*remove_devfn)(struct domain *d, struct pci_dev *pdev, u16 devfn); #endif /* HAS_PCI */ =20 void (*teardown)(struct domain *d); @@ -274,12 +284,15 @@ struct iommu_ops { */ int __must_check (*map_page)(struct domain *d, dfn_t dfn, mfn_t mfn, unsigned int flags, - unsigned int *flush_flags); + unsigned int *flush_flags, + struct iommu_context *ctx); int __must_check (*unmap_page)(struct domain *d, dfn_t dfn, unsigned int order, - unsigned int *flush_flags); + unsigned int *flush_flags, + struct iommu_context *ctx); int __must_check (*lookup_page)(struct domain *d, dfn_t dfn, mfn_t *mf= n, - unsigned int *flags); + unsigned int *flags, + struct iommu_context *ctx); =20 #ifdef CONFIG_X86 int (*enable_x2apic)(void); @@ -292,14 +305,15 @@ struct iommu_ops { int (*setup_hpet_msi)(struct 
msi_desc *msi_desc); =20 void (*adjust_irq_affinities)(void); - void (*clear_root_pgtable)(struct domain *d); + void (*clear_root_pgtable)(struct domain *d, struct iommu_context *ctx= ); int (*update_ire_from_msi)(struct msi_desc *msi_desc, struct msi_msg *= msg); #endif /* CONFIG_X86 */ =20 int __must_check (*suspend)(void); void (*resume)(void); void (*crash_shutdown)(void); - int __must_check (*iotlb_flush)(struct domain *d, dfn_t dfn, + int __must_check (*iotlb_flush)(struct domain *d, + struct iommu_context *ctx, dfn_t dfn, unsigned long page_count, unsigned int flush_flags); int (*get_reserved_device_memory)(iommu_grdm_t *func, void *ctxt); @@ -346,15 +360,36 @@ extern int iommu_get_extra_reserved_device_memory(iom= mu_grdm_t *func, struct iommu_context { #ifdef CONFIG_HAS_PASSTHROUGH u16 id; /* Context id (0 means default context) */ + rspinlock_t lock; /* context lock */ + + struct list_head devices; =20 struct arch_iommu_context arch; + + bool opaque; /* context can't be modified nor accessed (e.g HAP) */ + bool dying; /* the context is tearing down */ #endif }; =20 +struct iommu_context_list { + atomic_t initialized; /* has/is context list being initialized ? */ + rwlock_t lock; /* prevent concurrent destruction and access of context= s */ + uint16_t count; /* Context count excluding default context */ + + /* if count > 0 */ + + uint64_t *bitmap; /* bitmap of context allocation */ + struct iommu_context *map; /* Map of contexts */ +}; + + struct domain_iommu { + #ifdef CONFIG_HAS_PASSTHROUGH struct arch_iommu arch; + struct iommu_context default_ctx; + struct iommu_context_list other_contexts; #endif =20 /* iommu_ops */ @@ -415,6 +450,8 @@ int __must_check iommu_suspend(void); void iommu_resume(void); void iommu_crash_shutdown(void); int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt); + +int __init iommu_quarantine_init(void); int iommu_quarantine_dev_init(device_t *dev); =20 #ifdef CONFIG_HAS_PCI @@ -424,6 +461,26 @@ int iommu_do_pci_domctl(struct xen_domctl *domctl, str= uct domain *d, =20 void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev); =20 + +struct iommu_context *iommu_get_context(struct domain *d, u16 ctx_id); +void iommu_put_context(struct iommu_context *ctx); + +#define IOMMU_CONTEXT_INIT_default (1 << 0) +#define IOMMU_CONTEXT_INIT_quarantine (1 << 1) +int iommu_context_init(struct domain *d, struct iommu_context *ctx, u16 ct= x_id, u32 flags); + +#define IOMMU_TEARDOWN_REATTACH_DEFAULT (1 << 0) +#define IOMMU_TEARDOWN_PREEMPT (1 << 1) +int iommu_context_teardown(struct domain *d, struct iommu_context *ctx, u3= 2 flags); + +int iommu_context_alloc(struct domain *d, u16 *ctx_id, u32 flags); +int iommu_context_free(struct domain *d, u16 ctx_id, u32 flags); + +int iommu_reattach_context(struct domain *prev_dom, struct domain *next_do= m, + device_t *dev, u16 ctx_id); +int iommu_attach_context(struct domain *d, device_t *dev, u16 ctx_id); +int iommu_detach_context(struct domain *d, device_t *dev); + /* * The purpose of the iommu_dont_flush_iotlb optional cpu flag is to * avoid unecessary iotlb_flush in the low level IOMMU code. 
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index f784e91160..a421ead1a4 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -97,6 +97,7 @@ struct pci_dev_info {
 struct pci_dev {
     struct list_head alldevs_list;
     struct list_head domain_list;
+    struct list_head context_list;
 
     struct list_head msi_list;
 
@@ -104,6 +105,8 @@ struct pci_dev {
 
     struct domain *domain;
 
+    uint16_t context; /* IOMMU context number of domain */
+
     const union {
         struct {
             uint8_t devfn;
-- 
2.47.2

Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech

From nobody Sat Nov 1 23:25:40 2025
From: Teddy Astie <teddy.astie@vates.tech>
Subject: [XEN RFC PATCH v6 09/11] x86/iommu: Introduce IOMMU arena
To: xen-devel@lists.xenproject.org
Cc: Teddy Astie, Jan Beulich, Andrew Cooper, Roger Pau Monné
Message-Id: <19b58d02c32d35bb422df7934da26855da7e3f87.1739785339.git.teddy.astie@vates.tech>
Date: Mon, 17 Feb 2025 10:18:22 +0000

Introduce a new facility that reserves a fixed amount of contiguous pages
and provides a way to allocate them. It is used to ensure that the guest
cannot cause the hypervisor to OOM with unconstrained allocations by
abusing the PV-IOMMU interface.
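For illustration only (not part of this patch), a minimal sketch of how a
caller might drive the arena interface added below; example_setup() and the
chosen order are made up, while the iommu_arena_* calls are the ones
introduced here:

    /* Hypothetical caller: reserve 2^4 = 16 pages, grab one, then tear down. */
    static int example_setup(struct iommu_arena *arena)
    {
        struct page_info *pg;
        int rc = iommu_arena_initialize(arena, NULL, 4, 0);

        if ( rc )
            return rc;

        pg = iommu_arena_allocate_page(arena);  /* NULL once the arena is exhausted */
        if ( pg )
            iommu_arena_free_page(arena, pg);

        return iommu_arena_teardown(arena, true);  /* -EBUSY if allocations remain */
    }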
Signed-off-by: Teddy Astie --- xen/arch/x86/include/asm/arena.h | 54 +++++++++ xen/arch/x86/include/asm/iommu.h | 3 + xen/drivers/passthrough/x86/Makefile | 1 + xen/drivers/passthrough/x86/arena.c | 157 +++++++++++++++++++++++++++ 4 files changed, 215 insertions(+) create mode 100644 xen/arch/x86/include/asm/arena.h create mode 100644 xen/drivers/passthrough/x86/arena.c diff --git a/xen/arch/x86/include/asm/arena.h b/xen/arch/x86/include/asm/ar= ena.h new file mode 100644 index 0000000000..7555b100e0 --- /dev/null +++ b/xen/arch/x86/include/asm/arena.h @@ -0,0 +1,54 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/** + * Simple arena-based page allocator. + */ + +#ifndef __XEN_IOMMU_ARENA_H__ +#define __XEN_IOMMU_ARENA_H__ + +#include "xen/domain.h" +#include "xen/atomic.h" +#include "xen/mm-frame.h" +#include "xen/types.h" + +/** + * struct page_arena: Page arena structure + */ +struct iommu_arena { + /* mfn of the first page of the memory region */ + mfn_t region_start; + /* bitmap of allocations */ + unsigned long *map; + + /* Order of the arena */ + unsigned int order; + + /* Used page count */ + atomic_t used_pages; +}; + +/** + * Initialize a arena using domheap allocator. + * @param [out] arena Arena to allocate + * @param [in] domain domain that has ownership of arena pages + * @param [in] order order of the arena (power of two of the size) + * @param [in] memflags Flags for domheap_alloc_pages() + * @return -ENOMEM on arena allocation error, 0 otherwise + */ +int iommu_arena_initialize(struct iommu_arena *arena, struct domain *domai= n, + unsigned int order, unsigned int memflags); + +/** + * Teardown a arena. + * @param [out] arena arena to allocate + * @param [in] check check for existing allocations + * @return -EBUSY if check is specified + */ +int iommu_arena_teardown(struct iommu_arena *arena, bool check); + +struct page_info *iommu_arena_allocate_page(struct iommu_arena *arena); +bool iommu_arena_free_page(struct iommu_arena *arena, struct page_info *pa= ge); + +#define iommu_arena_size(arena) (1LLU << (arena)->order) + +#endif diff --git a/xen/arch/x86/include/asm/iommu.h b/xen/arch/x86/include/asm/io= mmu.h index 654a07b9b2..452b98b42d 100644 --- a/xen/arch/x86/include/asm/iommu.h +++ b/xen/arch/x86/include/asm/iommu.h @@ -12,6 +12,8 @@ #include #include =20 +#include "arena.h" + #define DEFAULT_DOMAIN_ADDRESS_WIDTH 48 =20 struct g2m_ioport { @@ -62,6 +64,7 @@ struct arch_iommu { /* Queue for freeing pages */ struct page_list_head free_queue; + struct iommu_arena pt_arena; /* allocator for non-default contexts */ =20 union { /* Intel VT-d */ diff --git a/xen/drivers/passthrough/x86/Makefile b/xen/drivers/passthrough= /x86/Makefile index 75b2885336..1614f3d284 100644 --- a/xen/drivers/passthrough/x86/Makefile +++ b/xen/drivers/passthrough/x86/Makefile @@ -1,2 +1,3 @@ obj-y +=3D iommu.o +obj-y +=3D arena.o obj-$(CONFIG_HVM) +=3D hvm.o diff --git a/xen/drivers/passthrough/x86/arena.c b/xen/drivers/passthrough/= x86/arena.c new file mode 100644 index 0000000000..984bc4d643 --- /dev/null +++ b/xen/drivers/passthrough/x86/arena.c @@ -0,0 +1,157 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/** + * Simple arena-based page allocator. + * + * Allocate a large block using alloc_domheam_pages and allocate single pa= ges + * using iommu_arena_allocate_page and iommu_arena_free_page functions. + * + * Concurrent {allocate/free}_page is thread-safe + * iommu_arena_teardown during {allocate/free}_page is not thread-safe. 
+ * + * Written by Teddy Astie + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +/* Maximum of scan tries if the bit found not available */ +#define ARENA_TSL_MAX_TRIES 5 + +int iommu_arena_initialize(struct iommu_arena *arena, struct domain *d, + unsigned int order, unsigned int memflags) +{ + struct page_info *page; + + /* TODO: Maybe allocate differently ? */ + page =3D alloc_domheap_pages(d, order, memflags); + + if ( !page ) + return -ENOMEM; + + arena->map =3D xzalloc_array(unsigned long, BITS_TO_LONGS(1LLU << orde= r)); + arena->order =3D order; + arena->region_start =3D page_to_mfn(page); + + _atomic_set(&arena->used_pages, 0); + bitmap_zero(arena->map, iommu_arena_size(arena)); + + printk(XENLOG_DEBUG "IOMMU: Allocated arena (%llu pages, start=3D%"PRI= _mfn")\n", + iommu_arena_size(arena), mfn_x(arena->region_start)); + return 0; +} + +int iommu_arena_teardown(struct iommu_arena *arena, bool check) +{ + BUG_ON(mfn_x(arena->region_start) =3D=3D 0); + + /* Check for allocations if check is specified */ + if ( check && (atomic_read(&arena->used_pages) > 0) ) + return -EBUSY; + + free_domheap_pages(mfn_to_page(arena->region_start), arena->order); + + arena->region_start =3D _mfn(0); + _atomic_set(&arena->used_pages, 0); + xfree(arena->map); + arena->map =3D NULL; + + return 0; +} + +struct page_info *iommu_arena_allocate_page(struct iommu_arena *arena) +{ + unsigned int index; + unsigned int tsl_tries =3D 0; + + BUG_ON(mfn_x(arena->region_start) =3D=3D 0); + + if ( atomic_read(&arena->used_pages) =3D=3D iommu_arena_size(arena) ) + /* All pages used */ + return NULL; + + do + { + index =3D find_first_zero_bit(arena->map, iommu_arena_size(arena)); + + if ( index >=3D iommu_arena_size(arena) ) + /* No more free pages */ + return NULL; + + /* + * While there shouldn't be a lot of retries in practice, this loop + * *may* run indefinetly if the found bit is never free due to bei= ng + * overwriten by another CPU core right after. Add a safeguard for + * such very rare cases. + */ + tsl_tries++; + + if ( unlikely(tsl_tries =3D=3D ARENA_TSL_MAX_TRIES) ) + { + printk(XENLOG_ERR "ARENA: Too many TSL retries !"); + return NULL; + } + + /* Make sure that the bit we found is still free */ + } while ( test_and_set_bit(index, arena->map) ); + + atomic_inc(&arena->used_pages); + + return mfn_to_page(mfn_add(arena->region_start, index)); +} + +bool iommu_arena_free_page(struct iommu_arena *arena, struct page_info *pa= ge) +{ + unsigned long index; + mfn_t frame; + + if ( !page ) + { + printk(XENLOG_WARNING "IOMMU: Trying to free NULL page"); + WARN(); + return false; + } + + frame =3D page_to_mfn(page); + + /* Check if page belongs to our arena */ + if ( (mfn_x(frame) < mfn_x(arena->region_start)) + || (mfn_x(frame) >=3D (mfn_x(arena->region_start) + iommu_arena_si= ze(arena))) ) + { + printk(XENLOG_WARNING + "IOMMU: Trying to free outside arena region [mfn=3D%"PRI_mf= n"]", + mfn_x(frame)); + WARN(); + return false; + } + + index =3D mfn_x(frame) - mfn_x(arena->region_start); + + /* Sanity check in case of underflow. */ + ASSERT(index < iommu_arena_size(arena)); + + if ( !test_and_clear_bit(index, arena->map) ) + { + /* + * Bit was free during our arena_free_page, which means that + * either this page was never allocated, or we are in a double-free + * situation. + */ + printk(XENLOG_WARNING + "IOMMU: Freeing non-allocated region (double-free?) 
[mfn=%"PRI_mfn"]",
+               mfn_x(frame));
+        WARN();
+        return false;
+    }
+
+    atomic_dec(&arena->used_pages);
+
+    return true;
+}
\ No newline at end of file
-- 
2.47.2

Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech

From nobody Sat Nov 1 23:25:40 2025
From: Teddy Astie <teddy.astie@vates.tech>
Subject: [XEN RFC PATCH v6 10/11] iommu: Introduce PV-IOMMU
To: xen-devel@lists.xenproject.org
Cc: Teddy Astie, Jan Beulich, Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini
Message-Id: <9d2d5f255224eca6be95eff0f538d7fd7d93a2ac.1739785339.git.teddy.astie@vates.tech>
Date: Mon, 17 Feb 2025 10:18:25 +0000

Introduce the PV-IOMMU subsystem as defined in docs/designs/pv-iommu.md.
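For illustration only (not part of this patch), a rough sketch of the call
sequence a guest driver could issue; the HYPERVISOR_iommu_op() wrapper is
hypothetical, while the sub-ops and structures are those defined in
xen/include/public/pv-iommu.h below:

    struct pv_iommu_capabilities cap = {};
    struct pv_iommu_init init = { .max_ctx_no = 8, .arena_order = 9 };
    struct pv_iommu_alloc alloc = {};
    struct pv_iommu_map_pages map = {
        .gfn = 0x1000, .dfn = 0x1000, .nr_pages = 1, .pgsize = 4096,
        .map_flags = IOMMU_MAP_readable | IOMMU_MAP_writeable,
    };

    HYPERVISOR_iommu_op(IOMMU_query_capabilities, &cap); /* discover limits */
    HYPERVISOR_iommu_op(IOMMU_init, &init);              /* one-time per-domain setup */
    HYPERVISOR_iommu_op(IOMMU_alloc_context, &alloc);    /* returns alloc.ctx_no */
    map.ctx_no = alloc.ctx_no;
    HYPERVISOR_iommu_op(IOMMU_map_pages, &map);          /* map.mapped reports progress */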
Signed-off-by: Teddy Astie --- xen/arch/x86/include/asm/iommu.h | 3 + xen/common/Makefile | 1 + xen/common/pv-iommu.c | 536 ++++++++++++++++++++ xen/drivers/passthrough/amd/pci_amd_iommu.c | 15 + xen/drivers/passthrough/iommu.c | 105 ++++ xen/drivers/passthrough/vtd/iommu.c | 8 + xen/drivers/passthrough/x86/iommu.c | 61 ++- xen/include/hypercall-defs.c | 6 + xen/include/public/pv-iommu.h | 343 +++++++++++++ xen/include/public/xen.h | 1 + xen/include/xen/iommu.h | 11 + 11 files changed, 1085 insertions(+), 5 deletions(-) create mode 100644 xen/common/pv-iommu.c create mode 100644 xen/include/public/pv-iommu.h diff --git a/xen/arch/x86/include/asm/iommu.h b/xen/arch/x86/include/asm/io= mmu.h index 452b98b42d..09fb512936 100644 --- a/xen/arch/x86/include/asm/iommu.h +++ b/xen/arch/x86/include/asm/iommu.h @@ -136,6 +136,9 @@ int iommu_identity_mapping(struct domain *d, struct iom= mu_context *ctx, p2m_access_t p2ma, paddr_t base, paddr_t end, unsigned int flag); void iommu_identity_map_teardown(struct domain *d, struct iommu_context *c= tx); +bool iommu_identity_map_check(struct domain *d, struct iommu_context *ctx, + mfn_t mfn); + =20 extern bool untrusted_msi; =20 diff --git a/xen/common/Makefile b/xen/common/Makefile index cba3b32733..c8583a80ba 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -37,6 +37,7 @@ obj-y +=3D percpu.o obj-$(CONFIG_PERF_COUNTERS) +=3D perfc.o obj-bin-$(CONFIG_HAS_PMAP) +=3D pmap.init.o obj-y +=3D preempt.o +obj-y +=3D pv-iommu.o obj-y +=3D random.o obj-y +=3D rangeset.o obj-y +=3D radix-tree.o diff --git a/xen/common/pv-iommu.c b/xen/common/pv-iommu.c new file mode 100644 index 0000000000..a1315bf582 --- /dev/null +++ b/xen/common/pv-iommu.c @@ -0,0 +1,536 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * xen/common/pv_iommu.c + * + * PV-IOMMU hypercall interface. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define PVIOMMU_PREFIX "[PV-IOMMU] " + +static int get_paged_frame(struct domain *d, gfn_t gfn, mfn_t *mfn, + struct page_info **page, bool readonly) +{ + int ret =3D 0; + p2m_type_t p2mt =3D p2m_invalid; + + #ifdef CONFIG_X86 + p2m_query_t query =3D P2M_ALLOC; + + if ( !readonly ) + query |=3D P2M_UNSHARE; + + *mfn =3D get_gfn_type(d, gfn_x(gfn), &p2mt, query); + #else + *mfn =3D p2m_lookup(d, gfn, &p2mt); + #endif + + if ( mfn_eq(*mfn, INVALID_MFN) ) + { + /* No mapping ? 
*/ + printk(XENLOG_G_WARNING PVIOMMU_PREFIX + "Trying to map to non-backed page frame (gfn=3D%"PRI_gfn + " p2mt=3D%d d%d)\n", gfn_x(gfn), p2mt, d->domain_id); + + ret =3D -ENOENT; + } + else if ( p2m_is_any_ram(p2mt) && mfn_valid(*mfn) ) + { + *page =3D get_page_from_mfn(*mfn, d); + ret =3D 0; + } + else if ( p2m_is_mmio(p2mt) || + iomem_access_permitted(d, mfn_x(*mfn),mfn_x(*mfn)) ) + { + *page =3D NULL; + ret =3D 0; + } + else + { + printk(XENLOG_G_WARNING PVIOMMU_PREFIX + "Unexpected p2mt %d (d%d gfn=3D%"PRI_gfn" mfn=3D%"PRI_mfn")= \n", + p2mt, d->domain_id, gfn_x(gfn), mfn_x(*mfn)); + + ret =3D -EPERM; + } + + put_gfn(d, gfn_x(gfn)); + return ret; +} + +static bool can_use_iommu_check(struct domain *d) +{ + if ( !is_iommu_enabled(d) ) + { + printk(XENLOG_G_WARNING PVIOMMU_PREFIX + "IOMMU disabled for this domain\n"); + return false; + } + + if ( !dom_iommu(d)->allow_pv_iommu ) + { + printk(XENLOG_G_WARNING PVIOMMU_PREFIX + "PV-IOMMU disabled for this domain\n"); + return false; + } + + return true; +} + +static long capabilities_op(struct pv_iommu_capabilities *cap, struct doma= in *d) +{ + cap->max_ctx_no =3D d->iommu.other_contexts.count; + cap->max_iova_addr =3D iommu_get_max_iova(d); + + cap->max_pasid =3D 0; /* TODO */ + cap->cap_flags =3D 0; + + cap->pgsize_mask =3D PAGE_SIZE_4K; + + return 0; +} + +static long init_op(struct pv_iommu_init *init, struct domain *d) +{ + if (init->max_ctx_no =3D=3D UINT32_MAX) + return -E2BIG; + + return iommu_domain_pviommu_init(d, init->max_ctx_no + 1, init->arena_= order); +} + +static long alloc_context_op(struct pv_iommu_alloc *alloc, struct domain *= d) +{ + u16 ctx_no =3D 0; + int status =3D 0; + + status =3D iommu_context_alloc(d, &ctx_no, 0); + + if ( status ) + return status; + + printk(XENLOG_G_INFO PVIOMMU_PREFIX + "Created IOMMU context %hu in d%d\n", ctx_no, d->domain_id); + + alloc->ctx_no =3D ctx_no; + return 0; +} + +static long free_context_op(struct pv_iommu_free *free, struct domain *d) +{ + int flags =3D IOMMU_TEARDOWN_PREEMPT; + + if ( !free->ctx_no ) + return -EINVAL; + + if ( free->free_flags & IOMMU_FREE_reattach_default ) + flags |=3D IOMMU_TEARDOWN_REATTACH_DEFAULT; + + return iommu_context_free(d, free->ctx_no, flags); +} + +static long reattach_device_op(struct pv_iommu_reattach_device *reattach, + struct domain *d) +{ + int ret; + device_t *pdev; + struct physdev_pci_device dev =3D reattach->dev; + + pcidevs_lock(); + pdev =3D pci_get_pdev(d, PCI_SBDF(dev.seg, dev.bus, dev.devfn)); + + if ( !pdev ) + { + pcidevs_unlock(); + return -ENOENT; + } + + ret =3D iommu_reattach_context(d, d, pdev, reattach->ctx_no); + + pcidevs_unlock(); + return ret; +} + +static long map_pages_op(struct pv_iommu_map_pages *map, struct domain *d) +{ + struct iommu_context *ctx; + int ret =3D 0, flush_ret; + struct page_info *page =3D NULL; + mfn_t mfn, mfn_lookup; + unsigned int flags =3D 0, flush_flags =3D 0; + size_t i =3D 0; + dfn_t dfn0 =3D _dfn(map->dfn); /* original map->dfn */ + + if ( !map->ctx_no || !(ctx =3D iommu_get_context(d, map->ctx_no)) ) + return -EINVAL; + + if ( map->map_flags & IOMMU_MAP_readable ) + flags |=3D IOMMUF_readable; + + if ( map->map_flags & IOMMU_MAP_writeable ) + flags |=3D IOMMUF_writable; + + for (i =3D 0; i < map->nr_pages; i++) + { + gfn_t gfn =3D _gfn(map->gfn + i); + dfn_t dfn =3D _dfn(map->dfn + i); + +#ifdef CONFIG_X86 + if ( iommu_identity_map_check(d, ctx, _mfn(map->dfn)) ) + { + ret =3D -EADDRNOTAVAIL; + break; + } +#endif + + ret =3D get_paged_frame(d, gfn, &mfn, &page, 0); + + if ( ret ) + break; 
+ + /* Check for conflict with existing mappings */ + if ( !iommu_lookup_page(d, dfn, &mfn_lookup, &flags, map->ctx_no) ) + { + if ( page ) + put_page(page); + + ret =3D -EADDRINUSE; + break; + } + + ret =3D iommu_map(d, dfn, mfn, 1, flags, &flush_flags, map->ctx_no= ); + + if ( ret ) + { + if ( page ) + put_page(page); + + break; + } + + map->mapped++; + + if ( (i & 0xff) && hypercall_preempt_check() ) + { + i++; + + map->gfn +=3D i; + map->dfn +=3D i; + map->nr_pages -=3D i; + + ret =3D -ERESTART; + break; + } + } + + flush_ret =3D iommu_iotlb_flush(d, dfn0, i, flush_flags, map->ctx_no); + + iommu_put_context(ctx); + + if ( flush_ret ) + printk(XENLOG_G_WARNING PVIOMMU_PREFIX + "Flush operation failed for d%dc%d (%d)\n", d->domain_id, + ctx->id, flush_ret); + + return ret; +} + +static long unmap_pages_op(struct pv_iommu_unmap_pages *unmap, struct doma= in *d) +{ + struct iommu_context *ctx; + mfn_t mfn; + int ret =3D 0, flush_ret; + unsigned int flags, flush_flags =3D 0; + size_t i =3D 0; + dfn_t dfn0 =3D _dfn(unmap->dfn); /* original unmap->dfn */ + + if ( !unmap->ctx_no || !(ctx =3D iommu_get_context(d, unmap->ctx_no)) ) + return -EINVAL; + + for (i =3D 0; i < unmap->nr_pages; i++) + { + dfn_t dfn =3D _dfn(unmap->dfn + i); + +#ifdef CONFIG_X86 + if ( iommu_identity_map_check(d, ctx, _mfn(unmap->dfn)) ) + { + ret =3D -EADDRNOTAVAIL; + break; + } +#endif + + /* Check if there is a valid mapping for this domain */ + if ( iommu_lookup_page(d, dfn, &mfn, &flags, unmap->ctx_no) ) { + ret =3D -ENOENT; + break; + } + + ret =3D iommu_unmap(d, dfn, 1, 0, &flush_flags, unmap->ctx_no); + + if ( ret ) + break; + + unmap->unmapped++; + + /* Decrement reference counter (if needed) */ + if ( mfn_valid(mfn) ) + put_page(mfn_to_page(mfn)); + + if ( (i & 0xff) && hypercall_preempt_check() ) + { + i++; + + unmap->dfn +=3D i; + unmap->nr_pages -=3D i; + + ret =3D -ERESTART; + break; + } + } + + flush_ret =3D iommu_iotlb_flush(d, dfn0, i, flush_flags, unmap->ctx_no= ); + + iommu_put_context(ctx); + + if ( flush_ret ) + printk(XENLOG_WARNING PVIOMMU_PREFIX + "Flush operation failed for d%dc%d (%d)\n", d->domain_id, + ctx->id, flush_ret); + + return ret; +} + +static long do_iommu_subop(int subop, XEN_GUEST_HANDLE_PARAM(void) arg, + struct domain *d, bool remote); + +static long remote_cmd_op(struct pv_iommu_remote_cmd *remote_cmd, + struct domain *current_domain) +{ + long ret =3D 0; + struct domain *d; + + /* TODO: use a better permission logic */ + if ( !is_hardware_domain(current_domain) ) + return -EPERM; + + d =3D get_domain_by_id(remote_cmd->domid); + + if ( !d ) + return -ENOENT; + + ret =3D do_iommu_subop(remote_cmd->subop, remote_cmd->arg, d, true); + + put_domain(d); + + return ret; +} + +static long do_iommu_subop(int subop, XEN_GUEST_HANDLE_PARAM(void) arg, + struct domain *d, bool remote) +{ + long ret =3D 0; + + switch ( subop ) + { + case IOMMU_noop: + break; + + case IOMMU_query_capabilities: + { + struct pv_iommu_capabilities cap; + + ret =3D capabilities_op(&cap, d); + + if ( unlikely(copy_to_guest(arg, &cap, 1)) ) + ret =3D -EFAULT; + + break; + } + + case IOMMU_init: + { + struct pv_iommu_init init; + + if ( unlikely(copy_from_guest(&init, arg, 1)) ) + { + ret =3D -EFAULT; + break; + } + + ret =3D init_op(&init, d); + } + + case IOMMU_alloc_context: + { + struct pv_iommu_alloc alloc; + + if ( unlikely(copy_from_guest(&alloc, arg, 1)) ) + { + ret =3D -EFAULT; + break; + } + + ret =3D alloc_context_op(&alloc, d); + + if ( unlikely(copy_to_guest(arg, &alloc, 1)) ) + ret =3D -EFAULT; + + 
break; + } + + case IOMMU_free_context: + { + struct pv_iommu_free free; + + if ( unlikely(copy_from_guest(&free, arg, 1)) ) + { + ret =3D -EFAULT; + break; + } + + ret =3D free_context_op(&free, d); + break; + } + + case IOMMU_reattach_device: + { + struct pv_iommu_reattach_device reattach; + + if ( unlikely(copy_from_guest(&reattach, arg, 1)) ) + { + ret =3D -EFAULT; + break; + } + + ret =3D reattach_device_op(&reattach, d); + break; + } + + case IOMMU_map_pages: + { + struct pv_iommu_map_pages map; + + if ( unlikely(copy_from_guest(&map, arg, 1)) ) + { + ret =3D -EFAULT; + break; + } + + ret =3D map_pages_op(&map, d); + + if ( unlikely(copy_to_guest(arg, &map, 1)) ) + ret =3D -EFAULT; + + break; + } + + case IOMMU_unmap_pages: + { + struct pv_iommu_unmap_pages unmap; + + if ( unlikely(copy_from_guest(&unmap, arg, 1)) ) + { + ret =3D -EFAULT; + break; + } + + ret =3D unmap_pages_op(&unmap, d); + + if ( unlikely(copy_to_guest(arg, &unmap, 1)) ) + ret =3D -EFAULT; + + break; + } + + case IOMMU_remote_cmd: + { + struct pv_iommu_remote_cmd remote_cmd; + + if ( remote ) + { + /* Prevent remote_cmd from being called recursively */ + ret =3D -EINVAL; + break; + } + + if ( unlikely(copy_from_guest(&remote_cmd, arg, 1)) ) + { + ret =3D -EFAULT; + break; + } + + ret =3D remote_cmd_op(&remote_cmd, d); + break; + } + + /* + * TODO + */ + case IOMMU_alloc_nested: + { + ret =3D -EOPNOTSUPP; + break; + } + + case IOMMU_flush_nested: + { + ret =3D -EOPNOTSUPP; + break; + } + + case IOMMU_attach_pasid: + { + ret =3D -EOPNOTSUPP; + break; + } + + case IOMMU_detach_pasid: + { + ret =3D -EOPNOTSUPP; + break; + } + + default: + return -EOPNOTSUPP; + } + + return ret; +} + +long do_iommu_op(unsigned int subop, XEN_GUEST_HANDLE_PARAM(void) arg) +{ + long ret =3D 0; + + if ( !can_use_iommu_check(current->domain) ) + return -ENODEV; + + ret =3D do_iommu_subop(subop, arg, current->domain, false); + + if ( ret =3D=3D -ERESTART ) + return hypercall_create_continuation(__HYPERVISOR_iommu_op, "ih", = subop, arg); + + return ret; +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/pass= through/amd/pci_amd_iommu.c index 366d5eb982..0b561ff99b 100644 --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c @@ -714,6 +714,20 @@ static void cf_check amd_dump_page_tables(struct domai= n *d) hd->arch.amd.paging_mode, 0, 0); } =20 +uint64_t amd_get_max_iova(struct domain *d) +{ + struct domain_iommu *hd =3D dom_iommu(d); + unsigned int bits =3D 12 + hd->arch.amd.paging_mode * 9; + + /* If paging_mode =3D=3D 6, which indicates 6-level page tables, + we have bits =3D=3D 66 while the GPA space is still 64-bits + */ + if (bits >=3D 64) + return ~0LLU; + + return (1LLU << bits) - 1; +} + static const struct iommu_ops __initconst_cf_clobber _iommu_ops =3D { .page_sizes =3D PAGE_SIZE_4K | PAGE_SIZE_2M | PAGE_SIZE_1G, .init =3D amd_iommu_domain_init, @@ -742,6 +756,7 @@ static const struct iommu_ops __initconst_cf_clobber _i= ommu_ops =3D { .crash_shutdown =3D amd_iommu_crash_shutdown, .get_reserved_device_memory =3D amd_iommu_get_reserved_device_memory, .dump_page_tables =3D amd_dump_page_tables, + .get_max_iova =3D amd_get_max_iova, }; =20 static const struct iommu_init_ops __initconstrel _iommu_init_ops =3D { diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iomm= u.c index f92835a2ed..c26a2160f9 100644 
--- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -193,6 +193,99 @@ static void __hwdom_init check_hwdom_reqs(struct domai= n *d) arch_iommu_check_autotranslated_hwdom(d); } =20 + +int iommu_domain_pviommu_init(struct domain *d, uint16_t nb_ctx, uint32_t = arena_order) +{ + struct domain_iommu *hd =3D dom_iommu(d); + int rc; + + BUG_ON(nb_ctx =3D=3D 0); /* sanity check (prevent underflow) */ + + /* + * hd->other_contexts.count is always reported as 0 during initializat= ion + * preventing misuse of partially initialized IOMMU contexts. + */ + + if ( atomic_cmpxchg(&hd->other_contexts.initialized, 0, 1) =3D=3D 1 ) + return -EACCES; + + if ( (nb_ctx - 1) > 0 ) { + /* Initialize context bitmap */ + size_t i; + + hd->other_contexts.bitmap =3D xzalloc_array(unsigned long, + BITS_TO_LONGS(nb_ctx - 1= )); + + if (!hd->other_contexts.bitmap) + { + rc =3D -ENOMEM; + goto cleanup; + } + + hd->other_contexts.map =3D xzalloc_array(struct iommu_context, nb_= ctx - 1); + + if (!hd->other_contexts.map) + { + rc =3D -ENOMEM; + goto cleanup; + } + + for (i =3D 0; i < (nb_ctx - 1); i++) + rspin_lock_init(&hd->other_contexts.map[i].lock); + } + + rc =3D arch_iommu_pviommu_init(d, nb_ctx, arena_order); + + if ( rc ) + goto cleanup; + + /* Make sure initialization is complete before making it visible to ot= her CPUs. */ + smp_wmb(); + + hd->other_contexts.count =3D nb_ctx - 1; + + printk(XENLOG_INFO "Dom%d uses %lu IOMMU contexts (%llu pages arena)\n= ", + d->domain_id, (unsigned long)nb_ctx, 1llu << arena_order); + + return 0; + +cleanup: + /* TODO: Reset hd->other_contexts.initialized */ + if ( hd->other_contexts.bitmap ) + { + xfree(hd->other_contexts.bitmap); + hd->other_contexts.bitmap =3D NULL; + } + + if ( hd->other_contexts.map ) + { + xfree(hd->other_contexts.map); + hd->other_contexts.bitmap =3D NULL; + } + + return rc; +} + +int iommu_domain_pviommu_teardown(struct domain *d) +{ + struct domain_iommu *hd =3D dom_iommu(d); + int i; + /* FIXME: Potential race condition with remote_op ? 
*/ + + for (i =3D 0; i < hd->other_contexts.count; i++) + WARN_ON(iommu_context_free(d, i, IOMMU_TEARDOWN_REATTACH_DEFAULT) = !=3D ENOENT); + + hd->other_contexts.count =3D 0; + + if ( hd->other_contexts.bitmap ) + xfree(hd->other_contexts.bitmap); + + if ( hd->other_contexts.map ) + xfree(hd->other_contexts.map); + + return 0; +} + int iommu_domain_init(struct domain *d, unsigned int opts) { struct domain_iommu *hd =3D dom_iommu(d); @@ -238,6 +331,8 @@ int iommu_domain_init(struct domain *d, unsigned int op= ts) =20 ASSERT(!(hd->need_sync && hd->hap_pt_share)); =20 + hd->allow_pv_iommu =3D true; + rspin_lock(&hd->default_ctx.lock); ret =3D iommu_context_init(d, &hd->default_ctx, 0, IOMMU_CONTEXT_INIT_= default); rspin_unlock(&hd->default_ctx.lock); @@ -1204,6 +1299,16 @@ bool iommu_has_feature(struct domain *d, enum iommu_= feature feature) return is_iommu_enabled(d) && test_bit(feature, dom_iommu(d)->features= ); } =20 +uint64_t iommu_get_max_iova(struct domain *d) +{ + struct domain_iommu *hd =3D dom_iommu(d); + + if ( !hd->platform_ops->get_max_iova ) + return 0; + + return iommu_call(hd->platform_ops, get_max_iova, d); +} + #define MAX_EXTRA_RESERVED_RANGES 20 struct extra_reserved_range { unsigned long start; diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/= vtd/iommu.c index bb53cff158..20afb68399 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -2605,6 +2605,13 @@ static int cf_check intel_iommu_remove_devfn(struct = domain *d, struct pci_dev *p return unapply_context_single(d, drhd->iommu, NULL, pdev->bus, devfn); } =20 +static uint64_t cf_check intel_iommu_get_max_iova(struct domain *d) +{ + struct domain_iommu *hd =3D dom_iommu(d); + + return (1LLU << agaw_to_width(hd->arch.vtd.agaw)) - 1; +} + static const struct iommu_ops __initconst_cf_clobber vtd_ops =3D { .page_sizes =3D PAGE_SIZE_4K, .init =3D intel_iommu_domain_init, @@ -2636,6 +2643,7 @@ static const struct iommu_ops __initconst_cf_clobber = vtd_ops =3D { .iotlb_flush =3D iommu_flush_iotlb, .get_reserved_device_memory =3D intel_iommu_get_reserved_device_memory, .dump_page_tables =3D vtd_dump_page_tables, + .get_max_iova =3D intel_iommu_get_max_iova, }; =20 const struct iommu_init_ops __initconstrel intel_iommu_init_ops =3D { diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/= x86/iommu.c index 7b7fac0db8..79efc6ad47 100644 --- a/xen/drivers/passthrough/x86/iommu.c +++ b/xen/drivers/passthrough/x86/iommu.c @@ -215,6 +215,32 @@ int arch_iommu_context_teardown(struct domain *d, stru= ct iommu_context *ctx, u32 return 0; } =20 +int arch_iommu_pviommu_init(struct domain *d, uint16_t nb_ctx, uint32_t ar= ena_order) +{ + struct domain_iommu *hd =3D dom_iommu(d); + + if ( arena_order =3D=3D 0 ) + return 0; + + return iommu_arena_initialize(&hd->arch.pt_arena, NULL, arena_order, 0= ); +} + +int arch_iommu_pviommu_teardown(struct domain *d) +{ + struct domain_iommu *hd =3D dom_iommu(d); + + if ( iommu_arena_teardown(&hd->arch.pt_arena, true) ) + { + printk(XENLOG_WARNING "IOMMU Arena used while being destroyed\n"); + WARN(); + + /* Teardown anyway */ + iommu_arena_teardown(&hd->arch.pt_arena, false); + } + + return 0; +} + void arch_iommu_domain_destroy(struct domain *d) { } @@ -394,6 +420,19 @@ void iommu_identity_map_teardown(struct domain *d, str= uct iommu_context *ctx) } } =20 +bool iommu_identity_map_check(struct domain *d, struct iommu_context *ctx, + mfn_t mfn) +{ + struct identity_map *map; + uint64_t addr =3D 
pfn_to_paddr(mfn_x(mfn)); + + list_for_each_entry ( map, &ctx->arch.identity_maps, list ) + if (addr >=3D map->base && addr < map->end) + return true; + + return false; +} + static int __hwdom_init cf_check map_subtract(unsigned long s, unsigned lo= ng e, void *data) { @@ -669,7 +708,7 @@ void iommu_free_domid(domid_t domid, unsigned long *map) BUG(); } =20 -int iommu_free_pgtables(struct domain *d, struct iommu_context *ctx) +int cf_check iommu_free_pgtables(struct domain *d, struct iommu_context *c= tx) { struct domain_iommu *hd =3D dom_iommu(d); struct page_info *pg; @@ -686,7 +725,10 @@ int iommu_free_pgtables(struct domain *d, struct iommu= _context *ctx) =20 while ( (pg =3D page_list_remove_head(&ctx->arch.pgtables)) ) { - free_domheap_page(pg); + if (ctx->id =3D=3D 0) + free_domheap_page(pg); + else + iommu_arena_free_page(&hd->arch.pt_arena, pg); =20 if ( !(++done & 0xff) && general_preempt_check() ) return -ERESTART; @@ -708,7 +750,11 @@ struct page_info *iommu_alloc_pgtable(struct domain_io= mmu *hd, memflags =3D MEMF_node(hd->node); #endif =20 - pg =3D alloc_domheap_page(NULL, memflags); + if (ctx->id =3D=3D 0) + pg =3D alloc_domheap_page(NULL, memflags); + else + pg =3D iommu_arena_allocate_page(&hd->arch.pt_arena); + if ( !pg ) return NULL; =20 @@ -787,9 +833,14 @@ void iommu_queue_free_pgtable(struct domain *d, struct= iommu_context *ctx, =20 page_list_del(pg, &ctx->arch.pgtables); =20 - page_list_add_tail(pg, &per_cpu(free_pgt_list, cpu)); + if ( !ctx->id ) + { + page_list_add_tail(pg, &per_cpu(free_pgt_list, cpu)); =20 - tasklet_schedule(&per_cpu(free_pgt_tasklet, cpu)); + tasklet_schedule(&per_cpu(free_pgt_tasklet, cpu)); + } + else + iommu_arena_free_page(&dom_iommu(d)->arch.pt_arena, pg); } =20 static int cf_check cpu_callback( diff --git a/xen/include/hypercall-defs.c b/xen/include/hypercall-defs.c index 7720a29ade..78ca87b57f 100644 --- a/xen/include/hypercall-defs.c +++ b/xen/include/hypercall-defs.c @@ -209,6 +209,9 @@ hypfs_op(unsigned int cmd, const char *arg1, unsigned l= ong arg2, void *arg3, uns #ifdef CONFIG_X86 xenpmu_op(unsigned int op, xen_pmu_params_t *arg) #endif +#ifdef CONFIG_HAS_PASSTHROUGH +iommu_op(unsigned int subop, void *arg) +#endif =20 #ifdef CONFIG_PV caller: pv64 @@ -295,5 +298,8 @@ mca do do - = - - #ifndef CONFIG_PV_SHIM_EXCLUSIVE paging_domctl_cont do do do do - #endif +#ifdef CONFIG_HAS_PASSTHROUGH +iommu_op do do do do - +#endif =20 #endif /* !CPPCHECK */ diff --git a/xen/include/public/pv-iommu.h b/xen/include/public/pv-iommu.h new file mode 100644 index 0000000000..6f50aea4b7 --- /dev/null +++ b/xen/include/public/pv-iommu.h @@ -0,0 +1,343 @@ +/* SPDX-License-Identifier: MIT */ +/** + * pv-iommu.h + * + * Paravirtualized IOMMU driver interface. 
+ * + * Copyright (c) 2024 Teddy Astie + */ + +#ifndef __XEN_PUBLIC_PV_IOMMU_H__ +#define __XEN_PUBLIC_PV_IOMMU_H__ + +#include "xen.h" +#include "physdev.h" + +#ifndef uint64_aligned_t +#define uint64_aligned_t uint64_t +#endif + +#define IOMMU_DEFAULT_CONTEXT (0) + +enum pv_iommu_cmd { + /* Basic cmd */ + IOMMU_noop =3D 0, + IOMMU_query_capabilities =3D 1, + IOMMU_init =3D 2, + IOMMU_alloc_context =3D 3, + IOMMU_free_context =3D 4, + IOMMU_reattach_device =3D 5, + IOMMU_map_pages =3D 6, + IOMMU_unmap_pages =3D 7, + IOMMU_remote_cmd =3D 8, + + /* Extended cmd */ + IOMMU_alloc_nested =3D 9, /* if IOMMUCAP_nested */ + IOMMU_flush_nested =3D 10, /* if IOMMUCAP_nested */ + IOMMU_attach_pasid =3D 11, /* if IOMMUCAP_pasid */ + IOMMU_detach_pasid =3D 12, /* if IOMMUCAP_pasid */ +}; + +/** + * If set, default context allow DMA to domain memory. + * If cleared, default context blocks all DMA to domain memory. + */ +#define IOMMUCAP_default_identity (1U << 0) + +/** + * IOMMU_MAP_cache support. + */ +#define IOMMUCAP_cache (1U << 1) + +/** + * If set, IOMMU_alloc_nested and IOMMU_flush_nested are supported. + */ +#define IOMMUCAP_nested (1U << 2) + +/** + * If set, IOMMU_attach_pasid and IOMMU_detach_pasid are supported and + * a device PASID can be specified in reattach_context. + */ +#define IOMMUCAP_pasid (1U << 3) + +/** + * If set, IOMMU_ALLOC_identity is supported in pv_iommu_alloc. + */ +#define IOMMUCAP_identity (1U << 4) + +/** + * IOMMU_query_capabilities + * Query PV-IOMMU capabilities for this domain. + */ +struct pv_iommu_capabilities { + /* + * OUT: Maximum device address (iova) that the guest can use for mappi= ngs. + */ + uint64_aligned_t max_iova_addr; + + /* OUT: IOMMU capabilities flags */ + uint32_t cap_flags; + + /* OUT: Mask of all supported page sizes. */ + uint32_t pgsize_mask; + + /* OUT: Maximum pasid (if IOMMUCAP_pasid) */ + uint32_t max_pasid; + + /* OUT: Maximum number of IOMMU context this domain can use. */ + uint16_t max_ctx_no; + + uint16_t pad0; +}; +typedef struct pv_iommu_capabilities pv_iommu_capabilities_t; +DEFINE_XEN_GUEST_HANDLE(pv_iommu_capabilities_t); + +/** + * IOMMU_init + * Initialize PV-IOMMU for this domain. + * + * Fails with -EACCESS if PV-IOMMU is already initialized. + */ +struct pv_iommu_init { + /* IN: Maximum number of IOMMU context this domain can use. */ + uint32_t max_ctx_no; + + /* IN: Arena size in pages (in power of two) */ + uint32_t arena_order; +}; +typedef struct pv_iommu_init pv_iommu_init_t; +DEFINE_XEN_GUEST_HANDLE(pv_iommu_init_t); + +/** + * Create a 1:1 identity mapped context to domain memory + * (needs IOMMUCAP_identity). + */ +#define IOMMU_ALLOC_identity (1 << 0) + +/** + * IOMMU_alloc_context + * Allocate an IOMMU context. + * Fails with -ENOSPC if no context number is available. + */ +struct pv_iommu_alloc { + /* OUT: allocated IOMMU context number */ + uint16_t ctx_no; + + /* IN: allocation flags */ + uint32_t alloc_flags; +}; +typedef struct pv_iommu_alloc pv_iommu_alloc_t; +DEFINE_XEN_GUEST_HANDLE(pv_iommu_alloc_t); + +/** + * Move all devices to default context before freeing the context. + */ +#define IOMMU_FREE_reattach_default (1 << 0) + +/** + * IOMMU_free_context + * Destroy a IOMMU context. + * + * If IOMMU_FREE_reattach_default is specified, move all context devices to + * default context before destroying this context. + * + * If there are devices in the context and IOMMU_FREE_reattach_default is = not + * specified, fail with -EBUSY. + * + * The default context can't be destroyed. 
+ */ +struct pv_iommu_free { + /* IN: IOMMU context number to free */ + uint16_t ctx_no; + + /* IN: Free operation specific flags */ + uint32_t free_flags; +}; +typedef struct pv_iommu_free pv_iommu_free_t; +DEFINE_XEN_GUEST_HANDLE(pv_iommu_free_t); + +/* Device has read access */ +#define IOMMU_MAP_readable (1 << 0) + +/* Device has write access */ +#define IOMMU_MAP_writeable (1 << 1) + +/* Enforce DMA coherency */ +#define IOMMU_MAP_cache (1 << 2) + +/** + * IOMMU_map_pages + * Map pages on a IOMMU context. + * + * pgsize must be supported by pgsize_mask. + * Fails with -EINVAL if mapping on top of another mapping. + * Report actually mapped page count in mapped field (regardless of failur= e). + */ +struct pv_iommu_map_pages { + /* IN: IOMMU context number */ + uint16_t ctx_no; + + /* IN: Guest frame number */ + uint64_aligned_t gfn; + + /* IN: Device frame number */ + uint64_aligned_t dfn; + + /* IN: Map flags */ + uint32_t map_flags; + + /* IN: Size of pages to map */ + uint32_t pgsize; + + /* IN: Number of pages to map */ + uint32_t nr_pages; + + /* OUT: Number of pages actually mapped */ + uint32_t mapped; +}; +typedef struct pv_iommu_map_pages pv_iommu_map_pages_t; +DEFINE_XEN_GUEST_HANDLE(pv_iommu_map_pages_t); + +/** + * IOMMU_unmap_pages + * Unmap pages on a IOMMU context. + * + * pgsize must be supported by pgsize_mask. + * Report actually unmapped page count in mapped field (regardless of fail= ure). + * Fails with -ENOENT when attempting to unmap a page without any mapping + */ +struct pv_iommu_unmap_pages { + /* IN: IOMMU context number */ + uint16_t ctx_no; + + /* IN: Device frame number */ + uint64_aligned_t dfn; + + /* IN: Size of pages to unmap */ + uint32_t pgsize; + + /* IN: Number of pages to unmap */ + uint32_t nr_pages; + + /* OUT: Number of pages actually unmapped */ + uint32_t unmapped; +}; +typedef struct pv_iommu_unmap_pages pv_iommu_unmap_pages_t; +DEFINE_XEN_GUEST_HANDLE(pv_iommu_unmap_pages_t); + +/** + * IOMMU_reattach_device + * Reattach a device to another IOMMU context. + * Fails with -ENODEV if no such device exist. + */ +struct pv_iommu_reattach_device { + /* IN: Target IOMMU context number */ + uint16_t ctx_no; + + /* IN: Physical device to move */ + struct physdev_pci_device dev; + + /* IN: PASID of the device (if IOMMUCAP_pasid) */ + uint32_t pasid; +}; +typedef struct pv_iommu_reattach_device pv_iommu_reattach_device_t; +DEFINE_XEN_GUEST_HANDLE(pv_iommu_reattach_device_t); + + +/** + * IOMMU_remote_cmd + * Do a PV-IOMMU operation on another domain. + * Current domain needs to be allowed to act on the target domain, otherwi= se + * fails with -EPERM. + */ +struct pv_iommu_remote_cmd { + /* IN: Target domain to do the subop on */ + uint16_t domid; + + /* IN: Command to do on target domain. */ + uint16_t subop; + + /* INOUT: Command argument from current domain memory */ + XEN_GUEST_HANDLE(void) arg; +}; +typedef struct pv_iommu_remote_cmd pv_iommu_remote_cmd_t; +DEFINE_XEN_GUEST_HANDLE(pv_iommu_remote_cmd_t); + +/** + * IOMMU_alloc_nested + * Create a nested IOMMU context (needs IOMMUCAP_nested). + * + * This context uses a platform-specific page table from domain address sp= ace + * specified in pgtable_gfn and use it for nested translations. + * + * Explicit flushes needs to be submited with IOMMU_flush_nested on + * modification of the nested pagetable to ensure coherency between IOTLB = and + * nested page table. + * + * This context can be destroyed using IOMMU_free_context. + * This context cannot be modified using map_pages, unmap_pages. 
+ */
+struct pv_iommu_alloc_nested {
+    /* OUT: allocated IOMMU context number */
+    uint16_t ctx_no;
+
+    /* IN: guest frame number of the nested page table */
+    uint64_aligned_t pgtable_gfn;
+
+    /* IN: nested mode flags */
+    uint64_aligned_t nested_flags;
+};
+typedef struct pv_iommu_alloc_nested pv_iommu_alloc_nested_t;
+DEFINE_XEN_GUEST_HANDLE(pv_iommu_alloc_nested_t);
+
+/**
+ * IOMMU_flush_nested (needs IOMMUCAP_nested)
+ * Flush the IOTLB for nested translation.
+ */
+struct pv_iommu_flush_nested {
+    /* TODO */
+};
+typedef struct pv_iommu_flush_nested pv_iommu_flush_nested_t;
+DEFINE_XEN_GUEST_HANDLE(pv_iommu_flush_nested_t);
+
+/**
+ * IOMMU_attach_pasid (needs IOMMUCAP_pasid)
+ * Attach a new device-with-pasid to an IOMMU context.
+ * If a matching device-with-pasid already exists (globally),
+ * fail with -EEXIST.
+ * If pasid is 0, fails with -EINVAL.
+ * If the physical device doesn't exist in the domain, fail with -ENOENT.
+ */
+struct pv_iommu_attach_pasid {
+    /* IN: IOMMU context to add the device-with-pasid in */
+    uint16_t ctx_no;
+
+    /* IN: Physical device */
+    struct physdev_pci_device dev;
+
+    /* IN: pasid of the device to attach */
+    uint32_t pasid;
+};
+typedef struct pv_iommu_attach_pasid pv_iommu_attach_pasid_t;
+DEFINE_XEN_GUEST_HANDLE(pv_iommu_attach_pasid_t);
+
+/**
+ * IOMMU_detach_pasid (needs IOMMUCAP_pasid)
+ * Detach a device-with-pasid.
+ * If the device-with-pasid doesn't exist or doesn't belong to the domain,
+ * fail with -ENOENT.
+ * If pasid is 0, fails with -EINVAL.
+ */
+struct pv_iommu_detach_pasid {
+    /* IN: Physical device */
+    struct physdev_pci_device dev;
+
+    /* pasid of the device to detach */
+    uint32_t pasid;
+};
+typedef struct pv_iommu_detach_pasid pv_iommu_detach_pasid_t;
+DEFINE_XEN_GUEST_HANDLE(pv_iommu_detach_pasid_t);
+
+/* long do_iommu_op(int subop, XEN_GUEST_HANDLE_PARAM(void) arg) */
+
+#endif
\ No newline at end of file
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index e051f989a5..d5bdedfee5 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -118,6 +118,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
 #define __HYPERVISOR_xenpmu_op 40
 #define __HYPERVISOR_dm_op 41
 #define __HYPERVISOR_hypfs_op 42
+#define __HYPERVISOR_iommu_op 43
 
 /* Architecture-specific hypercall definitions. */
 #define __HYPERVISOR_arch_0 48
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 15250da119..e115642b86 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -328,6 +328,8 @@ struct iommu_ops {
      */
     int (*dt_xlate)(device_t *dev, const struct dt_phandle_args *args);
 #endif
+
+    uint64_t (*get_max_iova)(struct domain *d);
 };
 
 /*
@@ -409,6 +411,10 @@ struct domain_iommu {
     /* SAF-2-safe enum constant in arithmetic operation */
     DECLARE_BITMAP(features, IOMMU_FEAT_count);
 
+
+    /* Is the domain allowed to use PV-IOMMU ? */
+    bool allow_pv_iommu;
+
     /* Does the guest share HAP mapping with the IOMMU?
      */
     bool hap_pt_share;
 
@@ -446,6 +452,8 @@ static inline int iommu_do_domctl(struct xen_domctl *domctl, struct domain *d,
 }
 #endif
 
+int iommu_domain_pviommu_init(struct domain *d, uint16_t nb_ctx, uint32_t arena_order);
+
 int __must_check iommu_suspend(void);
 void iommu_resume(void);
 void iommu_crash_shutdown(void);
@@ -461,6 +469,7 @@ int iommu_do_pci_domctl(struct xen_domctl *domctl, struct domain *d,
 
 void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev);
 
+uint64_t iommu_get_max_iova(struct domain *d);
 
 struct iommu_context *iommu_get_context(struct domain *d, u16 ctx_id);
 void iommu_put_context(struct iommu_context *ctx);
@@ -496,6 +505,8 @@ DECLARE_PER_CPU(bool, iommu_dont_flush_iotlb);
 extern struct spinlock iommu_pt_cleanup_lock;
 extern struct page_list_head iommu_pt_cleanup_list;
 
+int arch_iommu_pviommu_init(struct domain *d, uint16_t nb_ctx, uint32_t arena_order);
+int arch_iommu_pviommu_teardown(struct domain *d);
 bool arch_iommu_use_permitted(const struct domain *d);
 
 #ifdef CONFIG_X86
-- 
2.47.2

Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
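To illustrate how a guest PV-IOMMU driver is expected to consume the interface proposed above, here is a minimal, hypothetical call sequence. It is not part of the series: `hypercall_iommu_op()` stands in for an arch-specific wrapper around the `__HYPERVISOR_iommu_op` hypercall added to xen.h, plain pointers are used where real code would pass guest handles, and all numeric values are example placeholders.

/*
 * Hypothetical guest-side sketch: bring up PV-IOMMU, allocate a context
 * and map one page into it.
 */
#include "pv_iommu.h"   /* the public header introduced by this patch */

/* Assumed arch-specific wrapper around __HYPERVISOR_iommu_op. */
extern long hypercall_iommu_op(unsigned int subop, void *arg);

static long pv_iommu_bringup_example(void)
{
    struct pv_iommu_capabilities caps = { 0 };
    struct pv_iommu_init init = { 0 };
    struct pv_iommu_alloc alloc = { 0 };
    struct pv_iommu_map_pages map = { 0 };
    long rc;

    /* Discover what the PV-IOMMU exposes to this domain. */
    rc = hypercall_iommu_op(IOMMU_query_capabilities, &caps);
    if ( rc )
        return rc;

    /* Initialise PV-IOMMU; arena_order is an example value (2^9 pages). */
    init.max_ctx_no = caps.max_ctx_no;
    init.arena_order = 9;
    rc = hypercall_iommu_op(IOMMU_init, &init);
    if ( rc )
        return rc;

    /* Allocate a fresh (non-identity) context. */
    rc = hypercall_iommu_op(IOMMU_alloc_context, &alloc);
    if ( rc )
        return rc;

    /* Map one page read/write, assuming pgsize is expressed in bytes. */
    map.ctx_no    = alloc.ctx_no;
    map.gfn       = 0x1234;      /* example guest frame */
    map.dfn       = 0x1234;      /* example device frame */
    map.map_flags = IOMMU_MAP_readable | IOMMU_MAP_writeable;
    map.pgsize    = 1u << 12;    /* must be part of caps.pgsize_mask */
    map.nr_pages  = 1;
    rc = hypercall_iommu_op(IOMMU_map_pages, &map);

    /* On partial failure, map.mapped reports how far the operation got. */
    return rc;
}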
From nobody Sat Nov 1 23:25:40 2025
From: "Teddy Astie" <teddy.astie@vates.tech>
Subject: [XEN RFC PATCH v6 11/11] iommu: Introduce no-dma feature
To: xen-devel@lists.xenproject.org
Cc: "Teddy Astie", "Andrew Cooper", "Anthony PERARD", "Michal Orzel",
    "Jan Beulich", "Julien Grall", "Roger Pau Monné", "Stefano Stabellini"
Message-Id: <998adb8e82b0b4610d800b12b89d47e6341e565a.1739785339.git.teddy.astie@vates.tech>
Date: Mon, 17 Feb 2025 10:18:23 +0000
This feature, exposed through `dom0-iommu=no-dma`, prevents the devices attached
to the default context from accessing the domain's memory. This effectively
enforces DMA protection by default: the domain needs to prepare a dedicated
IOMMU context before it can do DMA.

This feature requires the guest to provide a PV-IOMMU driver.

Signed-off-by: Teddy Astie
---
 xen/common/pv-iommu.c               |  3 +++
 xen/drivers/passthrough/iommu.c     | 10 ++++++++++
 xen/drivers/passthrough/x86/iommu.c |  4 ++++
 xen/include/xen/iommu.h             |  3 +++
 4 files changed, 20 insertions(+)
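As an illustration of the guest-visible effect (not part of the patch): when Xen boots with `dom0-iommu=no-dma`, the capability query below stops reporting IOMMUCAP_default_identity, so a dom0 PV-IOMMU driver can tell that it must set up its own contexts before any DMA will work. A minimal sketch, reusing the hypothetical `hypercall_iommu_op()` wrapper from the sketch after patch 01:

/* Hypothetical: does the default context still give this domain DMA access? */
static bool default_context_allows_dma(void)
{
    struct pv_iommu_capabilities caps = { 0 };

    if ( hypercall_iommu_op(IOMMU_query_capabilities, &caps) )
        return false;

    /* Cleared by capabilities_op() below when ->no_dma is set. */
    return caps.cap_flags & IOMMUCAP_default_identity;
}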
diff --git a/xen/common/pv-iommu.c b/xen/common/pv-iommu.c
index a1315bf582..9c7d04b4c7 100644
--- a/xen/common/pv-iommu.c
+++ b/xen/common/pv-iommu.c
@@ -99,6 +99,9 @@ static long capabilities_op(struct pv_iommu_capabilities *cap, struct domain *d)
     cap->max_pasid = 0; /* TODO */
     cap->cap_flags = 0;
 
+    if ( !dom_iommu(d)->no_dma )
+        cap->cap_flags |= IOMMUCAP_default_identity;
+
     cap->pgsize_mask = PAGE_SIZE_4K;
 
     return 0;
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index c26a2160f9..59a4c64915 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -55,6 +55,7 @@ static bool __hwdom_initdata iommu_hwdom_none;
 bool __hwdom_initdata iommu_hwdom_strict;
 bool __read_mostly iommu_hwdom_passthrough;
 bool __hwdom_initdata iommu_hwdom_inclusive;
+bool __read_mostly iommu_hwdom_no_dma = false;
 int8_t __hwdom_initdata iommu_hwdom_reserved = -1;
 
 #ifndef iommu_hap_pt_share
@@ -172,6 +173,8 @@ static int __init cf_check parse_dom0_iommu_param(const char *s)
             iommu_hwdom_reserved = val;
         else if ( !cmdline_strcmp(s, "none") )
             iommu_hwdom_none = true;
+        else if ( (val = parse_boolean("dma", s, ss)) >= 0 )
+            iommu_hwdom_no_dma = !val;
         else
             rc = -EINVAL;
 
@@ -329,6 +332,13 @@ int iommu_domain_init(struct domain *d, unsigned int opts)
     if ( !is_hardware_domain(d) || iommu_hwdom_strict )
         hd->need_sync = !iommu_use_hap_pt(d);
 
+    if ( hd->no_dma )
+    {
+        /* No-DMA mode is exclusive with HAP and sync_pt. */
+        hd->hap_pt_share = false;
+        hd->need_sync = false;
+    }
+
     ASSERT(!(hd->need_sync && hd->hap_pt_share));
 
     hd->allow_pv_iommu = true;
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 79efc6ad47..174c218b9b 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -529,6 +529,10 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
     if ( iommu_hwdom_reserved == -1 )
         iommu_hwdom_reserved = 1;
 
+    if ( iommu_hwdom_no_dma )
+        /* Skip special mappings with no-dma mode */
+        return;
+
     if ( iommu_hwdom_inclusive )
     {
         printk(XENLOG_WARNING
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index e115642b86..fb38c1be86 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -106,6 +106,7 @@ extern bool iommu_debug;
 extern bool amd_iommu_perdev_intremap;
 
 extern bool iommu_hwdom_strict, iommu_hwdom_passthrough, iommu_hwdom_inclusive;
+extern bool iommu_hwdom_no_dma;
 extern int8_t iommu_hwdom_reserved;
 
 extern unsigned int iommu_dev_iotlb_timeout;
@@ -411,6 +412,8 @@ struct domain_iommu {
     /* SAF-2-safe enum constant in arithmetic operation */
     DECLARE_BITMAP(features, IOMMU_FEAT_count);
 
+    /* Does the IOMMU block all DMA on the default context (implies !hap_pt_share)? */
+    bool no_dma;
 
     /* Is the domain allowed to use PV-IOMMU ? */
     bool allow_pv_iommu;
-- 
2.47.2

Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
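To round off the series, a hypothetical sketch of the remaining step a domain has to take under no-dma: once it has allocated and populated a context (as in the sketch after patch 01), it moves each device that needs DMA into that context with IOMMU_reattach_device. The PCI device numbers below are made up, and `hypercall_iommu_op()` is the same assumed wrapper as before.

/* Hypothetical: move PCI device 0000:03:00.0 into a prepared context. */
static long attach_device_to_dma_context(uint16_t ctx_no)
{
    struct pv_iommu_reattach_device reattach = {
        .ctx_no = ctx_no,
        .dev = {
            .seg   = 0,              /* PCI segment */
            .bus   = 3,              /* bus 03 */
            .devfn = (0 << 3) | 0,   /* device 00, function 0 */
        },
        .pasid = 0,                  /* plain DMA; PASIDs need IOMMUCAP_pasid */
    };

    return hypercall_iommu_op(IOMMU_reattach_device, &reattach);
}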