From: Teddy Astie <teddy.astie@vates.tech>
Subject: [RFC XEN PATCH 3/5] IOMMU: Introduce redesigned IOMMU subsystem
To: xen-devel@lists.xenproject.org
Cc: Teddy Astie, Jan Beulich, Andrew Cooper, Roger Pau Monné,
    George Dunlap, Julien Grall, Stefano Stabellini, Lukasz Hawrylko,
    "Daniel P. Smith", Mateusz Mówka, Marek Marczykowski-Górecki
Message-Id: <99d93c1a8100c0d20d40d80c0e94f46f906a986b.1718269097.git.teddy.astie@vates.tech>
Date: Thu, 13 Jun 2024 12:16:50 +0000

Based on docs/designs/iommu-contexts.md, implement the redesigned IOMMU
subsystem.

Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
Missing in this RFC:
 - Quarantine implementation is incomplete
 - Automatic determination of max ctx_no (maximum IOMMU context count)
   based on PCI device count.
 - Automatic determination of max ctx_no (for dom_io).
 - Empty/no default IOMMU context mode (UEFI IOMMU based boot).
 - Support for DomU (and configuration using e.g. libxl).
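For reviewers, a rough sketch of how the new context API is meant to be
used from hypervisor code (illustrative only, not part of the patch; the
flags values and the error handling are assumptions):

    /* Allocate a context, move a device into it, then release it. */
    static int example_context_usage(struct domain *d, device_t *dev)
    {
        u16 ctx_no;
        int rc;

        /* Reserve a context number and initialise it via the platform ops. */
        rc = iommu_context_alloc(d, &ctx_no, 0);
        if ( rc )
            return rc;

        /* Move the device from its current context into the new one. */
        rc = iommu_reattach_context(d, d, dev, ctx_no);
        if ( rc )
        {
            iommu_context_free(d, ctx_no, 0);
            return rc;
        }

        /* Mappings now target this context: iommu_map(..., ctx_no). */

        /* Push the device back to the default context and free. */
        return iommu_context_free(d, ctx_no, IOMMU_TEARDOWN_REATTACH_DEFAULT);
    }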
---
 xen/arch/x86/domain.c                |   2 +-
 xen/arch/x86/mm/p2m-ept.c            |   2 +-
 xen/arch/x86/pv/dom0_build.c         |   4 +-
 xen/arch/x86/tboot.c                 |   4 +-
 xen/common/memory.c                  |   4 +-
 xen/drivers/passthrough/Makefile     |   3 +
 xen/drivers/passthrough/context.c    | 626 +++++++++++++++++++++++++++
 xen/drivers/passthrough/iommu.c      | 333 ++++----------
 xen/drivers/passthrough/pci.c        |  49 ++-
 xen/drivers/passthrough/quarantine.c |  49 +++
 xen/include/xen/iommu.h              | 118 ++++-
 xen/include/xen/pci.h                |   3 +
 12 files changed, 897 insertions(+), 300 deletions(-)
 create mode 100644 xen/drivers/passthrough/context.c
 create mode 100644 xen/drivers/passthrough/quarantine.c

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 00a3aaa576..52de634c81 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2381,7 +2381,7 @@ int domain_relinquish_resources(struct domain *d)
 
     PROGRESS(iommu_pagetables):
 
-        ret = iommu_free_pgtables(d);
+        ret = iommu_free_pgtables(d, iommu_default_context(d));
         if ( ret )
             return ret;
 
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index f83610cb8c..94c3631818 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -970,7 +970,7 @@ out:
         rc = iommu_iotlb_flush(d, _dfn(gfn), 1ul << order,
                                (iommu_flags ? IOMMU_FLUSHF_added : 0) |
                                (vtd_pte_present ? IOMMU_FLUSHF_modified
-                                                : 0));
+                                                : 0), 0);
     else if ( need_iommu_pt_sync(d) )
         rc = iommu_flags ?
             iommu_legacy_map(d, _dfn(gfn), mfn, 1ul << order, iommu_flags) :
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index d8043fa58a..db7298737d 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -76,7 +76,7 @@ static __init void mark_pv_pt_pages_rdonly(struct domain *d,
          * iommu_memory_setup() ended up mapping them.
          */
         if ( need_iommu_pt_sync(d) &&
-             iommu_unmap(d, _dfn(mfn_x(page_to_mfn(page))), 1, 0, flush_flags) )
+             iommu_unmap(d, _dfn(mfn_x(page_to_mfn(page))), 1, 0, flush_flags, 0) )
             BUG();
 
         /* Read-only mapping + PGC_allocated + page-table page. */
@@ -127,7 +127,7 @@ static void __init iommu_memory_setup(struct domain *d, const char *what,
 
     while ( (rc = iommu_map(d, _dfn(mfn_x(mfn)), mfn, nr,
                             IOMMUF_readable | IOMMUF_writable | IOMMUF_preempt,
-                            flush_flags)) > 0 )
+                            flush_flags, 0)) > 0 )
     {
         mfn = mfn_add(mfn, rc);
         nr -= rc;
diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
index ba0700d2d5..ca55306830 100644
--- a/xen/arch/x86/tboot.c
+++ b/xen/arch/x86/tboot.c
@@ -216,9 +216,9 @@ static void tboot_gen_domain_integrity(const uint8_t key[TB_KEY_SIZE],
 
         if ( is_iommu_enabled(d) && is_vtd )
         {
-            const struct domain_iommu *dio = dom_iommu(d);
+            struct domain_iommu *dio = dom_iommu(d);
 
-            update_iommu_mac(&ctx, dio->arch.vtd.pgd_maddr,
+            update_iommu_mac(&ctx, iommu_default_context(d)->arch.vtd.pgd_maddr,
                              agaw_to_level(dio->arch.vtd.agaw));
         }
     }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index de2cc7ad92..0eb0f9da7b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -925,7 +925,7 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
         this_cpu(iommu_dont_flush_iotlb) = 0;
 
         ret = iommu_iotlb_flush(d, _dfn(xatp->idx - done), done,
-                                IOMMU_FLUSHF_modified);
+                                IOMMU_FLUSHF_modified, 0);
         if ( unlikely(ret) && rc >= 0 )
             rc = ret;
 
@@ -939,7 +939,7 @@ int xenmem_add_to_physmap(struct domain *d, struct xen_add_to_physmap *xatp,
             put_page(pages[i]);
 
         ret = iommu_iotlb_flush(d, _dfn(xatp->gpfn - done), done,
-                                IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified);
+                                IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified, 0);
         if ( unlikely(ret) && rc >= 0 )
             rc = ret;
     }
diff --git a/xen/drivers/passthrough/Makefile b/xen/drivers/passthrough/Makefile
index a1621540b7..69327080ab 100644
--- a/xen/drivers/passthrough/Makefile
+++ b/xen/drivers/passthrough/Makefile
@@ -4,6 +4,9 @@ obj-$(CONFIG_X86) += x86/
 obj-$(CONFIG_ARM) += arm/
 
 obj-y += iommu.o
+obj-y += context.o
+obj-y += quarantine.o
+
 obj-$(CONFIG_HAS_PCI) += pci.o
 obj-$(CONFIG_HAS_DEVICE_TREE) += device_tree.o
 obj-$(CONFIG_HAS_PCI) += ats.o
diff --git a/xen/drivers/passthrough/context.c b/xen/drivers/passthrough/context.c
new file mode 100644
index 0000000000..3cc7697164
--- /dev/null
+++ b/xen/drivers/passthrough/context.c
@@ -0,0 +1,626 @@
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/bitops.h>
+#include <xen/iommu.h>
+#include <xen/lib.h>
+#include <xen/pci.h>
+#include <xen/sched.h>
+#include <xen/spinlock.h>
+
+bool iommu_check_context(struct domain *d, u16 ctx_no)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+
+    if ( ctx_no == 0 )
+        return true; /* Default context always exists. */
+
+    if ( (ctx_no - 1) >= hd->other_contexts.count )
+        return false; /* out of bounds */
+
+    return test_bit(ctx_no - 1, hd->other_contexts.bitmap);
+}
+
+struct iommu_context *iommu_get_context(struct domain *d, u16 ctx_no)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+
+    if ( !iommu_check_context(d, ctx_no) )
+        return NULL;
+
+    if ( ctx_no == 0 )
+        return &hd->default_ctx;
+    else
+        return &hd->other_contexts.map[ctx_no - 1];
+}
+
+static unsigned int mapping_order(const struct domain_iommu *hd,
+                                  dfn_t dfn, mfn_t mfn, unsigned long nr)
+{
+    unsigned long res = dfn_x(dfn) | mfn_x(mfn);
+    unsigned long sizes = hd->platform_ops->page_sizes;
+    unsigned int bit = find_first_set_bit(sizes), order = 0;
+
+    ASSERT(bit == PAGE_SHIFT);
+
+    while ( (sizes = (sizes >> bit) & ~1) )
+    {
+        unsigned long mask;
+
+        bit = find_first_set_bit(sizes);
+        mask = (1UL << bit) - 1;
+        if ( nr <= mask || (res & mask) )
+            break;
+        order += bit;
+        nr >>= bit;
+        res >>= bit;
+    }
+
+    return order;
+}
+
+long _iommu_map(struct domain *d, dfn_t dfn0, mfn_t mfn0,
+                unsigned long page_count, unsigned int flags,
+                unsigned int *flush_flags, u16 ctx_no)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    unsigned long i;
+    unsigned int order, j = 0;
+    int rc = 0;
+
+    if ( !is_iommu_enabled(d) )
+        return 0;
+
+    if ( !iommu_check_context(d, ctx_no) )
+        return -ENOENT;
+
+    ASSERT(!IOMMUF_order(flags));
+
+    for ( i = 0; i < page_count; i += 1UL << order )
+    {
+        dfn_t dfn = dfn_add(dfn0, i);
+        mfn_t mfn = mfn_add(mfn0, i);
+
+        order = mapping_order(hd, dfn, mfn, page_count - i);
+
+        if ( (flags & IOMMUF_preempt) &&
+             ((!(++j & 0xfff) && general_preempt_check()) ||
+              i > LONG_MAX - (1UL << order)) )
+            return i;
+
+        rc = iommu_call(hd->platform_ops, map_page, d, dfn, mfn,
+                        flags | IOMMUF_order(order), flush_flags,
+                        iommu_get_context(d, ctx_no));
+
+        if ( likely(!rc) )
+            continue;
+
+        if ( !d->is_shutting_down && printk_ratelimit() )
+            printk(XENLOG_ERR
+                   "d%d: IOMMU mapping dfn %"PRI_dfn" to mfn %"PRI_mfn" failed: %d\n",
+                   d->domain_id, dfn_x(dfn), mfn_x(mfn), rc);
+
+        /* while statement to satisfy __must_check */
+        while ( _iommu_unmap(d, dfn0, i, 0, flush_flags, ctx_no) )
+            break;
+
+        if ( !ctx_no && !is_hardware_domain(d) )
+            domain_crash(d);
+
+        break;
+    }
+
+    /*
+     * Something went wrong so, if we were dealing with more than a single
+     * page, flush everything and clear flush flags.
+     */
+    if ( page_count > 1 && unlikely(rc) &&
+         !iommu_iotlb_flush_all(d, *flush_flags) )
+        *flush_flags = 0;
+
+    return rc;
+}
+
+long iommu_map(struct domain *d, dfn_t dfn0, mfn_t mfn0,
+               unsigned long page_count, unsigned int flags,
+               unsigned int *flush_flags, u16 ctx_no)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    long ret;
+
+    spin_lock(&hd->lock);
+    ret = _iommu_map(d, dfn0, mfn0, page_count, flags, flush_flags, ctx_no);
+    spin_unlock(&hd->lock);
+
+    return ret;
+}
+
+int iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn,
+                     unsigned long page_count, unsigned int flags)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    unsigned int flush_flags = 0;
+    int rc;
+
+    ASSERT(!(flags & IOMMUF_preempt));
+
+    spin_lock(&hd->lock);
+    rc = _iommu_map(d, dfn, mfn, page_count, flags, &flush_flags, 0);
+
+    if ( !this_cpu(iommu_dont_flush_iotlb) && !rc )
+        rc = _iommu_iotlb_flush(d, dfn, page_count, flush_flags, 0);
+    spin_unlock(&hd->lock);
+
+    return rc;
+}
+
+long iommu_unmap(struct domain *d, dfn_t dfn0, unsigned long page_count,
+                 unsigned int flags, unsigned int *flush_flags,
+                 u16 ctx_no)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    long ret;
+
+    spin_lock(&hd->lock);
+    ret = _iommu_unmap(d, dfn0, page_count, flags, flush_flags, ctx_no);
+    spin_unlock(&hd->lock);
+
+    return ret;
+}
+
+long _iommu_unmap(struct domain *d, dfn_t dfn0, unsigned long page_count,
+                  unsigned int flags, unsigned int *flush_flags,
+                  u16 ctx_no)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    unsigned long i;
+    unsigned int order, j = 0;
+    int rc = 0;
+
+    if ( !is_iommu_enabled(d) )
+        return 0;
+
+    if ( !iommu_check_context(d, ctx_no) )
+        return -ENOENT;
+
+    ASSERT(!(flags & ~IOMMUF_preempt));
+
+    for ( i = 0; i < page_count; i += 1UL << order )
+    {
+        dfn_t dfn = dfn_add(dfn0, i);
+        int err;
+
+        order = mapping_order(hd, dfn, _mfn(0), page_count - i);
+
+        if ( (flags & IOMMUF_preempt) &&
+             ((!(++j & 0xfff) && general_preempt_check()) ||
+              i > LONG_MAX - (1UL << order)) )
+            return i;
+
+        err = iommu_call(hd->platform_ops, unmap_page, d, dfn,
+                         flags | IOMMUF_order(order), flush_flags,
+                         iommu_get_context(d, ctx_no));
+
+        if ( likely(!err) )
+            continue;
+
+        if ( !d->is_shutting_down && printk_ratelimit() )
+            printk(XENLOG_ERR
+                   "d%d: IOMMU unmapping dfn %"PRI_dfn" failed: %d\n",
+                   d->domain_id, dfn_x(dfn), err);
+
+        if ( !rc )
+            rc = err;
+
+        if ( !is_hardware_domain(d) )
+        {
+            domain_crash(d);
+            break;
+        }
+    }
+
+    /*
+     * Something went wrong so, if we were dealing with more than a single
+     * page, flush everything and clear flush flags.
+     */
+    if ( page_count > 1 && unlikely(rc) &&
+         !iommu_iotlb_flush_all(d, *flush_flags) )
+        *flush_flags = 0;
+
+    return rc;
+}
+
+int iommu_legacy_unmap(struct domain *d, dfn_t dfn, unsigned long page_count)
+{
+    unsigned int flush_flags = 0;
+    struct domain_iommu *hd = dom_iommu(d);
+    int rc;
+
+    spin_lock(&hd->lock);
+    rc = _iommu_unmap(d, dfn, page_count, 0, &flush_flags, 0);
+
+    if ( !this_cpu(iommu_dont_flush_iotlb) && !rc )
+        rc = _iommu_iotlb_flush(d, dfn, page_count, flush_flags, 0);
+    spin_unlock(&hd->lock);
+
+    return rc;
+}
+
+int iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
+                      unsigned int *flags, u16 ctx_no)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    int ret;
+
+    if ( !is_iommu_enabled(d) || !hd->platform_ops->lookup_page )
+        return -EOPNOTSUPP;
+
+    if ( !iommu_check_context(d, ctx_no) )
+        return -ENOENT;
+
+    spin_lock(&hd->lock);
+    ret = iommu_call(hd->platform_ops, lookup_page, d, dfn, mfn, flags,
+                     iommu_get_context(d, ctx_no));
+    spin_unlock(&hd->lock);
+
+    return ret;
+}
+
+int _iommu_iotlb_flush(struct domain *d, dfn_t dfn, unsigned long page_count,
+                       unsigned int flush_flags, u16 ctx_no)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    int rc;
+
+    if ( !is_iommu_enabled(d) || !hd->platform_ops->iotlb_flush ||
+         !page_count || !flush_flags )
+        return 0;
+
+    if ( dfn_eq(dfn, INVALID_DFN) )
+        return -EINVAL;
+
+    /* The caller holds hd->lock; do not drop it here on error. */
+    if ( !iommu_check_context(d, ctx_no) )
+        return -ENOENT;
+
+    rc = iommu_call(hd->platform_ops, iotlb_flush, d, iommu_get_context(d, ctx_no),
+                    dfn, page_count, flush_flags);
+    if ( unlikely(rc) )
+    {
+        if ( !d->is_shutting_down && printk_ratelimit() )
+            printk(XENLOG_ERR
+                   "d%d: IOMMU IOTLB flush failed: %d, dfn %"PRI_dfn", page count %lu flags %x\n",
+                   d->domain_id, rc, dfn_x(dfn), page_count, flush_flags);
+
+        if ( !is_hardware_domain(d) )
+            domain_crash(d);
+    }
+
+    return rc;
+}
+
+int iommu_iotlb_flush(struct domain *d, dfn_t dfn, unsigned long page_count,
+                      unsigned int flush_flags, u16 ctx_no)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    int ret;
+
+    spin_lock(&hd->lock);
+    ret = _iommu_iotlb_flush(d, dfn, page_count, flush_flags, ctx_no);
+    spin_unlock(&hd->lock);
+
+    return ret;
+}
+
+int iommu_context_init(struct domain *d, struct iommu_context *ctx, u16 ctx_no, u32 flags)
+{
+    if ( !dom_iommu(d)->platform_ops->context_init )
+        return -ENOSYS;
+
+    INIT_LIST_HEAD(&ctx->devices);
+    ctx->id = ctx_no;
+    ctx->dying = false;
+
+    return iommu_call(dom_iommu(d)->platform_ops, context_init, d, ctx, flags);
+}
+
+int iommu_context_alloc(struct domain *d, u16 *ctx_no, u32 flags)
+{
+    unsigned int i;
+    int ret;
+    struct domain_iommu *hd = dom_iommu(d);
+
+    spin_lock(&hd->lock);
+
+    /* TODO: use TSL instead? */
+    i = find_first_zero_bit(hd->other_contexts.bitmap, hd->other_contexts.count);
+
+    if ( i >= hd->other_contexts.count ) /* no free context */
+    {
+        spin_unlock(&hd->lock);
+        return -ENOSPC;
+    }
+
+    set_bit(i, hd->other_contexts.bitmap);
+
+    *ctx_no = i + 1;
+
+    ret = iommu_context_init(d, iommu_get_context(d, *ctx_no), *ctx_no, flags);
+
+    if ( ret )
+        __clear_bit(i, hd->other_contexts.bitmap);
+
+    spin_unlock(&hd->lock);
+
+    return ret;
+}
+
+int _iommu_attach_context(struct domain *d, device_t *dev, u16 ctx_no)
+{
+    struct iommu_context *ctx;
+    int ret;
+
+    pcidevs_lock();
+
+    if ( !iommu_check_context(d, ctx_no) )
+    {
+        ret = -ENOENT;
+        goto unlock;
+    }
+
+    ctx = iommu_get_context(d, ctx_no);
+
+    if ( ctx->dying )
+    {
+        ret = -EINVAL;
+        goto unlock;
+    }
+
+    ret = iommu_call(dom_iommu(d)->platform_ops, attach, d, dev, ctx);
+
+    if ( !ret )
+    {
+        dev->context = ctx_no;
+        list_add(&dev->context_list, &ctx->devices);
+    }
+
+unlock:
+    pcidevs_unlock();
+    return ret;
+}
+
+int iommu_attach_context(struct domain *d, device_t *dev, u16 ctx_no)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    int ret;
+
+    spin_lock(&hd->lock);
+    ret = _iommu_attach_context(d, dev, ctx_no);
+    spin_unlock(&hd->lock);
+
+    return ret;
+}
+
+int _iommu_dettach_context(struct domain *d, device_t *dev)
+{
+    struct iommu_context *ctx;
+    int ret;
+
+    if ( !dev->domain )
+    {
+        printk(XENLOG_WARNING "IOMMU: Trying to detach a non-attached device.\n");
+        WARN();
+        return 0;
+    }
+
+    /* Make sure device is actually in the domain. */
+    ASSERT(d == dev->domain);
+
+    pcidevs_lock();
+
+    ctx = iommu_get_context(d, dev->context);
+    ASSERT(ctx); /* device is using an invalid context?
+                    dev->context invalid? */
+
+    ret = iommu_call(dom_iommu(d)->platform_ops, dettach, d, dev, ctx);
+
+    if ( !ret )
+    {
+        list_del(&dev->context_list);
+
+        /** TODO: Do we need to remove the device from the domain?
+         *  Reattaching to something (quarantine, hardware domain?)
+         */
+
+        /*
+         * rcu_lock_domain ?
+         * list_del(&dev->domain_list);
+         * dev->domain = ?;
+         */
+    }
+
+    pcidevs_unlock();
+    return ret;
+}
+
+int iommu_dettach_context(struct domain *d, device_t *dev)
+{
+    int ret;
+    struct domain_iommu *hd = dom_iommu(d);
+
+    spin_lock(&hd->lock);
+    ret = _iommu_dettach_context(d, dev);
+    spin_unlock(&hd->lock);
+
+    return ret;
+}
+
+int _iommu_reattach_context(struct domain *prev_dom, struct domain *next_dom,
+                            device_t *dev, u16 ctx_no)
+{
+    struct domain_iommu *hd;
+    u16 prev_ctx_no;
+    device_t *ctx_dev;
+    struct iommu_context *prev_ctx, *next_ctx;
+    int ret;
+    bool same_domain;
+
+    /* Make sure we actually are doing something meaningful. */
+    BUG_ON(!prev_dom && !next_dom);
+
+    /*
+     * TODO: Do such cases exist?
+     * Platform ops must match:
+     * if ( dom_iommu(prev_dom)->platform_ops != dom_iommu(next_dom)->platform_ops )
+     *     return -EINVAL;
+     */
+
+    /*
+     * Handle the pure attach/detach cases before taking the PCI devices
+     * lock, as the helpers below take it themselves.
+     */
+    if ( !prev_dom )
+        return _iommu_attach_context(next_dom, dev, ctx_no);
+
+    if ( !next_dom )
+        return _iommu_dettach_context(prev_dom, dev);
+
+    pcidevs_lock();
+
+    hd = dom_iommu(prev_dom);
+    same_domain = prev_dom == next_dom;
+
+    prev_ctx_no = dev->context;
+
+    if ( same_domain && (ctx_no == prev_ctx_no) )
+    {
+        printk(XENLOG_DEBUG "Reattaching %pp to same IOMMU context c%hu\n",
+               &dev->sbdf, ctx_no);
+        ret = 0;
+        goto unlock;
+    }
+
+    if ( !iommu_check_context(next_dom, ctx_no) )
+    {
+        ret = -ENOENT;
+        goto unlock;
+    }
+
+    prev_ctx = iommu_get_context(prev_dom, prev_ctx_no);
+    next_ctx = iommu_get_context(next_dom, ctx_no);
+
+    if ( next_ctx->dying )
+    {
+        ret = -EINVAL;
+        goto unlock;
+    }
+
+    ret = iommu_call(hd->platform_ops, reattach, next_dom, dev, prev_ctx,
+                     next_ctx);
+
+    if ( ret )
+        goto unlock;
+
+    /* Remove device from previous context, and add it to new one. */
+    list_for_each_entry(ctx_dev, &prev_ctx->devices, context_list)
+    {
+        if ( ctx_dev == dev )
+        {
+            list_del(&ctx_dev->context_list);
+            list_add(&ctx_dev->context_list, &next_ctx->devices);
+            break;
+        }
+    }
+
+    if ( !same_domain )
+    {
+        /* Update domain pci devices accordingly */
+
+        /** TODO: should be done here or elsewhere? */
+    }
+
+    dev->context = ctx_no; /* update device context */
+
+unlock:
+    pcidevs_unlock();
+    return ret;
+}
+
+int iommu_reattach_context(struct domain *prev_dom, struct domain *next_dom,
+                           device_t *dev, u16 ctx_no)
+{
+    int ret;
+    struct domain_iommu *prev_hd = dom_iommu(prev_dom);
+    struct domain_iommu *next_hd = dom_iommu(next_dom);
+
+    spin_lock(&prev_hd->lock);
+
+    if ( prev_dom != next_dom )
+        spin_lock(&next_hd->lock);
+
+    ret = _iommu_reattach_context(prev_dom, next_dom, dev, ctx_no);
+
+    spin_unlock(&prev_hd->lock);
+
+    if ( prev_dom != next_dom )
+        spin_unlock(&next_hd->lock);
+
+    return ret;
+}
+
+int _iommu_context_teardown(struct domain *d, struct iommu_context *ctx, u32 flags)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+
+    if ( !dom_iommu(d)->platform_ops->context_teardown )
+        return -ENOSYS;
+
+    ctx->dying = true;
+
+    /* First reattach devices back to the default context if needed. */
+    if ( flags & IOMMU_TEARDOWN_REATTACH_DEFAULT )
+    {
+        struct pci_dev *device, *tmp;
+
+        /* Reattaching moves entries off this list, so use the safe iterator. */
+        list_for_each_entry_safe(device, tmp, &ctx->devices, context_list)
+            _iommu_reattach_context(d, d, device, 0);
+    }
+    else if ( !list_empty(&ctx->devices) )
+        return -EBUSY; /* there is a device in context */
+
+    return iommu_call(hd->platform_ops, context_teardown, d, ctx, flags);
+}
+
+int iommu_context_teardown(struct domain *d, struct iommu_context *ctx, u32 flags)
+{
+    struct domain_iommu *hd = dom_iommu(d);
+    int ret;
+
+    spin_lock(&hd->lock);
+    ret = _iommu_context_teardown(d, ctx, flags);
+    spin_unlock(&hd->lock);
+
+    return ret;
+}
+
+int iommu_context_free(struct domain *d, u16 ctx_no, u32 flags)
+{
+    int ret;
+    struct domain_iommu *hd = dom_iommu(d);
+
+    if ( ctx_no == 0 )
+        return -EINVAL;
+
+    spin_lock(&hd->lock);
+    if ( !iommu_check_context(d, ctx_no) )
+    {
+        spin_unlock(&hd->lock);
+        return -ENOENT;
+    }
+
+    ret = _iommu_context_teardown(d, iommu_get_context(d, ctx_no), flags);
+
+    if ( !ret )
+        clear_bit(ctx_no - 1, hd->other_contexts.bitmap);
+
+    spin_unlock(&hd->lock);
+
+    return ret;
+}
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index ba18136c46..a9e2a8a49b 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -12,6 +12,7 @@
  * this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include
 #include
 #include
 #include
@@ -21,6 +22,10 @@
 #include
 #include
 #include
+#include
+#include
+#include
+#include
 
 #ifdef CONFIG_X86
 #include
@@ -35,22 +40,6 @@ bool __read_mostly force_iommu;
 bool __read_mostly iommu_verbose;
 static bool __read_mostly iommu_crash_disable;
 
-#define IOMMU_quarantine_none         0 /* aka false */
-#define IOMMU_quarantine_basic        1 /* aka true */
-#define IOMMU_quarantine_scratch_page 2
-#ifdef CONFIG_HAS_PCI
-uint8_t __read_mostly iommu_quarantine =
-# if defined(CONFIG_IOMMU_QUARANTINE_NONE)
-    IOMMU_quarantine_none;
-# elif defined(CONFIG_IOMMU_QUARANTINE_BASIC)
-    IOMMU_quarantine_basic;
-# elif defined(CONFIG_IOMMU_QUARANTINE_SCRATCH_PAGE)
-    IOMMU_quarantine_scratch_page;
-# endif
-#else
-# define iommu_quarantine   IOMMU_quarantine_none
-#endif /* CONFIG_HAS_PCI */
-
 static bool __hwdom_initdata iommu_hwdom_none;
 bool __hwdom_initdata iommu_hwdom_strict;
 bool __read_mostly iommu_hwdom_passthrough;
@@ -61,6 +50,13 @@ int8_t __hwdom_initdata iommu_hwdom_reserved = -1;
 bool __read_mostly iommu_hap_pt_share = true;
 #endif
 
+uint16_t __read_mostly iommu_hwdom_nb_ctx = 8;
+bool __read_mostly iommu_hwdom_nb_ctx_forced = false;
+
+#ifdef CONFIG_X86
+unsigned int __read_mostly iommu_hwdom_arena_order = CONFIG_X86_ARENA_ORDER;
+#endif
+
 bool __read_mostly iommu_debug;
 
 DEFINE_PER_CPU(bool, iommu_dont_flush_iotlb);
@@ -156,6 +152,7 @@ static int __init cf_check parse_dom0_iommu_param(const char *s)
     int rc = 0;
 
     do {
+        long long ll_val;
         int val;
 
         ss = strchr(s, ',');
@@ -172,6 +169,20 @@ static int __init cf_check parse_dom0_iommu_param(const char *s)
             iommu_hwdom_reserved = val;
         else if ( !cmdline_strcmp(s, "none") )
             iommu_hwdom_none = true;
+        else if ( !parse_signed_integer("nb-ctx", s, ss, &ll_val) )
+        {
+            if ( ll_val > 0 && ll_val < UINT16_MAX )
+                iommu_hwdom_nb_ctx = ll_val;
+            else
+                printk(XENLOG_WARNING "'nb-ctx=%lld' value out of range!\n", ll_val);
+        }
+        else if ( !parse_signed_integer("arena-order", s, ss, &ll_val) )
+        {
+            if ( ll_val > 0 )
+                iommu_hwdom_arena_order = ll_val;
+            else
+                printk(XENLOG_WARNING "'arena-order=%lld' value out of range!\n", ll_val);
+        }
         else
             rc = -EINVAL;
 
@@ -193,9 +204,26 @@ static void __hwdom_init check_hwdom_reqs(struct domain *d)
     arch_iommu_check_autotranslated_hwdom(d);
 }
 
+uint16_t __hwdom_init iommu_hwdom_ctx_count(void)
+{
+    if ( iommu_hwdom_nb_ctx_forced )
+        return iommu_hwdom_nb_ctx;
+
+    /* TODO: Find a proper way of counting devices? */
+    return 256;
+
+    /*
+    if ( iommu_hwdom_nb_ctx != UINT16_MAX )
+        iommu_hwdom_nb_ctx++;
+    else
+        printk(XENLOG_WARNING " IOMMU: Can't prepare more contexts: too many devices");
+    */
+}
+
 int iommu_domain_init(struct domain *d, unsigned int opts)
 {
     struct domain_iommu *hd = dom_iommu(d);
+    uint16_t other_context_count;
     int ret = 0;
 
     if ( is_hardware_domain(d) )
@@ -236,6 +264,37 @@ int iommu_domain_init(struct domain *d, unsigned int opts)
 
     ASSERT(!(hd->need_sync && hd->hap_pt_share));
 
+    iommu_hwdom_nb_ctx = iommu_hwdom_ctx_count();
+
+    if ( is_hardware_domain(d) )
+    {
+        BUG_ON(iommu_hwdom_nb_ctx == 0); /* sanity check (prevent underflow) */
+        printk(XENLOG_INFO "Dom0 uses %lu IOMMU contexts\n",
+               (unsigned long)iommu_hwdom_nb_ctx);
+        hd->other_contexts.count = iommu_hwdom_nb_ctx - 1;
+    }
+    else if ( d == dom_io )
+    {
+        /* TODO: Determine count differently */
+        hd->other_contexts.count = 128;
+    }
+    else
+        hd->other_contexts.count = 0;
+
+    other_context_count = hd->other_contexts.count;
+    if ( other_context_count > 0 )
+    {
+        /* Initialize context bitmap */
+        hd->other_contexts.bitmap = xzalloc_array(unsigned long,
+                                                  BITS_TO_LONGS(other_context_count));
+        hd->other_contexts.map = xzalloc_array(struct iommu_context,
+                                               other_context_count);
+    }
+    else
+    {
+        hd->other_contexts.bitmap = NULL;
+        hd->other_contexts.map = NULL;
+    }
+
+    ret = iommu_context_init(d, &hd->default_ctx, 0, IOMMU_CONTEXT_INIT_default);
+    if ( ret )
+        return ret;
+
     return 0;
 }
 
@@ -249,13 +308,12 @@ static void cf_check iommu_dump_page_tables(unsigned char key)
 
     for_each_domain(d)
     {
-        if ( is_hardware_domain(d) || !is_iommu_enabled(d) )
+        if ( !is_iommu_enabled(d) )
             continue;
 
         if ( iommu_use_hap_pt(d) )
         {
             printk("%pd sharing page tables\n", d);
-            continue;
         }
 
         iommu_vcall(dom_iommu(d)->platform_ops, dump_page_tables, d);
@@ -276,10 +334,13 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
     iommu_vcall(hd->platform_ops, hwdom_init, d);
 }
 
-static void iommu_teardown(struct domain *d)
+void iommu_domain_destroy(struct domain *d)
 {
     struct domain_iommu *hd = dom_iommu(d);
 
+    if ( !is_iommu_enabled(d) )
+        return;
+
     /*
      * During early domain creation failure, we may reach here with the
      * ops not yet initialized.
@@ -288,224 +349,10 @@ static void iommu_teardown(struct domain *d)
         return;
 
     iommu_vcall(hd->platform_ops, teardown, d);
-}
-
-void iommu_domain_destroy(struct domain *d)
-{
-    if ( !is_iommu_enabled(d) )
-        return;
-
-    iommu_teardown(d);
 
     arch_iommu_domain_destroy(d);
 }
 
-static unsigned int mapping_order(const struct domain_iommu *hd,
-                                  dfn_t dfn, mfn_t mfn, unsigned long nr)
-{
-    unsigned long res = dfn_x(dfn) | mfn_x(mfn);
-    unsigned long sizes = hd->platform_ops->page_sizes;
-    unsigned int bit = find_first_set_bit(sizes), order = 0;
-
-    ASSERT(bit == PAGE_SHIFT);
-
-    while ( (sizes = (sizes >> bit) & ~1) )
-    {
-        unsigned long mask;
-
-        bit = find_first_set_bit(sizes);
-        mask = (1UL << bit) - 1;
-        if ( nr <= mask || (res & mask) )
-            break;
-        order += bit;
-        nr >>= bit;
-        res >>= bit;
-    }
-
-    return order;
-}
-
-long iommu_map(struct domain *d, dfn_t dfn0, mfn_t mfn0,
-               unsigned long page_count, unsigned int flags,
-               unsigned int *flush_flags)
-{
-    const struct domain_iommu *hd = dom_iommu(d);
-    unsigned long i;
-    unsigned int order, j = 0;
-    int rc = 0;
-
-    if ( !is_iommu_enabled(d) )
-        return 0;
-
-    ASSERT(!IOMMUF_order(flags));
-
-    for ( i = 0; i < page_count; i += 1UL << order )
-    {
-        dfn_t dfn = dfn_add(dfn0, i);
-        mfn_t mfn = mfn_add(mfn0, i);
-
-        order = mapping_order(hd, dfn, mfn, page_count - i);
-
-        if ( (flags & IOMMUF_preempt) &&
-             ((!(++j & 0xfff) && general_preempt_check()) ||
-              i > LONG_MAX - (1UL << order)) )
-            return i;
-
-        rc = iommu_call(hd->platform_ops, map_page, d, dfn, mfn,
-                        flags | IOMMUF_order(order), flush_flags);
-
-        if ( likely(!rc) )
-            continue;
-
-        if ( !d->is_shutting_down && printk_ratelimit() )
-            printk(XENLOG_ERR
-                   "d%d: IOMMU mapping dfn %"PRI_dfn" to mfn %"PRI_mfn" failed: %d\n",
-                   d->domain_id, dfn_x(dfn), mfn_x(mfn), rc);
-
-        /* while statement to satisfy __must_check */
-        while ( iommu_unmap(d, dfn0, i, 0, flush_flags) )
-            break;
-
-        if ( !is_hardware_domain(d) )
-            domain_crash(d);
-
-        break;
-    }
-
-    /*
-     * Something went wrong so, if we were dealing with more than a single
-     * page, flush everything and clear flush flags.
-     */
-    if ( page_count > 1 && unlikely(rc) &&
-         !iommu_iotlb_flush_all(d, *flush_flags) )
-        *flush_flags = 0;
-
-    return rc;
-}
-
-int iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn,
-                     unsigned long page_count, unsigned int flags)
-{
-    unsigned int flush_flags = 0;
-    int rc;
-
-    ASSERT(!(flags & IOMMUF_preempt));
-    rc = iommu_map(d, dfn, mfn, page_count, flags, &flush_flags);
-
-    if ( !this_cpu(iommu_dont_flush_iotlb) && !rc )
-        rc = iommu_iotlb_flush(d, dfn, page_count, flush_flags);
-
-    return rc;
-}
-
-long iommu_unmap(struct domain *d, dfn_t dfn0, unsigned long page_count,
-                 unsigned int flags, unsigned int *flush_flags)
-{
-    const struct domain_iommu *hd = dom_iommu(d);
-    unsigned long i;
-    unsigned int order, j = 0;
-    int rc = 0;
-
-    if ( !is_iommu_enabled(d) )
-        return 0;
-
-    ASSERT(!(flags & ~IOMMUF_preempt));
-
-    for ( i = 0; i < page_count; i += 1UL << order )
-    {
-        dfn_t dfn = dfn_add(dfn0, i);
-        int err;
-
-        order = mapping_order(hd, dfn, _mfn(0), page_count - i);
-
-        if ( (flags & IOMMUF_preempt) &&
-             ((!(++j & 0xfff) && general_preempt_check()) ||
-              i > LONG_MAX - (1UL << order)) )
-            return i;
-
-        err = iommu_call(hd->platform_ops, unmap_page, d, dfn,
-                         flags | IOMMUF_order(order), flush_flags);
-
-        if ( likely(!err) )
-            continue;
-
-        if ( !d->is_shutting_down && printk_ratelimit() )
-            printk(XENLOG_ERR
-                   "d%d: IOMMU unmapping dfn %"PRI_dfn" failed: %d\n",
-                   d->domain_id, dfn_x(dfn), err);
-
-        if ( !rc )
-            rc = err;
-
-        if ( !is_hardware_domain(d) )
-        {
-            domain_crash(d);
-            break;
-        }
-    }
-
-    /*
-     * Something went wrong so, if we were dealing with more than a single
-     * page, flush everything and clear flush flags.
-     */
-    if ( page_count > 1 && unlikely(rc) &&
-         !iommu_iotlb_flush_all(d, *flush_flags) )
-        *flush_flags = 0;
-
-    return rc;
-}
-
-int iommu_legacy_unmap(struct domain *d, dfn_t dfn, unsigned long page_count)
-{
-    unsigned int flush_flags = 0;
-    int rc = iommu_unmap(d, dfn, page_count, 0, &flush_flags);
-
-    if ( !this_cpu(iommu_dont_flush_iotlb) && !rc )
-        rc = iommu_iotlb_flush(d, dfn, page_count, flush_flags);
-
-    return rc;
-}
-
-int iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
-                      unsigned int *flags)
-{
-    const struct domain_iommu *hd = dom_iommu(d);
-
-    if ( !is_iommu_enabled(d) || !hd->platform_ops->lookup_page )
-        return -EOPNOTSUPP;
-
-    return iommu_call(hd->platform_ops, lookup_page, d, dfn, mfn, flags);
-}
-
-int iommu_iotlb_flush(struct domain *d, dfn_t dfn, unsigned long page_count,
-                      unsigned int flush_flags)
-{
-    const struct domain_iommu *hd = dom_iommu(d);
-    int rc;
-
-    if ( !is_iommu_enabled(d) || !hd->platform_ops->iotlb_flush ||
-         !page_count || !flush_flags )
-        return 0;
-
-    if ( dfn_eq(dfn, INVALID_DFN) )
-        return -EINVAL;
-
-    rc = iommu_call(hd->platform_ops, iotlb_flush, d, dfn, page_count,
-                    flush_flags);
-    if ( unlikely(rc) )
-    {
-        if ( !d->is_shutting_down && printk_ratelimit() )
-            printk(XENLOG_ERR
-                   "d%d: IOMMU IOTLB flush failed: %d, dfn %"PRI_dfn", page count %lu flags %x\n",
-                   d->domain_id, rc, dfn_x(dfn), page_count, flush_flags);
-
-        if ( !is_hardware_domain(d) )
-            domain_crash(d);
-    }
-
-    return rc;
-}
-
 int iommu_iotlb_flush_all(struct domain *d, unsigned int flush_flags)
 {
     const struct domain_iommu *hd = dom_iommu(d);
@@ -515,7 +362,7 @@ int iommu_iotlb_flush_all(struct domain *d, unsigned int flush_flags)
          !flush_flags )
         return 0;
 
-    rc = iommu_call(hd->platform_ops, iotlb_flush, d, INVALID_DFN, 0,
+    rc = iommu_call(hd->platform_ops, iotlb_flush, d, NULL, INVALID_DFN, 0,
                     flush_flags | IOMMU_FLUSHF_all);
     if ( unlikely(rc) )
     {
@@ -531,24 +378,6 @@ int iommu_iotlb_flush_all(struct domain *d, unsigned int flush_flags)
     return rc;
 }
 
-int iommu_quarantine_dev_init(device_t *dev)
-{
-    const struct domain_iommu *hd = dom_iommu(dom_io);
-
-    if ( !iommu_quarantine || !hd->platform_ops->quarantine_init )
-        return 0;
-
-    return iommu_call(hd->platform_ops, quarantine_init,
-                      dev, iommu_quarantine == IOMMU_quarantine_scratch_page);
-}
-
-static int __init iommu_quarantine_init(void)
-{
-    dom_io->options |= XEN_DOMCTL_CDF_iommu;
-
-    return iommu_domain_init(dom_io, 0);
-}
-
 int __init iommu_setup(void)
 {
     int rc = -ENODEV;
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 5a446d3dce..46c8a01801 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1,6 +1,6 @@
 /*
  * Copyright (C) 2008,  Netronome Systems, Inc.
- * 
+ *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
  * version 2, as published by the Free Software Foundation.
@@ -286,14 +286,14 @@ static void apply_quirks(struct pci_dev *pdev)
          * Device [8086:2fc0]
          * Erratum HSE43
          * CONFIG_TDP_NOMINAL CSR Implemented at Incorrect Offset
-         * http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v3-spec-update.html 
+         * http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v3-spec-update.html
          */
         { PCI_VENDOR_ID_INTEL, 0x2fc0 },
         /*
          * Devices [8086:6f60,6fa0,6fc0]
          * Errata BDF2 / BDX2
          * PCI BARs in the Home Agent Will Return Non-Zero Values During Enumeration
-         * http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v4-spec-update.html 
+         * http://www.intel.com/content/www/us/en/processors/xeon/xeon-e5-v4-spec-update.html
          */
         { PCI_VENDOR_ID_INTEL, 0x6f60 },
         { PCI_VENDOR_ID_INTEL, 0x6fa0 },
@@ -870,8 +870,8 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
             devfn += pdev->phantom_stride;
             if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
                 break;
-            ret = iommu_call(hd->platform_ops, reassign_device, d, target, devfn,
-                             pci_to_dev(pdev));
+            ret = iommu_call(hd->platform_ops, add_devfn, d, pci_to_dev(pdev), devfn,
+                             &target->iommu.default_ctx);
             if ( ret )
                 goto out;
         }
@@ -880,9 +880,9 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
     vpci_deassign_device(pdev);
     write_unlock(&d->pci_lock);
 
-    devfn = pdev->devfn;
-    ret = iommu_call(hd->platform_ops, reassign_device, d, target, devfn,
-                     pci_to_dev(pdev));
+    ret = iommu_call(hd->platform_ops, reattach, target, pci_to_dev(pdev),
+                     iommu_get_context(d, pdev->context),
+                     iommu_default_context(target));
     if ( ret )
         goto out;
 
@@ -890,6 +890,7 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
         pdev->quarantine = false;
 
     pdev->fault.count = 0;
+    pdev->domain = target;
 
     write_lock(&target->pci_lock);
     /* Re-assign back to hardware_domain */
@@ -1329,12 +1330,7 @@ static int cf_check _dump_pci_devices(struct pci_seg *pseg, void *arg)
     list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list )
     {
         printk("%pp - ", &pdev->sbdf);
-#ifdef CONFIG_X86
-        if ( pdev->domain == dom_io )
-            printk("DomIO:%x", pdev->arch.pseudo_domid);
-        else
-#endif
-            printk("%pd", pdev->domain);
+        printk("%pd", pdev->domain);
         printk(" - node %-3d", (pdev->node != NUMA_NO_NODE) ? pdev->node : -1);
         pdev_dump_msi(pdev);
         printk("\n");
@@ -1373,7 +1369,7 @@ static int iommu_add_device(struct pci_dev *pdev)
     if ( !is_iommu_enabled(pdev->domain) )
         return 0;
 
-    rc = iommu_call(hd->platform_ops, add_device, devfn, pci_to_dev(pdev));
+    rc = iommu_attach_context(pdev->domain, pci_to_dev(pdev), 0);
     if ( rc || !pdev->phantom_stride )
         return rc;
 
@@ -1382,7 +1378,9 @@ static int iommu_add_device(struct pci_dev *pdev)
         devfn += pdev->phantom_stride;
         if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
             return 0;
-        rc = iommu_call(hd->platform_ops, add_device, devfn, pci_to_dev(pdev));
+
+        rc = iommu_call(hd->platform_ops, add_devfn, pdev->domain, pdev, devfn,
+                        iommu_default_context(pdev->domain));
         if ( rc )
             printk(XENLOG_WARNING "IOMMU: add %pp failed (%d)\n",
                    &PCI_SBDF(pdev->seg, pdev->bus, devfn), rc);
@@ -1409,6 +1407,7 @@ static int iommu_enable_device(struct pci_dev *pdev)
 static int iommu_remove_device(struct pci_dev *pdev)
 {
     const struct domain_iommu *hd;
+    struct iommu_context *ctx;
     u8 devfn;
 
     if ( !pdev->domain )
@@ -1418,6 +1417,10 @@ static int iommu_remove_device(struct pci_dev *pdev)
     if ( !is_iommu_enabled(pdev->domain) )
         return 0;
 
+    ctx = iommu_get_context(pdev->domain, pdev->context);
+    if ( !ctx )
+        return -EINVAL;
+
     for ( devfn = pdev->devfn ; pdev->phantom_stride; )
     {
         int rc;
@@ -1425,8 +1428,8 @@ static int iommu_remove_device(struct pci_dev *pdev)
         devfn += pdev->phantom_stride;
         if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
             break;
-        rc = iommu_call(hd->platform_ops, remove_device, devfn,
-                        pci_to_dev(pdev));
+        rc = iommu_call(hd->platform_ops, remove_devfn, pdev->domain, pdev,
+                        devfn, ctx);
         if ( !rc )
             continue;
 
@@ -1437,7 +1440,7 @@ static int iommu_remove_device(struct pci_dev *pdev)
 
     devfn = pdev->devfn;
 
-    return iommu_call(hd->platform_ops, remove_device, devfn, pci_to_dev(pdev));
+    return iommu_call(hd->platform_ops, dettach, pdev->domain, pdev, ctx);
 }
 
 static int device_assigned(u16 seg, u8 bus, u8 devfn)
@@ -1497,22 +1500,22 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
     if ( pdev->domain != dom_io )
     {
         rc = iommu_quarantine_dev_init(pci_to_dev(pdev));
+        /** TODO: Consider phantom functions */
         if ( rc )
             goto done;
     }
 
     pdev->fault.count = 0;
 
-    rc = iommu_call(hd->platform_ops, assign_device, d, devfn, pci_to_dev(pdev),
-                    flag);
+    rc = iommu_attach_context(d, pci_to_dev(pdev), 0);
 
     while ( pdev->phantom_stride && !rc )
     {
         devfn += pdev->phantom_stride;
         if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
             break;
-        rc = iommu_call(hd->platform_ops, assign_device, d, devfn,
-                        pci_to_dev(pdev), flag);
+        rc = iommu_call(hd->platform_ops, add_devfn, d, pci_to_dev(pdev),
+                        devfn, iommu_default_context(d));
     }
 
     if ( rc )
diff --git a/xen/drivers/passthrough/quarantine.c b/xen/drivers/passthrough/quarantine.c
new file mode 100644
index 0000000000..b58f136ad8
--- /dev/null
+++ b/xen/drivers/passthrough/quarantine.c
@@ -0,0 +1,49 @@
+#include <xen/iommu.h>
+#include <xen/pci.h>
+#include <xen/sched.h>
+
+#ifdef CONFIG_HAS_PCI
+uint8_t __read_mostly iommu_quarantine =
+# if defined(CONFIG_IOMMU_QUARANTINE_NONE)
+    IOMMU_quarantine_none;
+# elif defined(CONFIG_IOMMU_QUARANTINE_BASIC)
+    IOMMU_quarantine_basic;
+# elif defined(CONFIG_IOMMU_QUARANTINE_SCRATCH_PAGE)
+    IOMMU_quarantine_scratch_page;
+# endif
+#else
+# define iommu_quarantine   IOMMU_quarantine_none
+#endif /* CONFIG_HAS_PCI */
+
+int iommu_quarantine_dev_init(device_t *dev)
+{
+    int ret;
+    u16 ctx_no;
+
+    if ( !iommu_quarantine )
+        return 0;
+
+    ret = iommu_context_alloc(dom_io, &ctx_no, IOMMU_CONTEXT_INIT_quarantine);
+
+    if ( ret )
+        return ret;
+
+    /** TODO: Setup scratch page, mappings... */
+
+    ret = iommu_reattach_context(dev->domain, dom_io, dev, ctx_no);
+
+    if ( ret )
+    {
+        /* Free must happen even in release builds; don't bury it in ASSERT(). */
+        int rc = iommu_context_free(dom_io, ctx_no, 0);
+
+        ASSERT(!rc);
+        return ret;
+    }
+
+    return ret;
+}
+
+int __init iommu_quarantine_init(void)
+{
+    dom_io->options |= XEN_DOMCTL_CDF_iommu;
+
+    return iommu_domain_init(dom_io, 0);
+}
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 442ae5322d..41b0e50827 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -52,7 +52,11 @@ static inline bool dfn_eq(dfn_t x, dfn_t y)
 #ifdef CONFIG_HAS_PASSTHROUGH
 extern bool iommu_enable, iommu_enabled;
 extern bool force_iommu, iommu_verbose;
+
 /* Boolean except for the specific purposes of drivers/passthrough/iommu.c. */
+#define IOMMU_quarantine_none         0 /* aka false */
+#define IOMMU_quarantine_basic        1 /* aka true */
+#define IOMMU_quarantine_scratch_page 2
 extern uint8_t iommu_quarantine;
 #else
 #define iommu_enabled false
@@ -107,6 +111,11 @@ extern bool amd_iommu_perdev_intremap;
 
 extern bool iommu_hwdom_strict, iommu_hwdom_passthrough, iommu_hwdom_inclusive;
 extern int8_t iommu_hwdom_reserved;
+extern uint16_t iommu_hwdom_nb_ctx;
+
+#ifdef CONFIG_X86
+extern unsigned int iommu_hwdom_arena_order;
+#endif
 
 extern unsigned int iommu_dev_iotlb_timeout;
 
@@ -161,11 +170,16 @@ enum
  */
 long __must_check iommu_map(struct domain *d, dfn_t dfn0, mfn_t mfn0,
                             unsigned long page_count, unsigned int flags,
-                            unsigned int *flush_flags);
+                            unsigned int *flush_flags, u16 ctx_no);
+long __must_check _iommu_map(struct domain *d, dfn_t dfn0, mfn_t mfn0,
+                             unsigned long page_count, unsigned int flags,
+                             unsigned int *flush_flags, u16 ctx_no);
 long __must_check iommu_unmap(struct domain *d, dfn_t dfn0,
                               unsigned long page_count, unsigned int flags,
-                              unsigned int *flush_flags);
-
+                              unsigned int *flush_flags, u16 ctx_no);
+long __must_check _iommu_unmap(struct domain *d, dfn_t dfn0,
+                               unsigned long page_count, unsigned int flags,
+                               unsigned int *flush_flags, u16 ctx_no);
 int __must_check iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn,
                                   unsigned long page_count,
                                   unsigned int flags);
@@ -173,11 +187,16 @@ int __must_check iommu_legacy_unmap(struct domain *d, dfn_t dfn,
                                     unsigned long page_count);
 
 int __must_check iommu_lookup_page(struct domain *d, dfn_t dfn, mfn_t *mfn,
-                                   unsigned int *flags);
+                                   unsigned int *flags, u16 ctx_no);
 
 int __must_check iommu_iotlb_flush(struct domain *d, dfn_t dfn,
                                    unsigned long page_count,
-                                   unsigned int flush_flags);
+                                   unsigned int flush_flags,
+                                   u16 ctx_no);
+int __must_check _iommu_iotlb_flush(struct domain *d, dfn_t dfn,
+                                    unsigned long page_count,
+                                    unsigned int flush_flags,
+                                    u16 ctx_no);
 int __must_check iommu_iotlb_flush_all(struct domain *d,
                                        unsigned int flush_flags);
 
@@ -250,20 +269,31 @@ struct page_info;
  */
 typedef int iommu_grdm_t(xen_pfn_t start, xen_ulong_t nr, u32 id, void *ctxt);
 
+struct iommu_context;
+
 struct iommu_ops {
     unsigned long page_sizes;
     int (*init)(struct domain *d);
     void (*hwdom_init)(struct domain *d);
-    int (*quarantine_init)(device_t *dev, bool scratch_page);
-    int (*add_device)(uint8_t devfn, device_t *dev);
+    int (*context_init)(struct domain *d, struct iommu_context *ctx,
+                        u32 flags);
+    int (*context_teardown)(struct domain *d, struct iommu_context *ctx,
+                            u32 flags);
+    int (*attach)(struct domain *d, device_t *dev,
+                  struct iommu_context *ctx);
+    int (*dettach)(struct domain *d, device_t *dev,
+                   struct iommu_context *prev_ctx);
+    int (*reattach)(struct domain *d, device_t *dev,
+                    struct iommu_context *prev_ctx,
+                    struct iommu_context *ctx);
+
     int (*enable_device)(device_t *dev);
-    int (*remove_device)(uint8_t devfn, device_t *dev);
-    int (*assign_device)(struct domain *d, uint8_t devfn, device_t *dev,
-                         uint32_t flag);
-    int (*reassign_device)(struct domain *s, struct domain *t,
-                           uint8_t devfn, device_t *dev);
 #ifdef CONFIG_HAS_PCI
     int (*get_device_group_id)(uint16_t seg, uint8_t bus, uint8_t devfn);
+    int (*add_devfn)(struct domain *d, struct pci_dev *pdev, u16 devfn,
+                     struct iommu_context *ctx);
+    int (*remove_devfn)(struct domain *d, struct pci_dev *pdev, u16 devfn,
+                        struct iommu_context *ctx);
 #endif /* HAS_PCI */
 
     void (*teardown)(struct domain *d);
@@ -274,12 +304,15 @@ struct iommu_ops {
      */
     int __must_check (*map_page)(struct domain *d, dfn_t dfn, mfn_t mfn,
                                  unsigned int flags,
-                                 unsigned int *flush_flags);
+                                 unsigned int *flush_flags,
+                                 struct iommu_context *ctx);
     int __must_check (*unmap_page)(struct domain *d, dfn_t dfn,
                                    unsigned int order,
-                                   unsigned int *flush_flags);
+                                   unsigned int *flush_flags,
+                                   struct iommu_context *ctx);
     int __must_check (*lookup_page)(struct domain *d, dfn_t dfn, mfn_t *mfn,
-                                    unsigned int *flags);
+                                    unsigned int *flags,
+                                    struct iommu_context *ctx);
 
 #ifdef CONFIG_X86
     int (*enable_x2apic)(void);
@@ -292,14 +325,15 @@ struct iommu_ops {
     int (*setup_hpet_msi)(struct msi_desc *msi_desc);
 
     void (*adjust_irq_affinities)(void);
-    void (*clear_root_pgtable)(struct domain *d);
+    void (*clear_root_pgtable)(struct domain *d, struct iommu_context *ctx);
     int (*update_ire_from_msi)(struct msi_desc *msi_desc, struct msi_msg *msg);
 #endif /* CONFIG_X86 */
 
     int __must_check (*suspend)(void);
     void (*resume)(void);
     void (*crash_shutdown)(void);
-    int __must_check (*iotlb_flush)(struct domain *d, dfn_t dfn,
+    int __must_check (*iotlb_flush)(struct domain *d,
+                                    struct iommu_context *ctx, dfn_t dfn,
                                     unsigned long page_count,
                                     unsigned int flush_flags);
     int (*get_reserved_device_memory)(iommu_grdm_t *func, void *ctxt);
@@ -343,11 +377,36 @@ extern int iommu_get_extra_reserved_device_memory(iommu_grdm_t *func,
 # define iommu_vcall iommu_call
 #endif
 
+struct iommu_context {
+    u16 id; /* Context id (0 means default context) */
+    struct list_head devices;
+
+    struct arch_iommu_context arch;
+
+    bool opaque; /* context can't be modified nor accessed (e.g. HAP) */
+    bool dying; /* the context is tearing down */
+};
+
+struct iommu_context_list {
+    uint16_t count; /* Context count excluding default context */
+
+    /* if count > 0 */
+
+    unsigned long *bitmap; /* bitmap of context allocation */
+    struct iommu_context *map; /* Map of contexts */
+};
+
+
 struct domain_iommu {
+    spinlock_t lock; /* iommu lock */
+
 #ifdef CONFIG_HAS_PASSTHROUGH
     struct arch_iommu arch;
 #endif
 
+    struct iommu_context default_ctx;
+    struct iommu_context_list other_contexts;
+
     /* iommu_ops */
     const struct iommu_ops *platform_ops;
 
@@ -380,6 +439,7 @@ struct domain_iommu {
 #define dom_iommu(d)              (&(d)->iommu)
 #define iommu_set_feature(d, f)   set_bit(f, dom_iommu(d)->features)
 #define iommu_clear_feature(d, f) clear_bit(f, dom_iommu(d)->features)
+#define iommu_default_context(d) (&dom_iommu(d)->default_ctx)
 
 /* Are we using the domain P2M table as its IOMMU pagetable? */
 #define iommu_use_hap_pt(d)       (IS_ENABLED(CONFIG_HVM) && \
@@ -405,6 +465,8 @@ int __must_check iommu_suspend(void);
 void iommu_resume(void);
 void iommu_crash_shutdown(void);
 int iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt);
+
+int __init iommu_quarantine_init(void);
 int iommu_quarantine_dev_init(device_t *dev);
 
 #ifdef CONFIG_HAS_PCI
@@ -414,6 +476,28 @@ int iommu_do_pci_domctl(struct xen_domctl *domctl, struct domain *d,
 
 void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev);
 
+struct iommu_context *iommu_get_context(struct domain *d, u16 ctx_no);
+bool iommu_check_context(struct domain *d, u16 ctx_no);
+
+#define IOMMU_CONTEXT_INIT_default    (1 << 0)
+#define IOMMU_CONTEXT_INIT_quarantine (1 << 1)
+int iommu_context_init(struct domain *d, struct iommu_context *ctx, u16 ctx_no, u32 flags);
+
+#define IOMMU_TEARDOWN_REATTACH_DEFAULT (1 << 0)
+#define IOMMU_TEARDOWN_PREEMPT          (1 << 1)
+int iommu_context_teardown(struct domain *d, struct iommu_context *ctx, u32 flags);
+
+int iommu_context_alloc(struct domain *d, u16 *ctx_no, u32 flags);
+int iommu_context_free(struct domain *d, u16 ctx_no, u32 flags);
+
+int iommu_reattach_context(struct domain *prev_dom, struct domain *next_dom,
+                           device_t *dev, u16 ctx_no);
+int iommu_attach_context(struct domain *d, device_t *dev, u16 ctx_no);
+int iommu_dettach_context(struct domain *d, device_t *dev);
+
+int _iommu_attach_context(struct domain *d, device_t *dev, u16 ctx_no);
+int _iommu_dettach_context(struct domain *d, device_t *dev);
+
 /*
  * The purpose of the iommu_dont_flush_iotlb optional cpu flag is to
  * avoid unecessary iotlb_flush in the low level IOMMU code.
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 63e49f0117..d6d4aaa6a5 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -97,6 +97,7 @@ struct pci_dev_info {
 struct pci_dev {
     struct list_head alldevs_list;
     struct list_head domain_list;
+    struct list_head context_list;
 
     struct list_head msi_list;
 
@@ -104,6 +105,8 @@ struct pci_dev {
 
     struct domain *domain;
 
+    uint16_t context; /* IOMMU context number of domain */
+
     const union {
         struct {
             uint8_t devfn;
-- 
2.45.2


Teddy Astie | Vates XCP-ng Intern

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech