From: Alexey Kardashevskiy
To: qemu-devel@nongnu.org
Cc: Alexey Kardashevskiy, Paolo Bonzini, Alex Williamson, qemu-ppc@nongnu.org, David Gibson
Date: Tue, 12 Dec 2017 16:18:53 +1100
Message-Id: <20171212051853.24583-1-aik@ozlabs.ru>
Subject: [Qemu-devel] [PATCH qemu] RFC: spapr/iommu: Enable in-kernel TCE acceleration via VFIO KVM device

In order to enable TCE operations support in KVM, we have to inform KVM
about VFIO groups being attached to specific LIOBNs. KVM already knows
about VFIO groups; the only missing bit is which in-kernel TCE table
(the one with user-visible TCEs) should update the attached groups.
There is a KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE attribute of the VFIO KVM
device which receives a groupfd/tablefd pair.

This adds get_attr()/set_attr() to IOMMUMemoryRegionClass, like
iommu_ops::domain_get_attr/domain_set_attr in the Linux kernel.

This implements get_attr() for sPAPR IOMMU to return a TCE table fd as
an IOMMU_ATTR_KVM_FD attribute.

This also now reads the KVM_CAP_SPAPR_TCE_VFIO capability to prevent
the TCE table from being reallocated in userspace if KVM can accelerate
TCE operations.

This finally notifies the VFIO KVM device about a new group being
attached to a LIOBN.

Signed-off-by: Alexey Kardashevskiy
---

Assuming it is accepted, does it make sense to split
include/exec/memory.h out and get it merged separately?
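For reviewers who want the control flow in isolation, here is a minimal standalone sketch of the attach sequence described above: ask the IOMMU region for its KVM table fd through a get_attr() hook and, only if that succeeds, pass each groupfd/tablefd pair on to KVM. Every name in it (struct iommu_region, sketch_get_attr, set_spapr_tce_fn, attach_groups, count_attach) is a hypothetical stand-in and not part of QEMU. The KVM_SET_DEVICE_ATTR ioctl is replaced by a callback so the sketch builds without KVM headers, and the return-value convention of attach_groups() is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the QEMU types in this patch. */
enum iommu_attr {
    IOMMU_ATTR_KVM_FD_SKETCH
};

struct iommu_region;

struct iommu_region_class {
    /* Optional hook; NULL when the IOMMU exposes no attributes. */
    int (*get_attr)(struct iommu_region *r, enum iommu_attr attr, void *data);
};

struct iommu_region {
    const struct iommu_region_class *cls;
    int table_fd;       /* fd of the in-kernel TCE table, -1 if none */
    bool kvm_tce_vfio;  /* stands in for KVM_CAP_SPAPR_TCE_VFIO */
};

/* Same shape as spapr_tce_get_attr(): report the table fd only when the
 * (simulated) capability is present, otherwise fail with -EINVAL. */
static int sketch_get_attr(struct iommu_region *r, enum iommu_attr attr,
                           void *data)
{
    assert(data != NULL);
    if (attr == IOMMU_ATTR_KVM_FD_SKETCH && r->kvm_tce_vfio) {
        *(int *)data = r->table_fd;
        return 0;
    }
    return -EINVAL;
}

/* Stand-in for ioctl(KVM_SET_DEVICE_ATTR); returns 0 on success. */
typedef int (*set_spapr_tce_fn)(int groupfd, int tablefd, void *opaque);

/* Example callback that only counts calls; a real one would issue the
 * ioctl on the VFIO KVM device fd. */
static int count_attach(int groupfd, int tablefd, void *opaque)
{
    (void)groupfd;
    (void)tablefd;
    ++*(int *)opaque;
    return 0;
}

/* Attach every group to the region's in-kernel table.
 * Returns the number of groups attached, 0 when acceleration is not
 * available, or a negative errno-style value if an attach fails. */
static int attach_groups(struct iommu_region *r, const int *groupfds,
                         size_t ngroups, set_spapr_tce_fn set_tce,
                         void *opaque)
{
    int tablefd;
    size_t i;

    if (!r->cls->get_attr ||
        r->cls->get_attr(r, IOMMU_ATTR_KVM_FD_SKETCH, &tablefd)) {
        return 0; /* no table fd: TCEs stay handled in userspace */
    }

    for (i = 0; i < ngroups; i++) {
        int ret = set_tce(groupfds[i], tablefd, opaque);
        if (ret) {
            return ret < 0 ? ret : -EIO;
        }
    }
    return (int)ngroups;
}
```

A caller would fill in a struct iommu_region whose class points at sketch_get_attr and pass count_attach (or a callback wrapping the real ioctl) to attach_groups(). Keeping the capability check inside get_attr() means the generic caller does not need to know anything sPAPR-specific, which is the same split the patch makes between hw/vfio/common.c and hw/ppc/spapr_iommu.c.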
---
 include/exec/memory.h | 10 ++++++++++
 target/ppc/kvm_ppc.h  |  6 ++++++
 hw/ppc/spapr_iommu.c  | 19 +++++++++++++++++++
 hw/vfio/common.c      | 24 ++++++++++++++++++++++++
 target/ppc/kvm.c      |  7 ++++++-
 hw/vfio/trace-events  |  1 +
 6 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 5ed4042..6395c6f 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -190,6 +190,10 @@ struct MemoryRegionOps {
     const MemoryRegionMmio old_mmio;
 };
 
+enum IOMMUMemoryRegionAttr {
+    IOMMU_ATTR_KVM_FD
+};
+
 typedef struct IOMMUMemoryRegionClass {
     /* private */
     struct DeviceClass parent_class;
@@ -210,6 +214,12 @@ typedef struct IOMMUMemoryRegionClass {
                                 IOMMUNotifierFlag new_flags);
     /* Set this up to provide customized IOMMU replay function */
     void (*replay)(IOMMUMemoryRegion *iommu, IOMMUNotifier *notifier);
+
+    /* Get/set IOMMU misc attributes */
+    int (*get_attr)(IOMMUMemoryRegion *iommu, enum IOMMUMemoryRegionAttr,
+                    void *data);
+    int (*set_attr)(IOMMUMemoryRegion *iommu, enum IOMMUMemoryRegionAttr,
+                    void *data);
 } IOMMUMemoryRegionClass;
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index d6be38e..2b985e1 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -48,6 +48,7 @@ void *kvmppc_create_spapr_tce(uint32_t liobn, uint32_t page_shift,
 int kvmppc_remove_spapr_tce(void *table, int pfd, uint32_t window_size);
 int kvmppc_reset_htab(int shift_hint);
 uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
+bool kvmppc_has_cap_spapr_vfio(void);
 #endif /* !CONFIG_USER_ONLY */
 bool kvmppc_has_cap_epr(void);
 int kvmppc_define_rtas_kernel_token(uint32_t token, const char *function);
@@ -231,6 +232,11 @@ static inline bool kvmppc_is_mem_backend_page_size_ok(const char *obj_path)
     return true;
 }
 
+static inline bool kvmppc_has_cap_spapr_vfio(void)
+{
+    return false;
+}
+
 #endif /* !CONFIG_USER_ONLY */
 
 static inline bool kvmppc_has_cap_epr(void)
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 5ccd785..ce8a769 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -17,6 +17,7 @@
  * License along with this library; if not, see .
  */
 #include "qemu/osdep.h"
+#include
 #include "qemu/error-report.h"
 #include "hw/hw.h"
 #include "qemu/log.h"
@@ -160,6 +161,19 @@ static uint64_t spapr_tce_get_min_page_size(IOMMUMemoryRegion *iommu)
     return 1ULL << tcet->page_shift;
 }
 
+static int spapr_tce_get_attr(IOMMUMemoryRegion *iommu,
+                              enum IOMMUMemoryRegionAttr attr, void *data)
+{
+    sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
+
+    if (attr == IOMMU_ATTR_KVM_FD && kvmppc_has_cap_spapr_vfio()) {
+        *(int *) data = tcet->fd;
+        return 0;
+    }
+
+    return -EINVAL;
+}
+
 static void spapr_tce_notify_flag_changed(IOMMUMemoryRegion *iommu,
                                           IOMMUNotifierFlag old,
                                           IOMMUNotifierFlag new)
@@ -284,6 +298,10 @@ void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio)
 
     tcet->need_vfio = need_vfio;
 
+    if (!need_vfio || (tcet->fd != -1 && kvmppc_has_cap_spapr_vfio())) {
+        return;
+    }
+
     oldtable = tcet->table;
 
     tcet->table = spapr_tce_alloc_table(tcet->liobn,
@@ -643,6 +661,7 @@ static void spapr_iommu_memory_region_class_init(ObjectClass *klass, void *data)
     imrc->translate = spapr_tce_translate_iommu;
     imrc->get_min_page_size = spapr_tce_get_min_page_size;
     imrc->notify_flag_changed = spapr_tce_notify_flag_changed;
+    imrc->get_attr = spapr_tce_get_attr;
 }
 
 static const TypeInfo spapr_iommu_memory_region_info = {
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index cd81cc9..ed7717d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -480,6 +480,30 @@ static void vfio_listener_region_add(MemoryListener *listener,
     if (memory_region_is_iommu(section->mr)) {
         VFIOGuestIOMMU *giommu;
         IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
+#ifdef CONFIG_KVM
+        struct kvm_vfio_spapr_tce param;
+        IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
+        VFIOGroup *group;
+        struct kvm_device_attr attr = {
+            .group = KVM_DEV_VFIO_GROUP,
+            .attr = KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE,
+            .addr = (uint64_t)(unsigned long)&param,
+        };
+
+        if (kvm_enabled() && imrc->get_attr &&
+            !imrc->get_attr(iommu_mr, IOMMU_ATTR_KVM_FD, &param.tablefd)) {
+
+            QLIST_FOREACH(group, &container->group_list, container_next) {
+                param.groupfd = group->fd;
+                if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+                    error_report("vfio: failed to setup fd %d for a group with fd %d: %s",
+                                 param.tablefd, param.groupfd, strerror(errno));
+                    return;
+                }
+                trace_vfio_spapr_group_attach(param.groupfd, param.tablefd);
+            }
+        }
+#endif
 
         trace_vfio_listener_region_add_iommu(iova, end);
         /*
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 9d57deb..da7590a 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -136,7 +136,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     cap_spapr_tce = kvm_check_extension(s, KVM_CAP_SPAPR_TCE);
     cap_spapr_tce_64 = kvm_check_extension(s, KVM_CAP_SPAPR_TCE_64);
     cap_spapr_multitce = kvm_check_extension(s, KVM_CAP_SPAPR_MULTITCE);
-    cap_spapr_vfio = false;
+    cap_spapr_vfio = kvm_vm_check_extension(s, KVM_CAP_SPAPR_TCE_VFIO);
     cap_one_reg = kvm_check_extension(s, KVM_CAP_ONE_REG);
     cap_hior = kvm_check_extension(s, KVM_CAP_PPC_HIOR);
     cap_epr = kvm_check_extension(s, KVM_CAP_PPC_EPR);
@@ -2474,6 +2474,11 @@ bool kvmppc_has_cap_mmu_hash_v3(void)
     return cap_mmu_hash_v3;
 }
 
+bool kvmppc_has_cap_spapr_vfio(void)
+{
+    return cap_spapr_vfio;
+}
+
 PowerPCCPUClass *kvm_ppc_get_host_cpu_class(void)
 {
     uint32_t host_pvr = mfpvr();
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index fae096c..3d34fe8 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -123,3 +123,4 @@ vfio_prereg_register(uint64_t va, uint64_t size, int ret) "va=0x%"PRIx64" size=0
 vfio_prereg_unregister(uint64_t va, uint64_t size, int ret) "va=0x%"PRIx64" size=0x%"PRIx64" ret=%d"
 vfio_spapr_create_window(int ps, uint64_t ws, uint64_t off) "pageshift=0x%x winsize=0x%"PRIx64" offset=0x%"PRIx64
 vfio_spapr_remove_window(uint64_t off) "offset=0x%"PRIx64
+vfio_spapr_group_attach(int groupfd, int tablefd) "Attached groupfd %d to liobn fd %d"
-- 
2.11.0