From nobody Wed Nov 27 00:30:51 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=redhat.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 170301242411237.812224931786204; Tue, 19 Dec 2023 11:00:24 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rFfJb-00057w-2K; Tue, 19 Dec 2023 13:59:41 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rFfIi-00040F-91 for qemu-devel@nongnu.org; Tue, 19 Dec 2023 13:58:45 -0500 Received: from mail.ozlabs.org ([2404:9400:2221:ea00::3] helo=gandalf.ozlabs.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rFfIf-0007IZ-9i for qemu-devel@nongnu.org; Tue, 19 Dec 2023 13:58:43 -0500 Received: from gandalf.ozlabs.org (gandalf.ozlabs.org [150.107.74.76]) by gandalf.ozlabs.org (Postfix) with ESMTP id 4SvmDv1vwKz4xCp; Wed, 20 Dec 2023 05:58:39 +1100 (AEDT) Received: from authenticated.ozlabs.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mail.ozlabs.org (Postfix) with ESMTPSA id 4SvmDp4Z66z4xSc; Wed, 20 Dec 2023 05:58:34 +1100 (AEDT) From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= To: qemu-devel@nongnu.org Cc: Eric Auger , Zhenzhong Duan , Peter Maydell , Richard Henderson , Nicholas Piggin , Harsh Prateek Bora , Thomas Huth , Eric Farman , Alex Williamson , Matthew Rosato , Yi Liu , =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= , Nicolin Chen Subject: [PULL 20/47] backends/iommufd: Introduce the iommufd object Date: Tue, 19 Dec 2023 19:56:16 +0100 Message-ID: <20231219185643.725448-21-clg@redhat.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20231219185643.725448-1-clg@redhat.com> References: <20231219185643.725448-1-clg@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=2404:9400:2221:ea00::3; envelope-from=SRS0=7/MV=H6=redhat.com=clg@ozlabs.org; helo=gandalf.ozlabs.org X-Spam_score_int: -39 X-Spam_score: -4.0 X-Spam_bar: ---- X-Spam_report: (-4.0 / 5.0 requ) BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZM-MESSAGEID: 1703012425997100011 From: Eric Auger Introduce an iommufd object which allows the interaction with the host /dev/iommu device. The /dev/iommu can have been already pre-opened outside of qemu, in which case the fd can be passed directly along with the iommufd object: This allows the iommufd object to be shared accross several subsystems (VFIO, VDPA, ...). For example, libvirt would open the /dev/iommu once. If no fd is passed along with the iommufd object, the /dev/iommu is opened by the qemu code. Suggested-by: Alex Williamson Signed-off-by: Eric Auger Signed-off-by: Yi Liu Signed-off-by: Zhenzhong Duan Reviewed-by: C=C3=A9dric Le Goater Tested-by: Eric Auger Tested-by: Nicolin Chen Signed-off-by: C=C3=A9dric Le Goater --- MAINTAINERS | 8 ++ qapi/qom.json | 19 +++ include/sysemu/iommufd.h | 38 ++++++ backends/iommufd.c | 245 +++++++++++++++++++++++++++++++++++++++ backends/Kconfig | 4 + backends/meson.build | 1 + backends/trace-events | 10 ++ qemu-options.hx | 12 ++ 8 files changed, 337 insertions(+) create mode 100644 include/sysemu/iommufd.h create mode 100644 backends/iommufd.c diff --git a/MAINTAINERS b/MAINTAINERS index 695e0bd34fbba253d77570e5b3ef8dabe7a174b3..a5a446914a1cf505131bfe11354= 0fd501511abb8 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2167,6 +2167,14 @@ F: hw/vfio/ap.c F: docs/system/s390x/vfio-ap.rst L: qemu-s390x@nongnu.org =20 +iommufd +M: Yi Liu +M: Eric Auger +M: Zhenzhong Duan +S: Supported +F: backends/iommufd.c +F: include/sysemu/iommufd.h + vhost M: Michael S. Tsirkin S: Supported diff --git a/qapi/qom.json b/qapi/qom.json index c53ef978ff7e6a81f8c926159fe52c0d349696ac..95516ba325e541e5eeb3e1f5884= 74cdf63ad68a5 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -794,6 +794,23 @@ { 'struct': 'VfioUserServerProperties', 'data': { 'socket': 'SocketAddress', 'device': 'str' } } =20 +## +# @IOMMUFDProperties: +# +# Properties for iommufd objects. +# +# @fd: file descriptor name previously passed via 'getfd' command, +# which represents a pre-opened /dev/iommu. This allows the +# iommufd object to be shared accross several subsystems +# (VFIO, VDPA, ...), and the file descriptor to be shared +# with other process, e.g. DPDK. (default: QEMU opens +# /dev/iommu by itself) +# +# Since: 9.0 +## +{ 'struct': 'IOMMUFDProperties', + 'data': { '*fd': 'str' } } + ## # @RngProperties: # @@ -934,6 +951,7 @@ 'input-barrier', { 'name': 'input-linux', 'if': 'CONFIG_LINUX' }, + 'iommufd', 'iothread', 'main-loop', { 'name': 'memory-backend-epc', @@ -1003,6 +1021,7 @@ 'input-barrier': 'InputBarrierProperties', 'input-linux': { 'type': 'InputLinuxProperties', 'if': 'CONFIG_LINUX' }, + 'iommufd': 'IOMMUFDProperties', 'iothread': 'IothreadProperties', 'main-loop': 'MainLoopProperties', 'memory-backend-epc': { 'type': 'MemoryBackendEpcProperties', diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h new file mode 100644 index 0000000000000000000000000000000000000000..9c5524b0ed15ef5f81be159415b= c216572a283d8 --- /dev/null +++ b/include/sysemu/iommufd.h @@ -0,0 +1,38 @@ +#ifndef SYSEMU_IOMMUFD_H +#define SYSEMU_IOMMUFD_H + +#include "qom/object.h" +#include "qemu/thread.h" +#include "exec/hwaddr.h" +#include "exec/cpu-common.h" + +#define TYPE_IOMMUFD_BACKEND "iommufd" +OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass, IOMMUFD_BACKEND) + +struct IOMMUFDBackendClass { + ObjectClass parent_class; +}; + +struct IOMMUFDBackend { + Object parent; + + /*< protected >*/ + int fd; /* /dev/iommu file descriptor */ + bool owned; /* is the /dev/iommu opened internally */ + QemuMutex lock; + uint32_t users; + + /*< public >*/ +}; + +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp); +void iommufd_backend_disconnect(IOMMUFDBackend *be); + +int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id, + Error **errp); +void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id); +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr i= ova, + ram_addr_t size, void *vaddr, bool readonly); +int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id, + hwaddr iova, ram_addr_t size); +#endif diff --git a/backends/iommufd.c b/backends/iommufd.c new file mode 100644 index 0000000000000000000000000000000000000000..ba58a0eb0d0ba9aae625c987eb7= 28547543dba66 --- /dev/null +++ b/backends/iommufd.c @@ -0,0 +1,245 @@ +/* + * iommufd container backend + * + * Copyright (C) 2023 Intel Corporation. + * Copyright Red Hat, Inc. 2023 + * + * Authors: Yi Liu + * Eric Auger + * + * SPDX-License-Identifier: GPL-2.0-or-later + */ + +#include "qemu/osdep.h" +#include "sysemu/iommufd.h" +#include "qapi/error.h" +#include "qapi/qmp/qerror.h" +#include "qemu/module.h" +#include "qom/object_interfaces.h" +#include "qemu/error-report.h" +#include "monitor/monitor.h" +#include "trace.h" +#include +#include + +static void iommufd_backend_init(Object *obj) +{ + IOMMUFDBackend *be =3D IOMMUFD_BACKEND(obj); + + be->fd =3D -1; + be->users =3D 0; + be->owned =3D true; + qemu_mutex_init(&be->lock); +} + +static void iommufd_backend_finalize(Object *obj) +{ + IOMMUFDBackend *be =3D IOMMUFD_BACKEND(obj); + + if (be->owned) { + close(be->fd); + be->fd =3D -1; + } +} + +static void iommufd_backend_set_fd(Object *obj, const char *str, Error **e= rrp) +{ + IOMMUFDBackend *be =3D IOMMUFD_BACKEND(obj); + int fd =3D -1; + + fd =3D monitor_fd_param(monitor_cur(), str, errp); + if (fd =3D=3D -1) { + error_prepend(errp, "Could not parse remote object fd %s:", str); + return; + } + qemu_mutex_lock(&be->lock); + be->fd =3D fd; + be->owned =3D false; + qemu_mutex_unlock(&be->lock); + trace_iommu_backend_set_fd(be->fd); +} + +static bool iommufd_backend_can_be_deleted(UserCreatable *uc) +{ + IOMMUFDBackend *be =3D IOMMUFD_BACKEND(uc); + + return !be->users; +} + +static void iommufd_backend_class_init(ObjectClass *oc, void *data) +{ + UserCreatableClass *ucc =3D USER_CREATABLE_CLASS(oc); + + ucc->can_be_deleted =3D iommufd_backend_can_be_deleted; + + object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd); +} + +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp) +{ + int fd, ret =3D 0; + + qemu_mutex_lock(&be->lock); + if (be->users =3D=3D UINT32_MAX) { + error_setg(errp, "too many connections"); + ret =3D -E2BIG; + goto out; + } + if (be->owned && !be->users) { + fd =3D qemu_open_old("/dev/iommu", O_RDWR); + if (fd < 0) { + error_setg_errno(errp, errno, "/dev/iommu opening failed"); + ret =3D fd; + goto out; + } + be->fd =3D fd; + } + be->users++; +out: + trace_iommufd_backend_connect(be->fd, be->owned, + be->users, ret); + qemu_mutex_unlock(&be->lock); + return ret; +} + +void iommufd_backend_disconnect(IOMMUFDBackend *be) +{ + qemu_mutex_lock(&be->lock); + if (!be->users) { + goto out; + } + be->users--; + if (!be->users && be->owned) { + close(be->fd); + be->fd =3D -1; + } +out: + trace_iommufd_backend_disconnect(be->fd, be->users); + qemu_mutex_unlock(&be->lock); +} + +int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id, + Error **errp) +{ + int ret, fd =3D be->fd; + struct iommu_ioas_alloc alloc_data =3D { + .size =3D sizeof(alloc_data), + .flags =3D 0, + }; + + ret =3D ioctl(fd, IOMMU_IOAS_ALLOC, &alloc_data); + if (ret) { + error_setg_errno(errp, errno, "Failed to allocate ioas"); + return ret; + } + + *ioas_id =3D alloc_data.out_ioas_id; + trace_iommufd_backend_alloc_ioas(fd, *ioas_id, ret); + + return ret; +} + +void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id) +{ + int ret, fd =3D be->fd; + struct iommu_destroy des =3D { + .size =3D sizeof(des), + .id =3D id, + }; + + ret =3D ioctl(fd, IOMMU_DESTROY, &des); + trace_iommufd_backend_free_id(fd, id, ret); + if (ret) { + error_report("Failed to free id: %u %m", id); + } +} + +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr i= ova, + ram_addr_t size, void *vaddr, bool readonly) +{ + int ret, fd =3D be->fd; + struct iommu_ioas_map map =3D { + .size =3D sizeof(map), + .flags =3D IOMMU_IOAS_MAP_READABLE | + IOMMU_IOAS_MAP_FIXED_IOVA, + .ioas_id =3D ioas_id, + .__reserved =3D 0, + .user_va =3D (uintptr_t)vaddr, + .iova =3D iova, + .length =3D size, + }; + + if (!readonly) { + map.flags |=3D IOMMU_IOAS_MAP_WRITEABLE; + } + + ret =3D ioctl(fd, IOMMU_IOAS_MAP, &map); + trace_iommufd_backend_map_dma(fd, ioas_id, iova, size, + vaddr, readonly, ret); + if (ret) { + ret =3D -errno; + + /* TODO: Not support mapping hardware PCI BAR region for now. */ + if (errno =3D=3D EFAULT) { + warn_report("IOMMU_IOAS_MAP failed: %m, PCI BAR?"); + } else { + error_report("IOMMU_IOAS_MAP failed: %m"); + } + } + return ret; +} + +int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id, + hwaddr iova, ram_addr_t size) +{ + int ret, fd =3D be->fd; + struct iommu_ioas_unmap unmap =3D { + .size =3D sizeof(unmap), + .ioas_id =3D ioas_id, + .iova =3D iova, + .length =3D size, + }; + + ret =3D ioctl(fd, IOMMU_IOAS_UNMAP, &unmap); + /* + * IOMMUFD takes mapping as some kind of object, unmapping + * nonexistent mapping is treated as deleting a nonexistent + * object and return ENOENT. This is different from legacy + * backend which allows it. vIOMMU may trigger a lot of + * redundant unmapping, to avoid flush the log, treat them + * as succeess for IOMMUFD just like legacy backend. + */ + if (ret && errno =3D=3D ENOENT) { + trace_iommufd_backend_unmap_dma_non_exist(fd, ioas_id, iova, size,= ret); + ret =3D 0; + } else { + trace_iommufd_backend_unmap_dma(fd, ioas_id, iova, size, ret); + } + + if (ret) { + ret =3D -errno; + error_report("IOMMU_IOAS_UNMAP failed: %m"); + } + return ret; +} + +static const TypeInfo iommufd_backend_info =3D { + .name =3D TYPE_IOMMUFD_BACKEND, + .parent =3D TYPE_OBJECT, + .instance_size =3D sizeof(IOMMUFDBackend), + .instance_init =3D iommufd_backend_init, + .instance_finalize =3D iommufd_backend_finalize, + .class_size =3D sizeof(IOMMUFDBackendClass), + .class_init =3D iommufd_backend_class_init, + .interfaces =3D (InterfaceInfo[]) { + { TYPE_USER_CREATABLE }, + { } + } +}; + +static void register_types(void) +{ + type_register_static(&iommufd_backend_info); +} + +type_init(register_types); diff --git a/backends/Kconfig b/backends/Kconfig index f35abc16092808b1fe5b033a346908e2d66bff0b..2cb23f62fa1526cedafedcc99a0= 32e098075b846 100644 --- a/backends/Kconfig +++ b/backends/Kconfig @@ -1 +1,5 @@ source tpm/Kconfig + +config IOMMUFD + bool + depends on VFIO diff --git a/backends/meson.build b/backends/meson.build index 914c7c4afb905cfe710ad23dd1ee42907f6d1679..9a5cea480d172d50a641e4d9179= 093e8155f2db1 100644 --- a/backends/meson.build +++ b/backends/meson.build @@ -20,6 +20,7 @@ if have_vhost_user system_ss.add(when: 'CONFIG_VIRTIO', if_true: files('vhost-user.c')) endif system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhos= t.c')) +system_ss.add(when: 'CONFIG_IOMMUFD', if_true: files('iommufd.c')) if have_vhost_user_crypto system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vh= ost-user.c')) endif diff --git a/backends/trace-events b/backends/trace-events index 652eb76a5723e2053fe97338c481309c58284d6a..d45c6e31a67ed66d94787f60eb0= 8a525cf6ff68b 100644 --- a/backends/trace-events +++ b/backends/trace-events @@ -5,3 +5,13 @@ dbus_vmstate_pre_save(void) dbus_vmstate_post_load(int version_id) "version_id: %d" dbus_vmstate_loading(const char *id) "id: %s" dbus_vmstate_saving(const char *id) "id: %s" + +# iommufd.c +iommufd_backend_connect(int fd, bool owned, uint32_t users, int ret) "fd= =3D%d owned=3D%d users=3D%d (%d)" +iommufd_backend_disconnect(int fd, uint32_t users) "fd=3D%d users=3D%d" +iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=3D%d" +iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_= t size, void *vaddr, bool readonly, int ret) " iommufd=3D%d ioas=3D%d iova= =3D0x%"PRIx64" size=3D0x%"PRIx64" addr=3D%p readonly=3D%d (%d)" +iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t i= ova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=3D%d ioas= =3D%d iova=3D0x%"PRIx64" size=3D0x%"PRIx64" (%d)" +iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint6= 4_t size, int ret) " iommufd=3D%d ioas=3D%d iova=3D0x%"PRIx64" size=3D0x%"P= RIx64" (%d)" +iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd= =3D%d ioas=3D%d (%d)" +iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=3D%d = id=3D%d (%d)" diff --git a/qemu-options.hx b/qemu-options.hx index 42fd09e4de96e962cd5873c49501f6e1dbb5e346..5fe8ea57d2d2f9390a976ef2fef= e86463e888bb1 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -5224,6 +5224,18 @@ SRST =20 The ``share`` boolean option is on by default with memfd. =20 + ``-object iommufd,id=3Did[,fd=3Dfd]`` + Creates an iommufd backend which allows control of DMA mapping + through the ``/dev/iommu`` device. + + The ``id`` parameter is a unique ID which frontends (such as + vfio-pci of vdpa) will use to connect with the iommufd backend. + + The ``fd`` parameter is an optional pre-opened file descriptor + resulting from ``/dev/iommu`` opening. Usually the iommufd is shar= ed + across all subsystems, bringing the benefit of centralized + reference counting. + ``-object rng-builtin,id=3Did`` Creates a random number generator backend which obtains entropy from QEMU builtin functions. The ``id`` parameter is a unique ID --=20 2.43.0