From nobody Wed Nov 27 09:44:57 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass header.i=@intel.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=intel.com ARC-Seal: i=1; a=rsa-sha256; t=1699957644; cv=none; d=zohomail.com; s=zohoarc; b=EZ8CwybatTVlQL+yZvO97oB5+wG5X8ZBDWHYSnjet98LF6/yM0eq1mh5S3SKkOaAsq6Od6EQ+hxxVhBtyOuHTcO4z0FMqHEpfweZrsUST01NebsOGj5fWRNC5CavWlw/xcPum6h8TtmQdMd4zdWr6cUdsWECD7x2fsgQlSZktF0= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1699957644; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=vB1lbfQ0bq12tdHhgFKO8L1t9eAx6CSqBe8ObdOpFSQ=; b=O3KJNjrnOvRxNptR9fSI6MttXipkit9mvElvrypY0iL0P17Lu4lIAsH5oZDHMiwEhVoSku4R973OiU6msU0Pf55KaYCN+VTZ5j0cClV/p4WALxm8WAMWj1yNweyp5WMA6TD2zO6yDNmqcwPZ7KspeOQEI1R0ggyBC7scVI+k8NA= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass header.i=@intel.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1699957644140679.187163466682; Tue, 14 Nov 2023 02:27:24 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1r2qc5-0002sW-0q; Tue, 14 Nov 2023 05:25:45 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1r2qbv-0002rz-I7 for qemu-devel@nongnu.org; Tue, 14 Nov 2023 05:25:35 -0500 Received: from mgamail.intel.com ([134.134.136.65]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1r2qbr-0007aG-Tt for qemu-devel@nongnu.org; Tue, 14 Nov 2023 05:25:35 -0500 Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Nov 2023 02:25:30 -0800 Received: from duan-server-s2600bt.bj.intel.com ([10.240.192.147]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Nov 2023 02:25:22 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1699957531; x=1731493531; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=beaeu8IqmX/V0s1PBASYA1xe7/Nv1dxUUJPGBO8qNNQ=; b=Q39HfA4lZZX19LsN2tV9ot71Mc6B5jIy/P2ZRj34U89owpOWPMtfOm9n ifn5t+4Lgu+liJZKLozqMbfH0ve7M3l0euRs4ylH0jZlzfWppDtAqjo5r y28zFyL7tU8mN7VbuIBzKdUI7eXs2lFBSXPef+3eLa+DX7AsCXGWiJnDW GOnMJMqVuZy51TKhxHSkYU11wzew/w1KVGZCYoIUEFZwK5qplSW3l3DqG 25m6w1TwZRM7o7VFYRrLomqxSofqw1QSkm3/kNAoeHEI84xdhrEiYrEfU LU3kAydFpulWJW9ENlIBh3Au9NG5fQGPZZ9mjicQ8+Qcz9ZWtQoRU7ZY2 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10893"; a="394543397" X-IronPort-AV: E=Sophos;i="6.03,301,1694761200"; d="scan'208";a="394543397" X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10893"; a="888212788" X-IronPort-AV: E=Sophos;i="6.03,301,1694761200"; d="scan'208";a="888212788" From: Zhenzhong Duan To: qemu-devel@nongnu.org Cc: alex.williamson@redhat.com, clg@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com, joao.m.martins@oracle.com, eric.auger@redhat.com, peterx@redhat.com, jasowang@redhat.com, kevin.tian@intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, chao.p.peng@intel.com, Zhenzhong Duan , Paolo Bonzini , Eric Blake , Markus Armbruster , =?UTF-8?q?Daniel=20P=2E=20Berrang=C3=A9?= , Eduardo Habkost Subject: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object Date: Tue, 14 Nov 2023 18:09:35 +0800 Message-Id: <20231114100955.1961974-2-zhenzhong.duan@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231114100955.1961974-1-zhenzhong.duan@intel.com> References: <20231114100955.1961974-1-zhenzhong.duan@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=134.134.136.65; envelope-from=zhenzhong.duan@intel.com; helo=mgamail.intel.com X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @intel.com) X-ZM-MESSAGEID: 1699957645533100004 Content-Type: text/plain; charset="utf-8" From: Eric Auger Introduce an iommufd object which allows the interaction with the host /dev/iommu device. The /dev/iommu can have been already pre-opened outside of qemu, in which case the fd can be passed directly along with the iommufd object: This allows the iommufd object to be shared accross several subsystems (VFIO, VDPA, ...). For example, libvirt would open the /dev/iommu once. If no fd is passed along with the iommufd object, the /dev/iommu is opened by the qemu code. Suggested-by: Alex Williamson Signed-off-by: Eric Auger Signed-off-by: Yi Liu Signed-off-by: Zhenzhong Duan Reviewed-by: C=C3=A9dric Le Goater --- v6: remove redundant call, alloc_hwpt, get/put_ioas MAINTAINERS | 7 ++ qapi/qom.json | 19 ++++ include/sysemu/iommufd.h | 44 ++++++++ backends/iommufd.c | 228 +++++++++++++++++++++++++++++++++++++++ backends/Kconfig | 4 + backends/meson.build | 1 + backends/trace-events | 10 ++ qemu-options.hx | 12 +++ 8 files changed, 325 insertions(+) create mode 100644 include/sysemu/iommufd.h create mode 100644 backends/iommufd.c diff --git a/MAINTAINERS b/MAINTAINERS index ff1238bb98..a4891f7bda 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2166,6 +2166,13 @@ F: hw/vfio/ap.c F: docs/system/s390x/vfio-ap.rst L: qemu-s390x@nongnu.org =20 +iommufd +M: Yi Liu +M: Eric Auger +S: Supported +F: backends/iommufd.c +F: include/sysemu/iommufd.h + vhost M: Michael S. Tsirkin S: Supported diff --git a/qapi/qom.json b/qapi/qom.json index c53ef978ff..1fd8555a75 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -794,6 +794,23 @@ { 'struct': 'VfioUserServerProperties', 'data': { 'socket': 'SocketAddress', 'device': 'str' } } =20 +## +# @IOMMUFDProperties: +# +# Properties for iommufd objects. +# +# @fd: file descriptor name previously passed via 'getfd' command, +# which represents a pre-opened /dev/iommu. This allows the +# iommufd object to be shared accross several subsystems +# (VFIO, VDPA, ...), and the file descriptor to be shared +# with other process, e.g. DPDK. (default: QEMU opens +# /dev/iommu by itself) +# +# Since: 8.2 +## +{ 'struct': 'IOMMUFDProperties', + 'data': { '*fd': 'str' } } + ## # @RngProperties: # @@ -934,6 +951,7 @@ 'input-barrier', { 'name': 'input-linux', 'if': 'CONFIG_LINUX' }, + 'iommufd', 'iothread', 'main-loop', { 'name': 'memory-backend-epc', @@ -1003,6 +1021,7 @@ 'input-barrier': 'InputBarrierProperties', 'input-linux': { 'type': 'InputLinuxProperties', 'if': 'CONFIG_LINUX' }, + 'iommufd': 'IOMMUFDProperties', 'iothread': 'IothreadProperties', 'main-loop': 'MainLoopProperties', 'memory-backend-epc': { 'type': 'MemoryBackendEpcProperties', diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h new file mode 100644 index 0000000000..9b3a86f57d --- /dev/null +++ b/include/sysemu/iommufd.h @@ -0,0 +1,44 @@ +#ifndef SYSEMU_IOMMUFD_H +#define SYSEMU_IOMMUFD_H + +#include "qom/object.h" +#include "qemu/thread.h" +#include "exec/hwaddr.h" +#include "exec/cpu-common.h" + +#define TYPE_IOMMUFD_BACKEND "iommufd" +OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass, + IOMMUFD_BACKEND) +#define IOMMUFD_BACKEND(obj) \ + OBJECT_CHECK(IOMMUFDBackend, (obj), TYPE_IOMMUFD_BACKEND) +#define IOMMUFD_BACKEND_GET_CLASS(obj) \ + OBJECT_GET_CLASS(IOMMUFDBackendClass, (obj), TYPE_IOMMUFD_BACKEND) +#define IOMMUFD_BACKEND_CLASS(klass) \ + OBJECT_CLASS_CHECK(IOMMUFDBackendClass, (klass), TYPE_IOMMUFD_BACKEND) +struct IOMMUFDBackendClass { + ObjectClass parent_class; +}; + +struct IOMMUFDBackend { + Object parent; + + /*< protected >*/ + int fd; /* /dev/iommu file descriptor */ + bool owned; /* is the /dev/iommu opened internally */ + QemuMutex lock; + uint32_t users; + + /*< public >*/ +}; + +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp); +void iommufd_backend_disconnect(IOMMUFDBackend *be); + +int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id, + Error **errp); +void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id); +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr i= ova, + ram_addr_t size, void *vaddr, bool readonly); +int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id, + hwaddr iova, ram_addr_t size); +#endif diff --git a/backends/iommufd.c b/backends/iommufd.c new file mode 100644 index 0000000000..ea3e2a8f85 --- /dev/null +++ b/backends/iommufd.c @@ -0,0 +1,228 @@ +/* + * iommufd container backend + * + * Copyright (C) 2023 Intel Corporation. + * Copyright Red Hat, Inc. 2023 + * + * Authors: Yi Liu + * Eric Auger + * + * SPDX-License-Identifier: GPL-2.0-or-later + */ + +#include "qemu/osdep.h" +#include "sysemu/iommufd.h" +#include "qapi/error.h" +#include "qapi/qmp/qerror.h" +#include "qemu/module.h" +#include "qom/object_interfaces.h" +#include "qemu/error-report.h" +#include "monitor/monitor.h" +#include "trace.h" +#include +#include + +static void iommufd_backend_init(Object *obj) +{ + IOMMUFDBackend *be =3D IOMMUFD_BACKEND(obj); + + be->fd =3D -1; + be->users =3D 0; + be->owned =3D true; + qemu_mutex_init(&be->lock); +} + +static void iommufd_backend_finalize(Object *obj) +{ + IOMMUFDBackend *be =3D IOMMUFD_BACKEND(obj); + + if (be->owned) { + close(be->fd); + be->fd =3D -1; + } +} + +static void iommufd_backend_set_fd(Object *obj, const char *str, Error **e= rrp) +{ + IOMMUFDBackend *be =3D IOMMUFD_BACKEND(obj); + int fd =3D -1; + + fd =3D monitor_fd_param(monitor_cur(), str, errp); + if (fd =3D=3D -1) { + error_prepend(errp, "Could not parse remote object fd %s:", str); + return; + } + qemu_mutex_lock(&be->lock); + be->fd =3D fd; + be->owned =3D false; + qemu_mutex_unlock(&be->lock); + trace_iommu_backend_set_fd(be->fd); +} + +static void iommufd_backend_class_init(ObjectClass *oc, void *data) +{ + object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd); +} + +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp) +{ + int fd, ret =3D 0; + + qemu_mutex_lock(&be->lock); + if (be->users =3D=3D UINT32_MAX) { + error_setg(errp, "too many connections"); + ret =3D -E2BIG; + goto out; + } + if (be->owned && !be->users) { + fd =3D qemu_open_old("/dev/iommu", O_RDWR); + if (fd < 0) { + error_setg_errno(errp, errno, "/dev/iommu opening failed"); + ret =3D fd; + goto out; + } + be->fd =3D fd; + } + be->users++; +out: + trace_iommufd_backend_connect(be->fd, be->owned, + be->users, ret); + qemu_mutex_unlock(&be->lock); + return ret; +} + +void iommufd_backend_disconnect(IOMMUFDBackend *be) +{ + qemu_mutex_lock(&be->lock); + if (!be->users) { + goto out; + } + be->users--; + if (!be->users && be->owned) { + close(be->fd); + be->fd =3D -1; + } +out: + trace_iommufd_backend_disconnect(be->fd, be->users); + qemu_mutex_unlock(&be->lock); +} + +int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id, + Error **errp) +{ + int ret, fd =3D be->fd; + struct iommu_ioas_alloc alloc_data =3D { + .size =3D sizeof(alloc_data), + .flags =3D 0, + }; + + ret =3D ioctl(fd, IOMMU_IOAS_ALLOC, &alloc_data); + if (ret) { + error_setg_errno(errp, errno, "Failed to allocate ioas"); + return ret; + } + + *ioas_id =3D alloc_data.out_ioas_id; + trace_iommufd_backend_alloc_ioas(fd, *ioas_id, ret); + + return ret; +} + +void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id) +{ + int ret, fd =3D be->fd; + struct iommu_destroy des =3D { + .size =3D sizeof(des), + .id =3D id, + }; + + ret =3D ioctl(fd, IOMMU_DESTROY, &des); + trace_iommufd_backend_free_id(fd, id, ret); + if (ret) { + error_report("Failed to free id: %u %m", id); + } +} + +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr i= ova, + ram_addr_t size, void *vaddr, bool readonly) +{ + int ret, fd =3D be->fd; + struct iommu_ioas_map map =3D { + .size =3D sizeof(map), + .flags =3D IOMMU_IOAS_MAP_READABLE | + IOMMU_IOAS_MAP_FIXED_IOVA, + .ioas_id =3D ioas_id, + .__reserved =3D 0, + .user_va =3D (uintptr_t)vaddr, + .iova =3D iova, + .length =3D size, + }; + + if (!readonly) { + map.flags |=3D IOMMU_IOAS_MAP_WRITEABLE; + } + + ret =3D ioctl(fd, IOMMU_IOAS_MAP, &map); + trace_iommufd_backend_map_dma(fd, ioas_id, iova, size, + vaddr, readonly, ret); + if (ret) { + ret =3D -errno; + error_report("IOMMU_IOAS_MAP failed: %m"); + } + return ret; +} + +int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id, + hwaddr iova, ram_addr_t size) +{ + int ret, fd =3D be->fd; + struct iommu_ioas_unmap unmap =3D { + .size =3D sizeof(unmap), + .ioas_id =3D ioas_id, + .iova =3D iova, + .length =3D size, + }; + + ret =3D ioctl(fd, IOMMU_IOAS_UNMAP, &unmap); + /* + * IOMMUFD takes mapping as some kind of object, unmapping + * nonexistent mapping is treated as deleting a nonexistent + * object and return ENOENT. This is different from legacy + * backend which allows it. vIOMMU may trigger a lot of + * redundant unmapping, to avoid flush the log, treat them + * as succeess for IOMMUFD just like legacy backend. + */ + if (ret && errno =3D=3D ENOENT) { + trace_iommufd_backend_unmap_dma_non_exist(fd, ioas_id, iova, size,= ret); + ret =3D 0; + } else { + trace_iommufd_backend_unmap_dma(fd, ioas_id, iova, size, ret); + } + + if (ret) { + ret =3D -errno; + error_report("IOMMU_IOAS_UNMAP failed: %m"); + } + return ret; +} + +static const TypeInfo iommufd_backend_info =3D { + .name =3D TYPE_IOMMUFD_BACKEND, + .parent =3D TYPE_OBJECT, + .instance_size =3D sizeof(IOMMUFDBackend), + .instance_init =3D iommufd_backend_init, + .instance_finalize =3D iommufd_backend_finalize, + .class_size =3D sizeof(IOMMUFDBackendClass), + .class_init =3D iommufd_backend_class_init, + .interfaces =3D (InterfaceInfo[]) { + { TYPE_USER_CREATABLE }, + { } + } +}; + +static void register_types(void) +{ + type_register_static(&iommufd_backend_info); +} + +type_init(register_types); diff --git a/backends/Kconfig b/backends/Kconfig index f35abc1609..2cb23f62fa 100644 --- a/backends/Kconfig +++ b/backends/Kconfig @@ -1 +1,5 @@ source tpm/Kconfig + +config IOMMUFD + bool + depends on VFIO diff --git a/backends/meson.build b/backends/meson.build index 914c7c4afb..9a5cea480d 100644 --- a/backends/meson.build +++ b/backends/meson.build @@ -20,6 +20,7 @@ if have_vhost_user system_ss.add(when: 'CONFIG_VIRTIO', if_true: files('vhost-user.c')) endif system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhos= t.c')) +system_ss.add(when: 'CONFIG_IOMMUFD', if_true: files('iommufd.c')) if have_vhost_user_crypto system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vh= ost-user.c')) endif diff --git a/backends/trace-events b/backends/trace-events index 652eb76a57..d45c6e31a6 100644 --- a/backends/trace-events +++ b/backends/trace-events @@ -5,3 +5,13 @@ dbus_vmstate_pre_save(void) dbus_vmstate_post_load(int version_id) "version_id: %d" dbus_vmstate_loading(const char *id) "id: %s" dbus_vmstate_saving(const char *id) "id: %s" + +# iommufd.c +iommufd_backend_connect(int fd, bool owned, uint32_t users, int ret) "fd= =3D%d owned=3D%d users=3D%d (%d)" +iommufd_backend_disconnect(int fd, uint32_t users) "fd=3D%d users=3D%d" +iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=3D%d" +iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_= t size, void *vaddr, bool readonly, int ret) " iommufd=3D%d ioas=3D%d iova= =3D0x%"PRIx64" size=3D0x%"PRIx64" addr=3D%p readonly=3D%d (%d)" +iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t i= ova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=3D%d ioas= =3D%d iova=3D0x%"PRIx64" size=3D0x%"PRIx64" (%d)" +iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint6= 4_t size, int ret) " iommufd=3D%d ioas=3D%d iova=3D0x%"PRIx64" size=3D0x%"P= RIx64" (%d)" +iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd= =3D%d ioas=3D%d (%d)" +iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=3D%d = id=3D%d (%d)" diff --git a/qemu-options.hx b/qemu-options.hx index 42fd09e4de..70507c0ee6 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -5224,6 +5224,18 @@ SRST =20 The ``share`` boolean option is on by default with memfd. =20 + ``-object iommufd,id=3Did[,fd=3Dfd]`` + Creates an iommufd backend which allows control of DMA mapping + through the /dev/iommu device. + + The ``id`` parameter is a unique ID which frontends (such as + vfio-pci of vdpa) will use to connect with the iommufd backend. + + The ``fd`` parameter is an optional pre-opened file descriptor + resulting from /dev/iommu opening. Usually the iommufd is shared + across all subsystems, bringing the benefit of centralized + reference counting. + ``-object rng-builtin,id=3Did`` Creates a random number generator backend which obtains entropy from QEMU builtin functions. The ``id`` parameter is a unique ID --=20 2.34.1