From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Hello all.

The purpose of this patch series is to add IOREQ/DM support to Xen on Arm.
You can find an initial discussion at [1] and RFC patch series at [2].
Xen on Arm requires some implementation to forward guest MMIO access to a device
model in order to implement virtio-mmio backend or even mediator outside of hypervisor.
As Xen on x86 already contains required support this patch series tries to make it common
and introduce Arm specific bits plus some new functionality. Patch series is based on
Julien's PoC "xen/arm: Add support for Guest IO forwarding to a device emulator".
Besides splitting existing IOREQ/DM support and introducing Arm side, the patch series
also includes virtio-mmio related changes (last 2 patches for toolstack)
for the reviewers to be able to see what the whole picture could look like.

According to the initial discussion there are a few open questions/concerns
regarding security and performance in the VirtIO solution:
1. virtio-mmio vs virtio-pci, SPI vs MSI, different use-cases require different transport...
2. virtio backend is able to access all guest memory, some kind of protection is needed: 'virtio-iommu in Xen' vs 'pre-shared-memory & memcpys in guest'
3. interface between toolstack and 'out-of-qemu' virtio backend, avoid using Xenstore in virtio backend if possible.
4. a lot of 'foreign mapping' could lead to the memory exhaustion, Julien has some idea regarding that.

Looks like all of them are valid and worth considering, but the first thing
which we need on Arm is a mechanism to forward guest IO to a device emulator,
so let's focus on it in the first place.

***

There are a lot of changes since RFC series, several critical TODOs were resolved on Arm,
Arm code was improved and hardened, but one TODO still remains which is "PIO handling" on Arm.
The "PIO handling" TODO is expected to be left unaddressed for the current series.
It is not a big issue for now while Xen doesn't have support for vPCI on Arm.
On Arm64 they (PIO accesses) are only used for PCI IO BAR and we would probably want to expose
them to emulator as PIO access to make a DM completely arch-agnostic. So "PIO handling" should
be implemented when we add support for vPCI.

I left the interface untouched in the following patch
"xen/dm: Introduce xendevicemodel_set_irq_level DM op"
since there is still an open discussion what interface to use/what
information to pass to the hypervisor.

Also I decided to drop the following patch:
"[RFC PATCH V1 07/12] A collection of tweaks to be able to run emulator in driver domain"
as I got advice to write our own policy using FLASK which would cover our use
case (with emulator in driver domain) rather than tweak Xen.

***

Patch series [3] was rebased on "1 month old staging branch"
(79c2d51 tools: bump library version numbers) and tested on
Renesas Salvator-X board + H3 ES3.0 SoC (Arm64) with virtio-mmio disk backend (we will
share it later) running in driver domain and unmodified Linux Guest running on existing
virtio-blk driver (frontend). No issues were observed. Guest domain 'reboot/destroy'
use-cases work properly. Patch series was only build-tested on x86.

Please note, build-test passed for the following modes:
1. x86: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
2. x86: #CONFIG_HVM is not set / #CONFIG_IOREQ_SERVER is not set
3. Arm64: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
4. Arm64: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set (!)
5. Arm32: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
6. Arm32: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set (!)

Please note, the build on Arm32 was broken for the RFC series (see cmpxchg
usage in hvm_send_buffered_ioreq()) due to the lack of cmpxchg_64 support on Arm32.
But, there is a patch on review to address this issue:
https://patchwork.kernel.org/patch/11715559/
Together with the following patch in this series:
"xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()"
we are able to fix Arm32 and harden IOREQ code on Arm.

***

Any feedback/help would be highly appreciated.

[1] https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg00825.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2020-08/msg00071.html
[3] https://github.com/otyshchenko1/xen/commits/ioreq_4.14_ml2

Oleksandr Tyshchenko (16):
  x86/ioreq: Prepare IOREQ feature for making it common
  xen/ioreq: Make x86's IOREQ feature common
  xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
  xen/ioreq: Provide alias for the handle_mmio()
  xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common
  xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common
  xen/dm: Make x86's DM feature common
  xen/mm: Make x86's XENMEM_resource_ioreq_server handling common
  arm/ioreq: Introduce arch specific bits for IOREQ/DM features
  xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
  xen/ioreq: Introduce hvm_domain_has_ioreq_server()
  xen/dm: Introduce xendevicemodel_set_irq_level DM op
  xen/ioreq: Make x86's invalidate qemu mapcache handling common
  xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
  libxl: Introduce basic virtio-mmio support on Arm
  [RFC] libxl: Add support for virtio-disk configuration

 MAINTAINERS | 8 +-
 tools/libs/devicemodel/core.c | 18 +
 tools/libs/devicemodel/include/xendevicemodel.h | 4 +
 tools/libs/devicemodel/libxendevicemodel.map | 1 +
 tools/libxl/Makefile | 4 +-
 tools/libxl/libxl_arm.c | 94 +-
 tools/libxl/libxl_create.c | 1 +
 tools/libxl/libxl_internal.h | 1 +
 tools/libxl/libxl_types.idl | 16 +
 tools/libxl/libxl_types_internal.idl | 1 +
 tools/libxl/libxl_virtio_disk.c | 109 ++
 tools/xl/Makefile | 2 +-
 tools/xl/xl.h | 3 +
 tools/xl/xl_cmdtable.c | 15 +
 tools/xl/xl_parse.c | 116 ++
 tools/xl/xl_virtio_disk.c | 46 +
 xen/arch/arm/Kconfig | 1 +
 xen/arch/arm/Makefile | 2 +
 xen/arch/arm/dm.c | 67 ++
 xen/arch/arm/domain.c | 9 +
 xen/arch/arm/io.c | 11 +-
 xen/arch/arm/ioreq.c | 142 +++
 xen/arch/arm/p2m.c | 16 +
 xen/arch/arm/traps.c | 41 +-
 xen/arch/x86/Kconfig | 1 +
 xen/arch/x86/hvm/dm.c | 289 +----
 xen/arch/x86/hvm/emulate.c | 2 +-
 xen/arch/x86/hvm/hvm.c | 2 +-
 xen/arch/x86/hvm/hypercall.c | 9 +-
 xen/arch/x86/hvm/intercept.c | 1 +
 xen/arch/x86/hvm/io.c | 16 +-
 xen/arch/x86/hvm/ioreq.c | 1426 +----------------------
 xen/arch/x86/hvm/stdvga.c | 2 +-
 xen/arch/x86/hvm/vmx/realmode.c | 1 +
 xen/arch/x86/hvm/vmx/vvmx.c | 3 +-
 xen/arch/x86/mm.c | 46 +-
 xen/arch/x86/mm/p2m.c | 5 +-
 xen/arch/x86/mm/shadow/common.c | 2 +-
 xen/common/Kconfig | 3 +
 xen/common/Makefile | 2 +
 xen/common/dm.c | 288 +++++
 xen/common/ioreq.c | 1433 +++++++++++++++++++++++
 xen/common/memory.c | 54 +-
 xen/include/asm-arm/domain.h | 47 +
 xen/include/asm-arm/hvm/ioreq.h | 108 ++
 xen/include/asm-arm/mm.h | 8 -
 xen/include/asm-arm/mmio.h | 1 +
 xen/include/asm-arm/p2m.h | 11 +-
 xen/include/asm-arm/paging.h | 4 +
 xen/include/asm-x86/hvm/domain.h | 36 +-
 xen/include/asm-x86/hvm/io.h | 17 -
 xen/include/asm-x86/hvm/ioreq.h | 47 +-
 xen/include/asm-x86/hvm/vcpu.h | 7 -
 xen/include/asm-x86/mm.h | 4 -
 xen/include/asm-x86/p2m.h | 3 +-
 xen/include/public/arch-arm.h | 5 +
 xen/include/public/hvm/dm_op.h | 15 +
 xen/include/xen/hypercall.h | 12 +
 xen/include/xen/ioreq.h | 146 +++
 xen/include/xen/sched.h | 2 +
 xen/include/xsm/dummy.h | 4 +-
 xen/include/xsm/xsm.h | 6 +-
 xen/xsm/dummy.c | 2 +-
 xen/xsm/flask/hooks.c | 5 +-
 64 files changed, 2940 insertions(+), 1863 deletions(-)
 create mode 100644 tools/libxl/libxl_virtio_disk.c
 create mode 100644 tools/xl/xl_virtio_disk.c
 create mode 100644 xen/arch/arm/dm.c
 create mode 100644 xen/arch/arm/ioreq.c
 create mode 100644 xen/common/dm.c
 create mode 100644 xen/common/ioreq.c
 create mode 100644 xen/include/asm-arm/hvm/ioreq.h
 create mode 100644 xen/include/xen/ioreq.h

-- 
2.7.4
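
For readers new to IOREQ, the forwarding idea described in the cover letter can be pictured roughly as follows: a trapped guest MMIO access is described by a request (address, size, direction, data) and matched against the ranges that each device emulator has registered. The block below is only a minimal standalone sketch in C, not code from the series. All names with a sketch_ prefix are hypothetical; the real code uses ioreq_t, struct hvm_ioreq_server and rangesets. The backwards walk mirrors the comment on FOR_EACH_IOREQ_SERVER, which favours more recently created servers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for Xen's ioreq_t. */
struct sketch_ioreq {
    uint64_t addr;   /* guest physical address of the access */
    uint64_t data;   /* value written, or slot for the value read */
    uint32_t size;   /* access width in bytes, assumed non-zero */
    uint8_t  dir;    /* access direction (read or write) */
};

/* Hypothetical stand-in for an IOREQ server with one MMIO range. */
struct sketch_server {
    uint64_t start;  /* first address claimed by the emulator */
    uint64_t end;    /* last address claimed (inclusive) */
    int id;
};

/*
 * Pick the server whose range fully covers the access.
 * Iterates from the highest index down, so that a more recently
 * registered server wins when ranges overlap.
 */
static const struct sketch_server *
sketch_select_server(const struct sketch_server *srv, size_t n,
                     const struct sketch_ioreq *p)
{
    uint64_t last = p->addr + p->size - 1;

    for ( size_t i = n; i != 0; )
    {
        const struct sketch_server *s = &srv[--i];

        if ( p->addr >= s->start && last <= s->end )
            return s;
    }

    return NULL; /* nobody claims this address: the access is unhandled */
}
```

In this sketch a NULL result corresponds to the case where no emulator claims the address; what the hypervisor does then (for example injecting an abort into the guest) is outside the scope of the example.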
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As a lot of x86 code can be re-used on Arm later on, this patch
prepares IOREQ support before moving to the common code. This way
we will get almost a verbatim copy for a code movement.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch, was split from:
     "[RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common"
   - fold the check of p->type into hvm_get_ioreq_server_range_type()
     and make it return success/failure
   - remove relocate_portio_handler() call from arch_hvm_ioreq_destroy()
     in arch/x86/hvm/ioreq.c
   - introduce arch_hvm_destroy_ioreq_server()/arch_handle_hvm_io_completion()
---
---
 xen/arch/x86/hvm/ioreq.c | 117 ++++++++++++++++++++++++++--------------
 xen/include/asm-x86/hvm/ioreq.h | 16 ++++++
 2 files changed, 93 insertions(+), 40 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -XXX,XX +XXX,XX @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
     return true;
 }
 
+bool arch_handle_hvm_io_completion(enum hvm_io_completion io_completion)
+{
+    switch ( io_completion )
+    {
+    case HVMIO_realmode_completion:
+    {
+        struct hvm_emulate_ctxt ctxt;
+
+        hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
+        vmx_realmode_emulate_one(&ctxt);
+        hvm_emulate_writeback(&ctxt);
+
+        break;
+    }
+
+    default:
+        ASSERT_UNREACHABLE();
+        break;
+    }
+
+    return true;
+}
+
 bool handle_hvm_io_completion(struct vcpu *v)
 {
     struct domain *d = v->domain;
@@ -XXX,XX +XXX,XX @@ bool handle_hvm_io_completion(struct vcpu *v)
         return handle_pio(vio->io_req.addr, vio->io_req.size,
                           vio->io_req.dir);
 
-    case HVMIO_realmode_completion:
-    {
-        struct hvm_emulate_ctxt ctxt;
-
-        hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
-        vmx_realmode_emulate_one(&ctxt);
-        hvm_emulate_writeback(&ctxt);
-
-        break;
-    }
     default:
-        ASSERT_UNREACHABLE();
-        break;
+        return arch_handle_hvm_io_completion(io_completion);
     }
 
     return true;
@@ -XXX,XX +XXX,XX @@ int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
     return rc;
 }
 
+/* Called when target domain is paused */
+int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s)
+{
+    return p2m_set_ioreq_server(s->target, 0, s);
+}
+
 int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
 {
     struct hvm_ioreq_server *s;
@@ -XXX,XX +XXX,XX @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
 
     domain_pause(d);
 
-    p2m_set_ioreq_server(d, 0, s);
+    arch_hvm_destroy_ioreq_server(s);
 
     hvm_ioreq_server_disable(s);
 
@@ -XXX,XX +XXX,XX @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
     struct hvm_ioreq_server *s;
     unsigned int id;
 
-    if ( !relocate_portio_handler(d, 0xcf8, 0xcf8, 4) )
-        return;
+    arch_hvm_ioreq_destroy(d);
 
     spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
 
@@ -XXX,XX +XXX,XX @@ void hvm_destroy_all_ioreq_servers(struct domain *d)
     spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
 }
 
-struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
-                                                 ioreq_t *p)
+int hvm_get_ioreq_server_range_type(struct domain *d,
+                                    ioreq_t *p,
+                                    uint8_t *type,
+                                    uint64_t *addr)
 {
-    struct hvm_ioreq_server *s;
-    uint32_t cf8;
-    uint8_t type;
-    uint64_t addr;
-    unsigned int id;
+    uint32_t cf8 = d->arch.hvm.pci_cf8;
 
     if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
-        return NULL;
-
-    cf8 = d->arch.hvm.pci_cf8;
+        return -EINVAL;
 
     if ( p->type == IOREQ_TYPE_PIO &&
          (p->addr & ~3) == 0xcfc &&
@@ -XXX,XX +XXX,XX @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
         reg = hvm_pci_decode_addr(cf8, p->addr, &sbdf);
 
         /* PCI config data cycle */
-        type = XEN_DMOP_IO_RANGE_PCI;
-        addr = ((uint64_t)sbdf.sbdf << 32) | reg;
+        *type = XEN_DMOP_IO_RANGE_PCI;
+        *addr = ((uint64_t)sbdf.sbdf << 32) | reg;
         /* AMD extended configuration space access? */
         if ( CF8_ADDR_HI(cf8) &&
              d->arch.cpuid->x86_vendor == X86_VENDOR_AMD &&
@@ -XXX,XX +XXX,XX @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
 
             if ( !rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) &&
                  (msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) )
-                addr |= CF8_ADDR_HI(cf8);
+                *addr |= CF8_ADDR_HI(cf8);
         }
     }
     else
     {
-        type = (p->type == IOREQ_TYPE_PIO) ?
-                XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
-        addr = p->addr;
+        *type = (p->type == IOREQ_TYPE_PIO) ?
+                 XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
+        *addr = p->addr;
     }
 
+    return 0;
+}
+
+struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
+                                                 ioreq_t *p)
+{
+    struct hvm_ioreq_server *s;
+    uint8_t type;
+    uint64_t addr;
+    unsigned int id;
+
+    if ( hvm_get_ioreq_server_range_type(d, p, &type, &addr) )
+        return NULL;
+
     FOR_EACH_IOREQ_SERVER(d, id, s)
     {
         struct rangeset *r;
@@ -XXX,XX +XXX,XX @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
     pg = iorp->va;
 
     if ( !pg )
-        return X86EMUL_UNHANDLEABLE;
+        return IOREQ_IO_UNHANDLED;
 
     /*
      * Return 0 for the cases we can't deal with:
@@ -XXX,XX +XXX,XX @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
         break;
     default:
         gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
-        return X86EMUL_UNHANDLEABLE;
+        return IOREQ_IO_UNHANDLED;
     }
 
     spin_lock(&s->bufioreq_lock);
@@ -XXX,XX +XXX,XX @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
     {
         /* The queue is full: send the iopacket through the normal path. */
         spin_unlock(&s->bufioreq_lock);
-        return X86EMUL_UNHANDLEABLE;
+        return IOREQ_IO_UNHANDLED;
     }
 
     pg->buf_ioreq[pg->ptrs.write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
@@ -XXX,XX +XXX,XX @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
     notify_via_xen_event_channel(d, s->bufioreq_evtchn);
     spin_unlock(&s->bufioreq_lock);
 
-    return X86EMUL_OKAY;
+    return IOREQ_IO_HANDLED;
 }
 
 int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
@@ -XXX,XX +XXX,XX @@ int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
         return hvm_send_buffered_ioreq(s, proto_p);
 
     if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
-        return X86EMUL_RETRY;
+        return IOREQ_IO_RETRY;
 
     list_for_each_entry ( sv,
                           &s->ioreq_vcpu_list,
@@ -XXX,XX +XXX,XX @@ int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
             notify_via_xen_event_channel(d, port);
 
             sv->pending = true;
-            return X86EMUL_RETRY;
+            return IOREQ_IO_RETRY;
         }
     }
 
-    return X86EMUL_UNHANDLEABLE;
+    return IOREQ_IO_UNHANDLED;
 }
 
 unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
@@ -XXX,XX +XXX,XX @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
         if ( !s->enabled )
             continue;
 
-        if ( hvm_send_ioreq(s, p, buffered) == X86EMUL_UNHANDLEABLE )
+        if ( hvm_send_ioreq(s, p, buffered) == IOREQ_IO_UNHANDLED )
             failed++;
     }
 
@@ -XXX,XX +XXX,XX @@ static int hvm_access_cf8(
     return X86EMUL_UNHANDLEABLE;
 }
 
+void arch_hvm_ioreq_init(struct domain *d)
+{
+    register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
+}
+
+void arch_hvm_ioreq_destroy(struct domain *d)
+{
+
+}
+
 void hvm_ioreq_init(struct domain *d)
 {
     spin_lock_init(&d->arch.hvm.ioreq_server.lock);
 
-    register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
+    arch_hvm_ioreq_init(d);
 }
 
 /*
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -XXX,XX +XXX,XX @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
 
 void hvm_ioreq_init(struct domain *d);
 
+int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s);
+
+bool arch_handle_hvm_io_completion(enum hvm_io_completion io_completion);
+
+int hvm_get_ioreq_server_range_type(struct domain *d,
+                                    ioreq_t *p,
+                                    uint8_t *type,
+                                    uint64_t *addr);
+
+void arch_hvm_ioreq_init(struct domain *d);
+void arch_hvm_ioreq_destroy(struct domain *d);
+
+#define IOREQ_IO_HANDLED     X86EMUL_OKAY
+#define IOREQ_IO_UNHANDLED   X86EMUL_UNHANDLEABLE
+#define IOREQ_IO_RETRY       X86EMUL_RETRY
+
 #endif /* __ASM_X86_HVM_IOREQ_H__ */
 
 /*
-- 
2.7.4
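
The shape of this refactoring can be reduced to a small standalone sketch. It is not the Xen code: every sketch_ name is hypothetical and the enum values are only illustrative stand-ins. It tries to show two things the patch does: the common handler keeps the generic completion types and defers anything else to an arch hook, and the IOREQ_IO_* names are plain aliases of the x86 emulator return codes, so x86 behaviour should stay unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the x86 emulator return codes. */
enum { X86EMUL_OKAY, X86EMUL_UNHANDLEABLE, X86EMUL_RETRY };

/* Arch-neutral aliases, following the #defines added by the patch. */
#define IOREQ_IO_HANDLED     X86EMUL_OKAY
#define IOREQ_IO_UNHANDLED   X86EMUL_UNHANDLEABLE
#define IOREQ_IO_RETRY       X86EMUL_RETRY

/* Hypothetical completion types: two generic ones, one arch-specific. */
enum sketch_io_completion {
    SKETCH_no_completion,
    SKETCH_mmio_completion,
    SKETCH_arch_completion,   /* plays the role of HVMIO_realmode_completion */
};

/* Counts how often the arch hook ran; only for demonstration. */
static int sketch_arch_calls;

/* Arch hook: handles whatever the common code does not know about. */
static bool sketch_arch_handle_io_completion(enum sketch_io_completion c)
{
    sketch_arch_calls++;
    return c == SKETCH_arch_completion;
}

/* Common handler: generic cases stay here, the rest is delegated. */
static bool sketch_handle_io_completion(enum sketch_io_completion c)
{
    switch ( c )
    {
    case SKETCH_no_completion:
        break;

    case SKETCH_mmio_completion:
        return true; /* the real code calls handle_mmio() here */

    default:
        return sketch_arch_handle_io_completion(c);
    }

    return true;
}
```

With this split, an architecture that has no extra completion types only needs a trivial hook, and the common switch does not have to mention arch-specific enumerators.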
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As a lot of x86 code can be re-used on Arm later on, this patch
moves previously prepared IOREQ support to the common code.

The code movement is almost a verbatim copy with re-ordering
the headers alphabetically.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - was split into three patches:
     - x86/ioreq: Prepare IOREQ feature for making it common
     - xen/ioreq: Make x86's IOREQ feature common
     - xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
   - update MAINTAINERS file
   - do not use a separate subdir for the IOREQ stuff, move it to:
     - xen/common/ioreq.c
     - xen/include/xen/ioreq.h
   - update x86's files to include xen/ioreq.h
   - remove unneeded headers in arch/x86/hvm/ioreq.c
   - re-order the headers alphabetically in common/ioreq.c
   - update common/ioreq.c according to the newly introduced arch functions:
     arch_hvm_destroy_ioreq_server()/arch_handle_hvm_io_completion()
---
---
 MAINTAINERS | 8 +-
 xen/arch/x86/Kconfig | 1 +
 xen/arch/x86/hvm/dm.c | 2 +-
 xen/arch/x86/hvm/emulate.c | 2 +-
 xen/arch/x86/hvm/hvm.c | 2 +-
 xen/arch/x86/hvm/io.c | 2 +-
 xen/arch/x86/hvm/ioreq.c | 1425 +--------------------------------------
 xen/arch/x86/hvm/stdvga.c | 2 +-
 xen/arch/x86/hvm/vmx/vvmx.c | 3 +-
 xen/arch/x86/mm.c | 2 +-
 xen/arch/x86/mm/shadow/common.c | 2 +-
 xen/common/Kconfig | 3 +
 xen/common/Makefile | 1 +
 xen/common/ioreq.c | 1410 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/ioreq.h | 35 +-
 xen/include/xen/ioreq.h | 82 +++
 16 files changed, 1533 insertions(+), 1449 deletions(-)
 create mode 100644 xen/common/ioreq.c
 create mode 100644 xen/include/xen/ioreq.h

diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -XXX,XX +XXX,XX @@ X: xen/drivers/passthrough/vtd/ X: xen/drivers/passthrough/device_tree.c F: xen/include/xen/iommu.h +I/O EMULATION (IOREQ) +M: Paul Durrant <paul@xen.org> +S: Supported +F: xen/common/ioreq.c +F: xen/include/xen/ioreq.h +F: xen/include/public/hvm/ioreq.h + KCONFIG M: Doug Goldstein <cardoe@cardoe.com> S: Supported @@ -XXX,XX +XXX,XX @@ F: xen/arch/x86/hvm/ioreq.c F: xen/include/asm-x86/hvm/emulate.h F: xen/include/asm-x86/hvm/io.h F: xen/include/asm-x86/hvm/ioreq.h -F: xen/include/public/hvm/ioreq.h X86 MEMORY MANAGEMENT M: Jan Beulich <jbeulich@suse.com> diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/Kconfig +++ b/xen/arch/x86/Kconfig @@ -XXX,XX +XXX,XX @@ config PV_LINEAR_PT config HVM def_bool !PV_SHIM_EXCLUSIVE + select IOREQ_SERVER prompt "HVM support" ---help--- Interfaces to support HVM domains. HVM domains require hardware diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -XXX,XX +XXX,XX @@ #include <xen/event.h> #include <xen/guest_access.h> #include <xen/hypercall.h> +#include <xen/ioreq.h> #include <xen/nospec.h> #include <xen/sched.h> #include <asm/hap.h> #include <asm/hvm/cacheattr.h> -#include <asm/hvm/ioreq.h> #include <asm/shadow.h> #include <xsm/xsm.h> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -XXX,XX +XXX,XX @@ */ #include <xen/init.h> +#include <xen/ioreq.h> #include <xen/lib.h> #include <xen/sched.h> #include <xen/paging.h> @@ -XXX,XX +XXX,XX @@ #include <asm/xstate.h> #include <asm/hvm/emulate.h> #include <asm/hvm/hvm.h> -#include <asm/hvm/ioreq.h> #include <asm/hvm/monitor.h> #include <asm/hvm/trace.h> #include <asm/hvm/support.h> diff --git a/xen/arch/x86/hvm/hvm.c 
b/xen/arch/x86/hvm/hvm.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -XXX,XX +XXX,XX @@ #include <xen/ctype.h> #include <xen/init.h> +#include <xen/ioreq.h> #include <xen/lib.h> #include <xen/trace.h> #include <xen/sched.h> @@ -XXX,XX +XXX,XX @@ #include <asm/hvm/trace.h> #include <asm/hvm/nestedhvm.h> #include <asm/hvm/monitor.h> -#include <asm/hvm/ioreq.h> #include <asm/hvm/viridian.h> #include <asm/hvm/vm_event.h> #include <asm/altp2m.h> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/io.c +++ b/xen/arch/x86/hvm/io.c @@ -XXX,XX +XXX,XX @@ */ #include <xen/init.h> +#include <xen/ioreq.h> #include <xen/mm.h> #include <xen/lib.h> #include <xen/errno.h> @@ -XXX,XX +XXX,XX @@ #include <asm/shadow.h> #include <asm/p2m.h> #include <asm/hvm/hvm.h> -#include <asm/hvm/ioreq.h> #include <asm/hvm/support.h> #include <asm/hvm/vpt.h> #include <asm/hvm/vpic.h> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/ioreq.c +++ b/xen/arch/x86/hvm/ioreq.c @@ -XXX,XX +XXX,XX @@ * this program; If not, see <http://www.gnu.org/licenses/>. 
*/ -#include <xen/ctype.h> -#include <xen/init.h> -#include <xen/lib.h> -#include <xen/trace.h> -#include <xen/sched.h> -#include <xen/irq.h> -#include <xen/softirq.h> #include <xen/domain.h> -#include <xen/event.h> -#include <xen/paging.h> -#include <xen/vpci.h> +#include <xen/ioreq.h> -#include <asm/hvm/emulate.h> -#include <asm/hvm/hvm.h> -#include <asm/hvm/ioreq.h> -#include <asm/hvm/vmx/vmx.h> - -#include <public/hvm/ioreq.h> -#include <public/hvm/params.h> - -static void set_ioreq_server(struct domain *d, unsigned int id, - struct hvm_ioreq_server *s) -{ - ASSERT(id < MAX_NR_IOREQ_SERVERS); - ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]); - - d->arch.hvm.ioreq_server.server[id] = s; -} - -#define GET_IOREQ_SERVER(d, id) \ - (d)->arch.hvm.ioreq_server.server[id] - -static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d, - unsigned int id) -{ - if ( id >= MAX_NR_IOREQ_SERVERS ) - return NULL; - - return GET_IOREQ_SERVER(d, id); -} - -/* - * Iterate over all possible ioreq servers. - * - * NOTE: The iteration is backwards such that more recently created - * ioreq servers are favoured in hvm_select_ioreq_server(). - * This is a semantic that previously existed when ioreq servers - * were held in a linked list. 
- */ -#define FOR_EACH_IOREQ_SERVER(d, id, s) \ - for ( (id) = MAX_NR_IOREQ_SERVERS; (id) != 0; ) \ - if ( !(s = GET_IOREQ_SERVER(d, --(id))) ) \ - continue; \ - else - -static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v) -{ - shared_iopage_t *p = s->ioreq.va; - - ASSERT((v == current) || !vcpu_runnable(v)); - ASSERT(p != NULL); - - return &p->vcpu_ioreq[v->vcpu_id]; -} - -static struct hvm_ioreq_vcpu *get_pending_vcpu(const struct vcpu *v, - struct hvm_ioreq_server **srvp) -{ - struct domain *d = v->domain; - struct hvm_ioreq_server *s; - unsigned int id; - - FOR_EACH_IOREQ_SERVER(d, id, s) - { - struct hvm_ioreq_vcpu *sv; - - list_for_each_entry ( sv, - &s->ioreq_vcpu_list, - list_entry ) - { - if ( sv->vcpu == v && sv->pending ) - { - if ( srvp ) - *srvp = s; - return sv; - } - } - } - - return NULL; -} - -bool hvm_io_pending(struct vcpu *v) -{ - return get_pending_vcpu(v, NULL); -} - -static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p) -{ - unsigned int prev_state = STATE_IOREQ_NONE; - unsigned int state = p->state; - uint64_t data = ~0; - - smp_rmb(); - - /* - * The only reason we should see this condition be false is when an - * emulator dying races with I/O being requested. 
- */ - while ( likely(state != STATE_IOREQ_NONE) ) - { - if ( unlikely(state < prev_state) ) - { - gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n", - prev_state, state); - sv->pending = false; - domain_crash(sv->vcpu->domain); - return false; /* bail */ - } - - switch ( prev_state = state ) - { - case STATE_IORESP_READY: /* IORESP_READY -> NONE */ - p->state = STATE_IOREQ_NONE; - data = p->data; - break; - - case STATE_IOREQ_READY: /* IOREQ_{READY,INPROCESS} -> IORESP_READY */ - case STATE_IOREQ_INPROCESS: - wait_on_xen_event_channel(sv->ioreq_evtchn, - ({ state = p->state; - smp_rmb(); - state != prev_state; })); - continue; - - default: - gdprintk(XENLOG_ERR, "Weird HVM iorequest state %u\n", state); - sv->pending = false; - domain_crash(sv->vcpu->domain); - return false; /* bail */ - } - - break; - } - - p = &sv->vcpu->arch.hvm.hvm_io.io_req; - if ( hvm_ioreq_needs_completion(p) ) - p->data = data; - - sv->pending = false; - - return true; -} - -bool arch_handle_hvm_io_completion(enum hvm_io_completion io_completion) -{ - switch ( io_completion ) - { - case HVMIO_realmode_completion: - { - struct hvm_emulate_ctxt ctxt; - - hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs()); - vmx_realmode_emulate_one(&ctxt); - hvm_emulate_writeback(&ctxt); - - break; - } - - default: - ASSERT_UNREACHABLE(); - break; - } - - return true; -} - -bool handle_hvm_io_completion(struct vcpu *v) -{ - struct domain *d = v->domain; - struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io; - struct hvm_ioreq_server *s; - struct hvm_ioreq_vcpu *sv; - enum hvm_io_completion io_completion; - - if ( has_vpci(d) && vpci_process_pending(v) ) - { - raise_softirq(SCHEDULE_SOFTIRQ); - return false; - } - - sv = get_pending_vcpu(v, &s); - if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) ) - return false; - - vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ? 
- STATE_IORESP_READY : STATE_IOREQ_NONE; - - msix_write_completion(v); - vcpu_end_shutdown_deferral(v); - - io_completion = vio->io_completion; - vio->io_completion = HVMIO_no_completion; - - switch ( io_completion ) - { - case HVMIO_no_completion: - break; - - case HVMIO_mmio_completion: - return handle_mmio(); - - case HVMIO_pio_completion: - return handle_pio(vio->io_req.addr, vio->io_req.size, - vio->io_req.dir); - - default: - return arch_handle_hvm_io_completion(io_completion); - } - - return true; -} - -static gfn_t hvm_alloc_legacy_ioreq_gfn(struct hvm_ioreq_server *s) -{ - struct domain *d = s->target; - unsigned int i; - - BUILD_BUG_ON(HVM_PARAM_BUFIOREQ_PFN != HVM_PARAM_IOREQ_PFN + 1); - - for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ ) - { - if ( !test_and_clear_bit(i, &d->arch.hvm.ioreq_gfn.legacy_mask) ) - return _gfn(d->arch.hvm.params[i]); - } - - return INVALID_GFN; -} - -static gfn_t hvm_alloc_ioreq_gfn(struct hvm_ioreq_server *s) -{ - struct domain *d = s->target; - unsigned int i; - - for ( i = 0; i < sizeof(d->arch.hvm.ioreq_gfn.mask) * 8; i++ ) - { - if ( test_and_clear_bit(i, &d->arch.hvm.ioreq_gfn.mask) ) - return _gfn(d->arch.hvm.ioreq_gfn.base + i); - } - - /* - * If we are out of 'normal' GFNs then we may still have a 'legacy' - * GFN available. 
- */ - return hvm_alloc_legacy_ioreq_gfn(s); -} - -static bool hvm_free_legacy_ioreq_gfn(struct hvm_ioreq_server *s, - gfn_t gfn) -{ - struct domain *d = s->target; - unsigned int i; - - for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ ) - { - if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) ) - break; - } - if ( i > HVM_PARAM_BUFIOREQ_PFN ) - return false; - - set_bit(i, &d->arch.hvm.ioreq_gfn.legacy_mask); - return true; -} - -static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn) -{ - struct domain *d = s->target; - unsigned int i = gfn_x(gfn) - d->arch.hvm.ioreq_gfn.base; - - ASSERT(!gfn_eq(gfn, INVALID_GFN)); - - if ( !hvm_free_legacy_ioreq_gfn(s, gfn) ) - { - ASSERT(i < sizeof(d->arch.hvm.ioreq_gfn.mask) * 8); - set_bit(i, &d->arch.hvm.ioreq_gfn.mask); - } -} - -static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) -{ - struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; - - if ( gfn_eq(iorp->gfn, INVALID_GFN) ) - return; - - destroy_ring_for_helper(&iorp->va, iorp->page); - iorp->page = NULL; - - hvm_free_ioreq_gfn(s, iorp->gfn); - iorp->gfn = INVALID_GFN; -} - -static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) -{ - struct domain *d = s->target; - struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; - int rc; - - if ( iorp->page ) - { - /* - * If a page has already been allocated (which will happen on - * demand if hvm_get_ioreq_server_frame() is called), then - * mapping a guest frame is not permitted. - */ - if ( gfn_eq(iorp->gfn, INVALID_GFN) ) - return -EPERM; - - return 0; - } - - if ( d->is_dying ) - return -EINVAL; - - iorp->gfn = hvm_alloc_ioreq_gfn(s); - - if ( gfn_eq(iorp->gfn, INVALID_GFN) ) - return -ENOMEM; - - rc = prepare_ring_for_helper(d, gfn_x(iorp->gfn), &iorp->page, - &iorp->va); - - if ( rc ) - hvm_unmap_ioreq_gfn(s, buf); - - return rc; -} - -static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf) -{ - struct hvm_ioreq_page *iorp = buf ? 
&s->bufioreq : &s->ioreq; - struct page_info *page; - - if ( iorp->page ) - { - /* - * If a guest frame has already been mapped (which may happen - * on demand if hvm_get_ioreq_server_info() is called), then - * allocating a page is not permitted. - */ - if ( !gfn_eq(iorp->gfn, INVALID_GFN) ) - return -EPERM; - - return 0; - } - - page = alloc_domheap_page(s->target, MEMF_no_refcount); - - if ( !page ) - return -ENOMEM; - - if ( !get_page_and_type(page, s->target, PGT_writable_page) ) - { - /* - * The domain can't possibly know about this page yet, so failure - * here is a clear indication of something fishy going on. - */ - domain_crash(s->emulator); - return -ENODATA; - } - - iorp->va = __map_domain_page_global(page); - if ( !iorp->va ) - goto fail; - - iorp->page = page; - clear_page(iorp->va); - return 0; - - fail: - put_page_alloc_ref(page); - put_page_and_type(page); - - return -ENOMEM; -} - -static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf) -{ - struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; - struct page_info *page = iorp->page; - - if ( !page ) - return; - - iorp->page = NULL; - - unmap_domain_page_global(iorp->va); - iorp->va = NULL; - - put_page_alloc_ref(page); - put_page_and_type(page); -} - -bool is_ioreq_server_page(struct domain *d, const struct page_info *page) -{ - const struct hvm_ioreq_server *s; - unsigned int id; - bool found = false; - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - FOR_EACH_IOREQ_SERVER(d, id, s) - { - if ( (s->ioreq.page == page) || (s->bufioreq.page == page) ) - { - found = true; - break; - } - } - - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - - return found; -} - -static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) - -{ - struct domain *d = s->target; - struct hvm_ioreq_page *iorp = buf ? 
&s->bufioreq : &s->ioreq; - - if ( gfn_eq(iorp->gfn, INVALID_GFN) ) - return; - - if ( guest_physmap_remove_page(d, iorp->gfn, - page_to_mfn(iorp->page), 0) ) - domain_crash(d); - clear_page(iorp->va); -} - -static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) -{ - struct domain *d = s->target; - struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; - int rc; - - if ( gfn_eq(iorp->gfn, INVALID_GFN) ) - return 0; - - clear_page(iorp->va); - - rc = guest_physmap_add_page(d, iorp->gfn, - page_to_mfn(iorp->page), 0); - if ( rc == 0 ) - paging_mark_pfn_dirty(d, _pfn(gfn_x(iorp->gfn))); - - return rc; -} - -static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s, - struct hvm_ioreq_vcpu *sv) -{ - ASSERT(spin_is_locked(&s->lock)); - - if ( s->ioreq.va != NULL ) - { - ioreq_t *p = get_ioreq(s, sv->vcpu); - - p->vp_eport = sv->ioreq_evtchn; - } -} - -#define HANDLE_BUFIOREQ(s) \ - ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) - -static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s, - struct vcpu *v) -{ - struct hvm_ioreq_vcpu *sv; - int rc; - - sv = xzalloc(struct hvm_ioreq_vcpu); - - rc = -ENOMEM; - if ( !sv ) - goto fail1; - - spin_lock(&s->lock); - - rc = alloc_unbound_xen_event_channel(v->domain, v->vcpu_id, - s->emulator->domain_id, NULL); - if ( rc < 0 ) - goto fail2; - - sv->ioreq_evtchn = rc; - - if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) ) - { - rc = alloc_unbound_xen_event_channel(v->domain, 0, - s->emulator->domain_id, NULL); - if ( rc < 0 ) - goto fail3; - - s->bufioreq_evtchn = rc; - } - - sv->vcpu = v; - - list_add(&sv->list_entry, &s->ioreq_vcpu_list); - - if ( s->enabled ) - hvm_update_ioreq_evtchn(s, sv); - - spin_unlock(&s->lock); - return 0; - - fail3: - free_xen_event_channel(v->domain, sv->ioreq_evtchn); - - fail2: - spin_unlock(&s->lock); - xfree(sv); - - fail1: - return rc; -} - -static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, - struct vcpu *v) -{ - struct hvm_ioreq_vcpu *sv; - - 
spin_lock(&s->lock); - - list_for_each_entry ( sv, - &s->ioreq_vcpu_list, - list_entry ) - { - if ( sv->vcpu != v ) - continue; - - list_del(&sv->list_entry); - - if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) ) - free_xen_event_channel(v->domain, s->bufioreq_evtchn); - - free_xen_event_channel(v->domain, sv->ioreq_evtchn); - - xfree(sv); - break; - } - - spin_unlock(&s->lock); -} - -static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s) -{ - struct hvm_ioreq_vcpu *sv, *next; - - spin_lock(&s->lock); - - list_for_each_entry_safe ( sv, - next, - &s->ioreq_vcpu_list, - list_entry ) - { - struct vcpu *v = sv->vcpu; - - list_del(&sv->list_entry); - - if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) ) - free_xen_event_channel(v->domain, s->bufioreq_evtchn); - - free_xen_event_channel(v->domain, sv->ioreq_evtchn); - - xfree(sv); - } - - spin_unlock(&s->lock); -} - -static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s) -{ - int rc; - - rc = hvm_map_ioreq_gfn(s, false); - - if ( !rc && HANDLE_BUFIOREQ(s) ) - rc = hvm_map_ioreq_gfn(s, true); - - if ( rc ) - hvm_unmap_ioreq_gfn(s, false); - - return rc; -} - -static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s) -{ - hvm_unmap_ioreq_gfn(s, true); - hvm_unmap_ioreq_gfn(s, false); -} - -static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s) -{ - int rc; - - rc = hvm_alloc_ioreq_mfn(s, false); - - if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) ) - rc = hvm_alloc_ioreq_mfn(s, true); - - if ( rc ) - hvm_free_ioreq_mfn(s, false); - - return rc; -} - -static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s) -{ - hvm_free_ioreq_mfn(s, true); - hvm_free_ioreq_mfn(s, false); -} - -static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s) -{ - unsigned int i; - - for ( i = 0; i < NR_IO_RANGE_TYPES; i++ ) - rangeset_destroy(s->range[i]); -} - -static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, - ioservid_t id) -{ - unsigned 
int i; - int rc; - - for ( i = 0; i < NR_IO_RANGE_TYPES; i++ ) - { - char *name; - - rc = asprintf(&name, "ioreq_server %d %s", id, - (i == XEN_DMOP_IO_RANGE_PORT) ? "port" : - (i == XEN_DMOP_IO_RANGE_MEMORY) ? "memory" : - (i == XEN_DMOP_IO_RANGE_PCI) ? "pci" : - ""); - if ( rc ) - goto fail; - - s->range[i] = rangeset_new(s->target, name, - RANGESETF_prettyprint_hex); - - xfree(name); - - rc = -ENOMEM; - if ( !s->range[i] ) - goto fail; - - rangeset_limit(s->range[i], MAX_NR_IO_RANGES); - } - - return 0; - - fail: - hvm_ioreq_server_free_rangesets(s); - - return rc; -} - -static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s) -{ - struct hvm_ioreq_vcpu *sv; - - spin_lock(&s->lock); - - if ( s->enabled ) - goto done; - - hvm_remove_ioreq_gfn(s, false); - hvm_remove_ioreq_gfn(s, true); - - s->enabled = true; - - list_for_each_entry ( sv, - &s->ioreq_vcpu_list, - list_entry ) - hvm_update_ioreq_evtchn(s, sv); - - done: - spin_unlock(&s->lock); -} - -static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s) -{ - spin_lock(&s->lock); - - if ( !s->enabled ) - goto done; - - hvm_add_ioreq_gfn(s, true); - hvm_add_ioreq_gfn(s, false); - - s->enabled = false; - - done: - spin_unlock(&s->lock); -} - -static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, - struct domain *d, int bufioreq_handling, - ioservid_t id) -{ - struct domain *currd = current->domain; - struct vcpu *v; - int rc; - - s->target = d; - - get_knownalive_domain(currd); - s->emulator = currd; - - spin_lock_init(&s->lock); - INIT_LIST_HEAD(&s->ioreq_vcpu_list); - spin_lock_init(&s->bufioreq_lock); - - s->ioreq.gfn = INVALID_GFN; - s->bufioreq.gfn = INVALID_GFN; - - rc = hvm_ioreq_server_alloc_rangesets(s, id); - if ( rc ) - return rc; - - s->bufioreq_handling = bufioreq_handling; - - for_each_vcpu ( d, v ) - { - rc = hvm_ioreq_server_add_vcpu(s, v); - if ( rc ) - goto fail_add; - } - - return 0; - - fail_add: - hvm_ioreq_server_remove_all_vcpus(s); - hvm_ioreq_server_unmap_pages(s); - - 
hvm_ioreq_server_free_rangesets(s); - - put_domain(s->emulator); - return rc; -} - -static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s) -{ - ASSERT(!s->enabled); - hvm_ioreq_server_remove_all_vcpus(s); - - /* - * NOTE: It is safe to call both hvm_ioreq_server_unmap_pages() and - * hvm_ioreq_server_free_pages() in that order. - * This is because the former will do nothing if the pages - * are not mapped, leaving the page to be freed by the latter. - * However if the pages are mapped then the former will set - * the page_info pointer to NULL, meaning the latter will do - * nothing. - */ - hvm_ioreq_server_unmap_pages(s); - hvm_ioreq_server_free_pages(s); - - hvm_ioreq_server_free_rangesets(s); - - put_domain(s->emulator); -} - -int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, - ioservid_t *id) -{ - struct hvm_ioreq_server *s; - unsigned int i; - int rc; - - if ( bufioreq_handling > HVM_IOREQSRV_BUFIOREQ_ATOMIC ) - return -EINVAL; - - s = xzalloc(struct hvm_ioreq_server); - if ( !s ) - return -ENOMEM; - - domain_pause(d); - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - for ( i = 0; i < MAX_NR_IOREQ_SERVERS; i++ ) - { - if ( !GET_IOREQ_SERVER(d, i) ) - break; - } - - rc = -ENOSPC; - if ( i >= MAX_NR_IOREQ_SERVERS ) - goto fail; - - /* - * It is safe to call set_ioreq_server() prior to - * hvm_ioreq_server_init() since the target domain is paused. 
- */ - set_ioreq_server(d, i, s); - - rc = hvm_ioreq_server_init(s, d, bufioreq_handling, i); - if ( rc ) - { - set_ioreq_server(d, i, NULL); - goto fail; - } - - if ( id ) - *id = i; - - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - domain_unpause(d); - - return 0; - - fail: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - domain_unpause(d); - - xfree(s); - return rc; -} - -/* Called when target domain is paused */ -int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s) -{ - return p2m_set_ioreq_server(s->target, 0, s); -} - -int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id) -{ - struct hvm_ioreq_server *s; - int rc; - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - s = get_ioreq_server(d, id); - - rc = -ENOENT; - if ( !s ) - goto out; - - rc = -EPERM; - if ( s->emulator != current->domain ) - goto out; - - domain_pause(d); - - arch_hvm_destroy_ioreq_server(s); - - hvm_ioreq_server_disable(s); - - /* - * It is safe to call hvm_ioreq_server_deinit() prior to - * set_ioreq_server() since the target domain is paused. 
- */ - hvm_ioreq_server_deinit(s); - set_ioreq_server(d, id, NULL); - - domain_unpause(d); - - xfree(s); - - rc = 0; - - out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - - return rc; -} - -int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, - unsigned long *ioreq_gfn, - unsigned long *bufioreq_gfn, - evtchn_port_t *bufioreq_port) -{ - struct hvm_ioreq_server *s; - int rc; - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - s = get_ioreq_server(d, id); - - rc = -ENOENT; - if ( !s ) - goto out; - - rc = -EPERM; - if ( s->emulator != current->domain ) - goto out; - - if ( ioreq_gfn || bufioreq_gfn ) - { - rc = hvm_ioreq_server_map_pages(s); - if ( rc ) - goto out; - } - - if ( ioreq_gfn ) - *ioreq_gfn = gfn_x(s->ioreq.gfn); - - if ( HANDLE_BUFIOREQ(s) ) - { - if ( bufioreq_gfn ) - *bufioreq_gfn = gfn_x(s->bufioreq.gfn); - - if ( bufioreq_port ) - *bufioreq_port = s->bufioreq_evtchn; - } - - rc = 0; - - out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - - return rc; -} - -int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id, - unsigned long idx, mfn_t *mfn) -{ - struct hvm_ioreq_server *s; - int rc; - - ASSERT(is_hvm_domain(d)); - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - s = get_ioreq_server(d, id); - - rc = -ENOENT; - if ( !s ) - goto out; - - rc = -EPERM; - if ( s->emulator != current->domain ) - goto out; - - rc = hvm_ioreq_server_alloc_pages(s); - if ( rc ) - goto out; +#include <public/hvm/ioreq.h> +#include <public/hvm/params.h> - switch ( idx ) +bool arch_handle_hvm_io_completion(enum hvm_io_completion io_completion) +{ + switch ( io_completion ) { - case XENMEM_resource_ioreq_server_frame_bufioreq: - rc = -ENOENT; - if ( !HANDLE_BUFIOREQ(s) ) - goto out; - - *mfn = page_to_mfn(s->bufioreq.page); - rc = 0; - break; + case HVMIO_realmode_completion: + { + struct hvm_emulate_ctxt ctxt; - case XENMEM_resource_ioreq_server_frame_ioreq(0): - *mfn = page_to_mfn(s->ioreq.page); - rc = 0; - 
break; + hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs()); + vmx_realmode_emulate_one(&ctxt); + hvm_emulate_writeback(&ctxt); - default: - rc = -EINVAL; break; } - out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - - return rc; -} - -int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, - uint32_t type, uint64_t start, - uint64_t end) -{ - struct hvm_ioreq_server *s; - struct rangeset *r; - int rc; - - if ( start > end ) - return -EINVAL; - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - s = get_ioreq_server(d, id); - - rc = -ENOENT; - if ( !s ) - goto out; - - rc = -EPERM; - if ( s->emulator != current->domain ) - goto out; - - switch ( type ) - { - case XEN_DMOP_IO_RANGE_PORT: - case XEN_DMOP_IO_RANGE_MEMORY: - case XEN_DMOP_IO_RANGE_PCI: - r = s->range[type]; - break; - default: - r = NULL; + ASSERT_UNREACHABLE(); break; } - rc = -EINVAL; - if ( !r ) - goto out; - - rc = -EEXIST; - if ( rangeset_overlaps_range(r, start, end) ) - goto out; - - rc = rangeset_add_range(r, start, end); - - out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - - return rc; + return true; } -int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, - uint32_t type, uint64_t start, - uint64_t end) +/* Called when target domain is paused */ +int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s) { - struct hvm_ioreq_server *s; - struct rangeset *r; - int rc; - - if ( start > end ) - return -EINVAL; - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - s = get_ioreq_server(d, id); - - rc = -ENOENT; - if ( !s ) - goto out; - - rc = -EPERM; - if ( s->emulator != current->domain ) - goto out; - - switch ( type ) - { - case XEN_DMOP_IO_RANGE_PORT: - case XEN_DMOP_IO_RANGE_MEMORY: - case XEN_DMOP_IO_RANGE_PCI: - r = s->range[type]; - break; - - default: - r = NULL; - break; - } - - rc = -EINVAL; - if ( !r ) - goto out; - - rc = -ENOENT; - if ( !rangeset_contains_range(r, start, end) ) - goto out; - - rc = 
rangeset_remove_range(r, start, end); - - out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - - return rc; + return p2m_set_ioreq_server(s->target, 0, s); } /* @@ -XXX,XX +XXX,XX @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, return rc; } -int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, - bool enabled) -{ - struct hvm_ioreq_server *s; - int rc; - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - s = get_ioreq_server(d, id); - - rc = -ENOENT; - if ( !s ) - goto out; - - rc = -EPERM; - if ( s->emulator != current->domain ) - goto out; - - domain_pause(d); - - if ( enabled ) - hvm_ioreq_server_enable(s); - else - hvm_ioreq_server_disable(s); - - domain_unpause(d); - - rc = 0; - - out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - return rc; -} - -int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v) -{ - struct hvm_ioreq_server *s; - unsigned int id; - int rc; - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - FOR_EACH_IOREQ_SERVER(d, id, s) - { - rc = hvm_ioreq_server_add_vcpu(s, v); - if ( rc ) - goto fail; - } - - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - - return 0; - - fail: - while ( ++id != MAX_NR_IOREQ_SERVERS ) - { - s = GET_IOREQ_SERVER(d, id); - - if ( !s ) - continue; - - hvm_ioreq_server_remove_vcpu(s, v); - } - - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - - return rc; -} - -void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v) -{ - struct hvm_ioreq_server *s; - unsigned int id; - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - FOR_EACH_IOREQ_SERVER(d, id, s) - hvm_ioreq_server_remove_vcpu(s, v); - - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); -} - -void hvm_destroy_all_ioreq_servers(struct domain *d) -{ - struct hvm_ioreq_server *s; - unsigned int id; - - arch_hvm_ioreq_destroy(d); - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - /* No need to domain_pause() as the domain is 
being torn down */ - - FOR_EACH_IOREQ_SERVER(d, id, s) - { - hvm_ioreq_server_disable(s); - - /* - * It is safe to call hvm_ioreq_server_deinit() prior to - * set_ioreq_server() since the target domain is being destroyed. - */ - hvm_ioreq_server_deinit(s); - set_ioreq_server(d, id, NULL); - - xfree(s); - } - - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); -} - int hvm_get_ioreq_server_range_type(struct domain *d, ioreq_t *p, uint8_t *type, @@ -XXX,XX +XXX,XX @@ int hvm_get_ioreq_server_range_type(struct domain *d, return 0; } -struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d, - ioreq_t *p) -{ - struct hvm_ioreq_server *s; - uint8_t type; - uint64_t addr; - unsigned int id; - - if ( hvm_get_ioreq_server_range_type(d, p, &type, &addr) ) - return NULL; - - FOR_EACH_IOREQ_SERVER(d, id, s) - { - struct rangeset *r; - - if ( !s->enabled ) - continue; - - r = s->range[type]; - - switch ( type ) - { - unsigned long start, end; - - case XEN_DMOP_IO_RANGE_PORT: - start = addr; - end = start + p->size - 1; - if ( rangeset_contains_range(r, start, end) ) - return s; - - break; - - case XEN_DMOP_IO_RANGE_MEMORY: - start = hvm_mmio_first_byte(p); - end = hvm_mmio_last_byte(p); - - if ( rangeset_contains_range(r, start, end) ) - return s; - - break; - - case XEN_DMOP_IO_RANGE_PCI: - if ( rangeset_contains_singleton(r, addr >> 32) ) - { - p->type = IOREQ_TYPE_PCI_CONFIG; - p->addr = addr; - return s; - } - - break; - } - } - - return NULL; -} - -static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p) -{ - struct domain *d = current->domain; - struct hvm_ioreq_page *iorp; - buffered_iopage_t *pg; - buf_ioreq_t bp = { .data = p->data, - .addr = p->addr, - .type = p->type, - .dir = p->dir }; - /* Timeoffset sends 64b data, but no address. Use two consecutive slots. 
*/ - int qw = 0; - - /* Ensure buffered_iopage fits in a page */ - BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE); - - iorp = &s->bufioreq; - pg = iorp->va; - - if ( !pg ) - return IOREQ_IO_UNHANDLED; - - /* - * Return 0 for the cases we can't deal with: - * - 'addr' is only a 20-bit field, so we cannot address beyond 1MB - * - we cannot buffer accesses to guest memory buffers, as the guest - * may expect the memory buffer to be synchronously accessed - * - the count field is usually used with data_is_ptr and since we don't - * support data_is_ptr we do not waste space for the count field either - */ - if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) ) - return 0; - - switch ( p->size ) - { - case 1: - bp.size = 0; - break; - case 2: - bp.size = 1; - break; - case 4: - bp.size = 2; - break; - case 8: - bp.size = 3; - qw = 1; - break; - default: - gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size); - return IOREQ_IO_UNHANDLED; - } - - spin_lock(&s->bufioreq_lock); - - if ( (pg->ptrs.write_pointer - pg->ptrs.read_pointer) >= - (IOREQ_BUFFER_SLOT_NUM - qw) ) - { - /* The queue is full: send the iopacket through the normal path. */ - spin_unlock(&s->bufioreq_lock); - return IOREQ_IO_UNHANDLED; - } - - pg->buf_ioreq[pg->ptrs.write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp; - - if ( qw ) - { - bp.data = p->data >> 32; - pg->buf_ioreq[(pg->ptrs.write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp; - } - - /* Make the ioreq_t visible /before/ write_pointer. */ - smp_wmb(); - pg->ptrs.write_pointer += qw ? 2 : 1; - - /* Canonicalize read/write pointers to prevent their overflow. 
*/ - while ( (s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_ATOMIC) && - qw++ < IOREQ_BUFFER_SLOT_NUM && - pg->ptrs.read_pointer >= IOREQ_BUFFER_SLOT_NUM ) - { - union bufioreq_pointers old = pg->ptrs, new; - unsigned int n = old.read_pointer / IOREQ_BUFFER_SLOT_NUM; - - new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM; - new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM; - cmpxchg(&pg->ptrs.full, old.full, new.full); - } - - notify_via_xen_event_channel(d, s->bufioreq_evtchn); - spin_unlock(&s->bufioreq_lock); - - return IOREQ_IO_HANDLED; -} - -int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p, - bool buffered) -{ - struct vcpu *curr = current; - struct domain *d = curr->domain; - struct hvm_ioreq_vcpu *sv; - - ASSERT(s); - - if ( buffered ) - return hvm_send_buffered_ioreq(s, proto_p); - - if ( unlikely(!vcpu_start_shutdown_deferral(curr)) ) - return IOREQ_IO_RETRY; - - list_for_each_entry ( sv, - &s->ioreq_vcpu_list, - list_entry ) - { - if ( sv->vcpu == curr ) - { - evtchn_port_t port = sv->ioreq_evtchn; - ioreq_t *p = get_ioreq(s, curr); - - if ( unlikely(p->state != STATE_IOREQ_NONE) ) - { - gprintk(XENLOG_ERR, "device model set bad IO state %d\n", - p->state); - break; - } - - if ( unlikely(p->vp_eport != port) ) - { - gprintk(XENLOG_ERR, "device model set bad event channel %d\n", - p->vp_eport); - break; - } - - proto_p->state = STATE_IOREQ_NONE; - proto_p->vp_eport = port; - *p = *proto_p; - - prepare_wait_on_xen_event_channel(port); - - /* - * Following happens /after/ blocking and setting up ioreq - * contents. prepare_wait_on_xen_event_channel() is an implicit - * barrier. 
- */ - p->state = STATE_IOREQ_READY; - notify_via_xen_event_channel(d, port); - - sv->pending = true; - return IOREQ_IO_RETRY; - } - } - - return IOREQ_IO_UNHANDLED; -} - -unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered) -{ - struct domain *d = current->domain; - struct hvm_ioreq_server *s; - unsigned int id, failed = 0; - - FOR_EACH_IOREQ_SERVER(d, id, s) - { - if ( !s->enabled ) - continue; - - if ( hvm_send_ioreq(s, p, buffered) == IOREQ_IO_UNHANDLED ) - failed++; - } - - return failed; -} - static int hvm_access_cf8( int dir, unsigned int port, unsigned int bytes, uint32_t *val) { @@ -XXX,XX +XXX,XX @@ void arch_hvm_ioreq_destroy(struct domain *d) } -void hvm_ioreq_init(struct domain *d) -{ - spin_lock_init(&d->arch.hvm.ioreq_server.lock); - - arch_hvm_ioreq_init(d); -} - /* * Local variables: * mode: C diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/stdvga.c +++ b/xen/arch/x86/hvm/stdvga.c @@ -XXX,XX +XXX,XX @@ * can have side effects. 
  */
+#include <xen/ioreq.h>
 #include <xen/types.h>
 #include <xen/sched.h>
 #include <xen/domain_page.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/support.h>
 #include <xen/numa.h>
 #include <xen/paging.h>
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -XXX,XX +XXX,XX @@
  *
  */
+#include <xen/ioreq.h>
+
 #include <asm/types.h>
 #include <asm/mtrr.h>
 #include <asm/p2m.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/vmx/vvmx.h>
 #include <asm/hvm/nestedhvm.h>
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -XXX,XX +XXX,XX @@
  */
 #include <xen/init.h>
+#include <xen/ioreq.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
 #include <xen/mm.h>
@@ -XXX,XX +XXX,XX @@
 #include <asm/io_apic.h>
 #include <asm/pci.h>
 #include <asm/guest.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/grant_table.h>
 #include <asm/pv/domain.h>
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -XXX,XX +XXX,XX @@
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
+#include <xen/ioreq.h>
 #include <xen/types.h>
 #include <xen/mm.h>
 #include <xen/trace.h>
@@ -XXX,XX +XXX,XX @@
 #include <asm/current.h>
 #include <asm/flushtlb.h>
 #include <asm/shadow.h>
-#include <asm/hvm/ioreq.h>
 #include <xen/numa.h>
 #include "private.h"
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index XXXXXXX..XXXXXXX 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -XXX,XX +XXX,XX @@ config HYPFS_CONFIG
 	  Disable this option in case you want to spare some memory or you
 	  want to hide the .config contents from dom0.
+config IOREQ_SERVER
+	bool
+
 config KEXEC
 	bool "kexec support"
 	default y
diff --git a/xen/common/Makefile b/xen/common/Makefile
index XXXXXXX..XXXXXXX 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_GRANT_TABLE) += grant_table.o
 obj-y += guestcopy.o
 obj-bin-y += gunzip.init.o
 obj-$(CONFIG_HYPFS) += hypfs.o
+obj-$(CONFIG_IOREQ_SERVER) += ioreq.o
 obj-y += irq.o
 obj-y += kernel.o
 obj-y += keyhandler.o
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/xen/common/ioreq.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * common/ioreq.c: hardware virtual machine I/O emulation
+ *
+ * Copyright (c) 2016 Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */ + +#include <xen/ctype.h> +#include <xen/domain.h> +#include <xen/domain_page.h> +#include <xen/event.h> +#include <xen/init.h> +#include <xen/ioreq.h> +#include <xen/irq.h> +#include <xen/lib.h> +#include <xen/paging.h> +#include <xen/sched.h> +#include <xen/softirq.h> +#include <xen/trace.h> +#include <xen/vpci.h> + +#include <public/hvm/dm_op.h> +#include <public/hvm/ioreq.h> +#include <public/hvm/params.h> + +static void set_ioreq_server(struct domain *d, unsigned int id, + struct hvm_ioreq_server *s) +{ + ASSERT(id < MAX_NR_IOREQ_SERVERS); + ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]); + + d->arch.hvm.ioreq_server.server[id] = s; +} + +/* + * Iterate over all possible ioreq servers. + * + * NOTE: The iteration is backwards such that more recently created + * ioreq servers are favoured in hvm_select_ioreq_server(). + * This is a semantic that previously existed when ioreq servers + * were held in a linked list. + */ +#define FOR_EACH_IOREQ_SERVER(d, id, s) \ + for ( (id) = MAX_NR_IOREQ_SERVERS; (id) != 0; ) \ + if ( !(s = GET_IOREQ_SERVER(d, --(id))) ) \ + continue; \ + else + +static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v) +{ + shared_iopage_t *p = s->ioreq.va; + + ASSERT((v == current) || !vcpu_runnable(v)); + ASSERT(p != NULL); + + return &p->vcpu_ioreq[v->vcpu_id]; +} + +static struct hvm_ioreq_vcpu *get_pending_vcpu(const struct vcpu *v, + struct hvm_ioreq_server **srvp) +{ + struct domain *d = v->domain; + struct hvm_ioreq_server *s; + unsigned int id; + + FOR_EACH_IOREQ_SERVER(d, id, s) + { + struct hvm_ioreq_vcpu *sv; + + list_for_each_entry ( sv, + &s->ioreq_vcpu_list, + list_entry ) + { + if ( sv->vcpu == v && sv->pending ) + { + if ( srvp ) + *srvp = s; + return sv; + } + } + } + + return NULL; +} + +bool hvm_io_pending(struct vcpu *v) +{ + return get_pending_vcpu(v, NULL); +} + +static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p) +{ + unsigned int prev_state = STATE_IOREQ_NONE; + unsigned int state = 
p->state; + uint64_t data = ~0; + + smp_rmb(); + + /* + * The only reason we should see this condition be false is when an + * emulator dying races with I/O being requested. + */ + while ( likely(state != STATE_IOREQ_NONE) ) + { + if ( unlikely(state < prev_state) ) + { + gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n", + prev_state, state); + sv->pending = false; + domain_crash(sv->vcpu->domain); + return false; /* bail */ + } + + switch ( prev_state = state ) + { + case STATE_IORESP_READY: /* IORESP_READY -> NONE */ + p->state = STATE_IOREQ_NONE; + data = p->data; + break; + + case STATE_IOREQ_READY: /* IOREQ_{READY,INPROCESS} -> IORESP_READY */ + case STATE_IOREQ_INPROCESS: + wait_on_xen_event_channel(sv->ioreq_evtchn, + ({ state = p->state; + smp_rmb(); + state != prev_state; })); + continue; + + default: + gdprintk(XENLOG_ERR, "Weird HVM iorequest state %u\n", state); + sv->pending = false; + domain_crash(sv->vcpu->domain); + return false; /* bail */ + } + + break; + } + + p = &sv->vcpu->arch.hvm.hvm_io.io_req; + if ( hvm_ioreq_needs_completion(p) ) + p->data = data; + + sv->pending = false; + + return true; +} + +bool handle_hvm_io_completion(struct vcpu *v) +{ + struct domain *d = v->domain; + struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io; + struct hvm_ioreq_server *s; + struct hvm_ioreq_vcpu *sv; + enum hvm_io_completion io_completion; + + if ( has_vpci(d) && vpci_process_pending(v) ) + { + raise_softirq(SCHEDULE_SOFTIRQ); + return false; + } + + sv = get_pending_vcpu(v, &s); + if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) ) + return false; + + vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ? 
+ STATE_IORESP_READY : STATE_IOREQ_NONE; + + msix_write_completion(v); + vcpu_end_shutdown_deferral(v); + + io_completion = vio->io_completion; + vio->io_completion = HVMIO_no_completion; + + switch ( io_completion ) + { + case HVMIO_no_completion: + break; + + case HVMIO_mmio_completion: + return handle_mmio(); + + case HVMIO_pio_completion: + return handle_pio(vio->io_req.addr, vio->io_req.size, + vio->io_req.dir); + + default: + return arch_handle_hvm_io_completion(io_completion); + } + + return true; +} + +static gfn_t hvm_alloc_legacy_ioreq_gfn(struct hvm_ioreq_server *s) +{ + struct domain *d = s->target; + unsigned int i; + + BUILD_BUG_ON(HVM_PARAM_BUFIOREQ_PFN != HVM_PARAM_IOREQ_PFN + 1); + + for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ ) + { + if ( !test_and_clear_bit(i, &d->arch.hvm.ioreq_gfn.legacy_mask) ) + return _gfn(d->arch.hvm.params[i]); + } + + return INVALID_GFN; +} + +static gfn_t hvm_alloc_ioreq_gfn(struct hvm_ioreq_server *s) +{ + struct domain *d = s->target; + unsigned int i; + + for ( i = 0; i < sizeof(d->arch.hvm.ioreq_gfn.mask) * 8; i++ ) + { + if ( test_and_clear_bit(i, &d->arch.hvm.ioreq_gfn.mask) ) + return _gfn(d->arch.hvm.ioreq_gfn.base + i); + } + + /* + * If we are out of 'normal' GFNs then we may still have a 'legacy' + * GFN available. 
+ */ + return hvm_alloc_legacy_ioreq_gfn(s); +} + +static bool hvm_free_legacy_ioreq_gfn(struct hvm_ioreq_server *s, + gfn_t gfn) +{ + struct domain *d = s->target; + unsigned int i; + + for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ ) + { + if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) ) + break; + } + if ( i > HVM_PARAM_BUFIOREQ_PFN ) + return false; + + set_bit(i, &d->arch.hvm.ioreq_gfn.legacy_mask); + return true; +} + +static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn) +{ + struct domain *d = s->target; + unsigned int i = gfn_x(gfn) - d->arch.hvm.ioreq_gfn.base; + + ASSERT(!gfn_eq(gfn, INVALID_GFN)); + + if ( !hvm_free_legacy_ioreq_gfn(s, gfn) ) + { + ASSERT(i < sizeof(d->arch.hvm.ioreq_gfn.mask) * 8); + set_bit(i, &d->arch.hvm.ioreq_gfn.mask); + } +} + +static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) +{ + struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; + + if ( gfn_eq(iorp->gfn, INVALID_GFN) ) + return; + + destroy_ring_for_helper(&iorp->va, iorp->page); + iorp->page = NULL; + + hvm_free_ioreq_gfn(s, iorp->gfn); + iorp->gfn = INVALID_GFN; +} + +static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) +{ + struct domain *d = s->target; + struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; + int rc; + + if ( iorp->page ) + { + /* + * If a page has already been allocated (which will happen on + * demand if hvm_get_ioreq_server_frame() is called), then + * mapping a guest frame is not permitted. + */ + if ( gfn_eq(iorp->gfn, INVALID_GFN) ) + return -EPERM; + + return 0; + } + + if ( d->is_dying ) + return -EINVAL; + + iorp->gfn = hvm_alloc_ioreq_gfn(s); + + if ( gfn_eq(iorp->gfn, INVALID_GFN) ) + return -ENOMEM; + + rc = prepare_ring_for_helper(d, gfn_x(iorp->gfn), &iorp->page, + &iorp->va); + + if ( rc ) + hvm_unmap_ioreq_gfn(s, buf); + + return rc; +} + +static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf) +{ + struct hvm_ioreq_page *iorp = buf ? 
&s->bufioreq : &s->ioreq; + struct page_info *page; + + if ( iorp->page ) + { + /* + * If a guest frame has already been mapped (which may happen + * on demand if hvm_get_ioreq_server_info() is called), then + * allocating a page is not permitted. + */ + if ( !gfn_eq(iorp->gfn, INVALID_GFN) ) + return -EPERM; + + return 0; + } + + page = alloc_domheap_page(s->target, MEMF_no_refcount); + + if ( !page ) + return -ENOMEM; + + if ( !get_page_and_type(page, s->target, PGT_writable_page) ) + { + /* + * The domain can't possibly know about this page yet, so failure + * here is a clear indication of something fishy going on. + */ + domain_crash(s->emulator); + return -ENODATA; + } + + iorp->va = __map_domain_page_global(page); + if ( !iorp->va ) + goto fail; + + iorp->page = page; + clear_page(iorp->va); + return 0; + + fail: + put_page_alloc_ref(page); + put_page_and_type(page); + + return -ENOMEM; +} + +static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf) +{ + struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; + struct page_info *page = iorp->page; + + if ( !page ) + return; + + iorp->page = NULL; + + unmap_domain_page_global(iorp->va); + iorp->va = NULL; + + put_page_alloc_ref(page); + put_page_and_type(page); +} + +bool is_ioreq_server_page(struct domain *d, const struct page_info *page) +{ + const struct hvm_ioreq_server *s; + unsigned int id; + bool found = false; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + FOR_EACH_IOREQ_SERVER(d, id, s) + { + if ( (s->ioreq.page == page) || (s->bufioreq.page == page) ) + { + found = true; + break; + } + } + + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + + return found; +} + +static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) + +{ + struct domain *d = s->target; + struct hvm_ioreq_page *iorp = buf ? 
&s->bufioreq : &s->ioreq; + + if ( gfn_eq(iorp->gfn, INVALID_GFN) ) + return; + + if ( guest_physmap_remove_page(d, iorp->gfn, + page_to_mfn(iorp->page), 0) ) + domain_crash(d); + clear_page(iorp->va); +} + +static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) +{ + struct domain *d = s->target; + struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; + int rc; + + if ( gfn_eq(iorp->gfn, INVALID_GFN) ) + return 0; + + clear_page(iorp->va); + + rc = guest_physmap_add_page(d, iorp->gfn, + page_to_mfn(iorp->page), 0); + if ( rc == 0 ) + paging_mark_pfn_dirty(d, _pfn(gfn_x(iorp->gfn))); + + return rc; +} + +static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s, + struct hvm_ioreq_vcpu *sv) +{ + ASSERT(spin_is_locked(&s->lock)); + + if ( s->ioreq.va != NULL ) + { + ioreq_t *p = get_ioreq(s, sv->vcpu); + + p->vp_eport = sv->ioreq_evtchn; + } +} + +#define HANDLE_BUFIOREQ(s) \ + ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) + +static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s, + struct vcpu *v) +{ + struct hvm_ioreq_vcpu *sv; + int rc; + + sv = xzalloc(struct hvm_ioreq_vcpu); + + rc = -ENOMEM; + if ( !sv ) + goto fail1; + + spin_lock(&s->lock); + + rc = alloc_unbound_xen_event_channel(v->domain, v->vcpu_id, + s->emulator->domain_id, NULL); + if ( rc < 0 ) + goto fail2; + + sv->ioreq_evtchn = rc; + + if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) ) + { + rc = alloc_unbound_xen_event_channel(v->domain, 0, + s->emulator->domain_id, NULL); + if ( rc < 0 ) + goto fail3; + + s->bufioreq_evtchn = rc; + } + + sv->vcpu = v; + + list_add(&sv->list_entry, &s->ioreq_vcpu_list); + + if ( s->enabled ) + hvm_update_ioreq_evtchn(s, sv); + + spin_unlock(&s->lock); + return 0; + + fail3: + free_xen_event_channel(v->domain, sv->ioreq_evtchn); + + fail2: + spin_unlock(&s->lock); + xfree(sv); + + fail1: + return rc; +} + +static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, + struct vcpu *v) +{ + struct hvm_ioreq_vcpu *sv; + + 
spin_lock(&s->lock); + + list_for_each_entry ( sv, + &s->ioreq_vcpu_list, + list_entry ) + { + if ( sv->vcpu != v ) + continue; + + list_del(&sv->list_entry); + + if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) ) + free_xen_event_channel(v->domain, s->bufioreq_evtchn); + + free_xen_event_channel(v->domain, sv->ioreq_evtchn); + + xfree(sv); + break; + } + + spin_unlock(&s->lock); +} + +static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s) +{ + struct hvm_ioreq_vcpu *sv, *next; + + spin_lock(&s->lock); + + list_for_each_entry_safe ( sv, + next, + &s->ioreq_vcpu_list, + list_entry ) + { + struct vcpu *v = sv->vcpu; + + list_del(&sv->list_entry); + + if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) ) + free_xen_event_channel(v->domain, s->bufioreq_evtchn); + + free_xen_event_channel(v->domain, sv->ioreq_evtchn); + + xfree(sv); + } + + spin_unlock(&s->lock); +} + +static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s) +{ + int rc; + + rc = hvm_map_ioreq_gfn(s, false); + + if ( !rc && HANDLE_BUFIOREQ(s) ) + rc = hvm_map_ioreq_gfn(s, true); + + if ( rc ) + hvm_unmap_ioreq_gfn(s, false); + + return rc; +} + +static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s) +{ + hvm_unmap_ioreq_gfn(s, true); + hvm_unmap_ioreq_gfn(s, false); +} + +static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s) +{ + int rc; + + rc = hvm_alloc_ioreq_mfn(s, false); + + if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) ) + rc = hvm_alloc_ioreq_mfn(s, true); + + if ( rc ) + hvm_free_ioreq_mfn(s, false); + + return rc; +} + +static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s) +{ + hvm_free_ioreq_mfn(s, true); + hvm_free_ioreq_mfn(s, false); +} + +static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s) +{ + unsigned int i; + + for ( i = 0; i < NR_IO_RANGE_TYPES; i++ ) + rangeset_destroy(s->range[i]); +} + +static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, + ioservid_t id) +{ + unsigned 
int i; + int rc; + + for ( i = 0; i < NR_IO_RANGE_TYPES; i++ ) + { + char *name; + + rc = asprintf(&name, "ioreq_server %d %s", id, + (i == XEN_DMOP_IO_RANGE_PORT) ? "port" : + (i == XEN_DMOP_IO_RANGE_MEMORY) ? "memory" : + (i == XEN_DMOP_IO_RANGE_PCI) ? "pci" : + ""); + if ( rc ) + goto fail; + + s->range[i] = rangeset_new(s->target, name, + RANGESETF_prettyprint_hex); + + xfree(name); + + rc = -ENOMEM; + if ( !s->range[i] ) + goto fail; + + rangeset_limit(s->range[i], MAX_NR_IO_RANGES); + } + + return 0; + + fail: + hvm_ioreq_server_free_rangesets(s); + + return rc; +} + +static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s) +{ + struct hvm_ioreq_vcpu *sv; + + spin_lock(&s->lock); + + if ( s->enabled ) + goto done; + + hvm_remove_ioreq_gfn(s, false); + hvm_remove_ioreq_gfn(s, true); + + s->enabled = true; + + list_for_each_entry ( sv, + &s->ioreq_vcpu_list, + list_entry ) + hvm_update_ioreq_evtchn(s, sv); + + done: + spin_unlock(&s->lock); +} + +static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s) +{ + spin_lock(&s->lock); + + if ( !s->enabled ) + goto done; + + hvm_add_ioreq_gfn(s, true); + hvm_add_ioreq_gfn(s, false); + + s->enabled = false; + + done: + spin_unlock(&s->lock); +} + +static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, + struct domain *d, int bufioreq_handling, + ioservid_t id) +{ + struct domain *currd = current->domain; + struct vcpu *v; + int rc; + + s->target = d; + + get_knownalive_domain(currd); + s->emulator = currd; + + spin_lock_init(&s->lock); + INIT_LIST_HEAD(&s->ioreq_vcpu_list); + spin_lock_init(&s->bufioreq_lock); + + s->ioreq.gfn = INVALID_GFN; + s->bufioreq.gfn = INVALID_GFN; + + rc = hvm_ioreq_server_alloc_rangesets(s, id); + if ( rc ) + return rc; + + s->bufioreq_handling = bufioreq_handling; + + for_each_vcpu ( d, v ) + { + rc = hvm_ioreq_server_add_vcpu(s, v); + if ( rc ) + goto fail_add; + } + + return 0; + + fail_add: + hvm_ioreq_server_remove_all_vcpus(s); + hvm_ioreq_server_unmap_pages(s); + + 
hvm_ioreq_server_free_rangesets(s); + + put_domain(s->emulator); + return rc; +} + +static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s) +{ + ASSERT(!s->enabled); + hvm_ioreq_server_remove_all_vcpus(s); + + /* + * NOTE: It is safe to call both hvm_ioreq_server_unmap_pages() and + * hvm_ioreq_server_free_pages() in that order. + * This is because the former will do nothing if the pages + * are not mapped, leaving the page to be freed by the latter. + * However if the pages are mapped then the former will set + * the page_info pointer to NULL, meaning the latter will do + * nothing. + */ + hvm_ioreq_server_unmap_pages(s); + hvm_ioreq_server_free_pages(s); + + hvm_ioreq_server_free_rangesets(s); + + put_domain(s->emulator); +} + +int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, + ioservid_t *id) +{ + struct hvm_ioreq_server *s; + unsigned int i; + int rc; + + if ( bufioreq_handling > HVM_IOREQSRV_BUFIOREQ_ATOMIC ) + return -EINVAL; + + s = xzalloc(struct hvm_ioreq_server); + if ( !s ) + return -ENOMEM; + + domain_pause(d); + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + for ( i = 0; i < MAX_NR_IOREQ_SERVERS; i++ ) + { + if ( !GET_IOREQ_SERVER(d, i) ) + break; + } + + rc = -ENOSPC; + if ( i >= MAX_NR_IOREQ_SERVERS ) + goto fail; + + /* + * It is safe to call set_ioreq_server() prior to + * hvm_ioreq_server_init() since the target domain is paused. 
+ */ + set_ioreq_server(d, i, s); + + rc = hvm_ioreq_server_init(s, d, bufioreq_handling, i); + if ( rc ) + { + set_ioreq_server(d, i, NULL); + goto fail; + } + + if ( id ) + *id = i; + + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + domain_unpause(d); + + return 0; + + fail: + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + domain_unpause(d); + + xfree(s); + return rc; +} + +int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id) +{ + struct hvm_ioreq_server *s; + int rc; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + s = get_ioreq_server(d, id); + + rc = -ENOENT; + if ( !s ) + goto out; + + rc = -EPERM; + if ( s->emulator != current->domain ) + goto out; + + domain_pause(d); + + arch_hvm_destroy_ioreq_server(s); + + hvm_ioreq_server_disable(s); + + /* + * It is safe to call hvm_ioreq_server_deinit() prior to + * set_ioreq_server() since the target domain is paused. + */ + hvm_ioreq_server_deinit(s); + set_ioreq_server(d, id, NULL); + + domain_unpause(d); + + xfree(s); + + rc = 0; + + out: + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + + return rc; +} + +int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, + unsigned long *ioreq_gfn, + unsigned long *bufioreq_gfn, + evtchn_port_t *bufioreq_port) +{ + struct hvm_ioreq_server *s; + int rc; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + s = get_ioreq_server(d, id); + + rc = -ENOENT; + if ( !s ) + goto out; + + rc = -EPERM; + if ( s->emulator != current->domain ) + goto out; + + if ( ioreq_gfn || bufioreq_gfn ) + { + rc = hvm_ioreq_server_map_pages(s); + if ( rc ) + goto out; + } + + if ( ioreq_gfn ) + *ioreq_gfn = gfn_x(s->ioreq.gfn); + + if ( HANDLE_BUFIOREQ(s) ) + { + if ( bufioreq_gfn ) + *bufioreq_gfn = gfn_x(s->bufioreq.gfn); + + if ( bufioreq_port ) + *bufioreq_port = s->bufioreq_evtchn; + } + + rc = 0; + + out: + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + + return rc; +} + +int hvm_get_ioreq_server_frame(struct domain *d, 
ioservid_t id, + unsigned long idx, mfn_t *mfn) +{ + struct hvm_ioreq_server *s; + int rc; + + ASSERT(is_hvm_domain(d)); + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + s = get_ioreq_server(d, id); + + rc = -ENOENT; + if ( !s ) + goto out; + + rc = -EPERM; + if ( s->emulator != current->domain ) + goto out; + + rc = hvm_ioreq_server_alloc_pages(s); + if ( rc ) + goto out; + + switch ( idx ) + { + case XENMEM_resource_ioreq_server_frame_bufioreq: + rc = -ENOENT; + if ( !HANDLE_BUFIOREQ(s) ) + goto out; + + *mfn = page_to_mfn(s->bufioreq.page); + rc = 0; + break; + + case XENMEM_resource_ioreq_server_frame_ioreq(0): + *mfn = page_to_mfn(s->ioreq.page); + rc = 0; + break; + + default: + rc = -EINVAL; + break; + } + + out: + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + + return rc; +} + +int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, + uint32_t type, uint64_t start, + uint64_t end) +{ + struct hvm_ioreq_server *s; + struct rangeset *r; + int rc; + + if ( start > end ) + return -EINVAL; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + s = get_ioreq_server(d, id); + + rc = -ENOENT; + if ( !s ) + goto out; + + rc = -EPERM; + if ( s->emulator != current->domain ) + goto out; + + switch ( type ) + { + case XEN_DMOP_IO_RANGE_PORT: + case XEN_DMOP_IO_RANGE_MEMORY: + case XEN_DMOP_IO_RANGE_PCI: + r = s->range[type]; + break; + + default: + r = NULL; + break; + } + + rc = -EINVAL; + if ( !r ) + goto out; + + rc = -EEXIST; + if ( rangeset_overlaps_range(r, start, end) ) + goto out; + + rc = rangeset_add_range(r, start, end); + + out: + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + + return rc; +} + +int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, + uint32_t type, uint64_t start, + uint64_t end) +{ + struct hvm_ioreq_server *s; + struct rangeset *r; + int rc; + + if ( start > end ) + return -EINVAL; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + s = get_ioreq_server(d, 
id); + + rc = -ENOENT; + if ( !s ) + goto out; + + rc = -EPERM; + if ( s->emulator != current->domain ) + goto out; + + switch ( type ) + { + case XEN_DMOP_IO_RANGE_PORT: + case XEN_DMOP_IO_RANGE_MEMORY: + case XEN_DMOP_IO_RANGE_PCI: + r = s->range[type]; + break; + + default: + r = NULL; + break; + } + + rc = -EINVAL; + if ( !r ) + goto out; + + rc = -ENOENT; + if ( !rangeset_contains_range(r, start, end) ) + goto out; + + rc = rangeset_remove_range(r, start, end); + + out: + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + + return rc; +} + +int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, + bool enabled) +{ + struct hvm_ioreq_server *s; + int rc; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + s = get_ioreq_server(d, id); + + rc = -ENOENT; + if ( !s ) + goto out; + + rc = -EPERM; + if ( s->emulator != current->domain ) + goto out; + + domain_pause(d); + + if ( enabled ) + hvm_ioreq_server_enable(s); + else + hvm_ioreq_server_disable(s); + + domain_unpause(d); + + rc = 0; + + out: + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + return rc; +} + +int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v) +{ + struct hvm_ioreq_server *s; + unsigned int id; + int rc; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + FOR_EACH_IOREQ_SERVER(d, id, s) + { + rc = hvm_ioreq_server_add_vcpu(s, v); + if ( rc ) + goto fail; + } + + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + + return 0; + + fail: + while ( ++id != MAX_NR_IOREQ_SERVERS ) + { + s = GET_IOREQ_SERVER(d, id); + + if ( !s ) + continue; + + hvm_ioreq_server_remove_vcpu(s, v); + } + + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + + return rc; +} + +void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v) +{ + struct hvm_ioreq_server *s; + unsigned int id; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + FOR_EACH_IOREQ_SERVER(d, id, s) + hvm_ioreq_server_remove_vcpu(s, v); + + 
spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); +} + +void hvm_destroy_all_ioreq_servers(struct domain *d) +{ + struct hvm_ioreq_server *s; + unsigned int id; + + arch_hvm_ioreq_destroy(d); + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + /* No need to domain_pause() as the domain is being torn down */ + + FOR_EACH_IOREQ_SERVER(d, id, s) + { + hvm_ioreq_server_disable(s); + + /* + * It is safe to call hvm_ioreq_server_deinit() prior to + * set_ioreq_server() since the target domain is being destroyed. + */ + hvm_ioreq_server_deinit(s); + set_ioreq_server(d, id, NULL); + + xfree(s); + } + + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); +} + +struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d, + ioreq_t *p) +{ + struct hvm_ioreq_server *s; + uint8_t type; + uint64_t addr; + unsigned int id; + + if ( hvm_get_ioreq_server_range_type(d, p, &type, &addr) ) + return NULL; + + FOR_EACH_IOREQ_SERVER(d, id, s) + { + struct rangeset *r; + + if ( !s->enabled ) + continue; + + r = s->range[type]; + + switch ( type ) + { + unsigned long start, end; + + case XEN_DMOP_IO_RANGE_PORT: + start = addr; + end = start + p->size - 1; + if ( rangeset_contains_range(r, start, end) ) + return s; + + break; + + case XEN_DMOP_IO_RANGE_MEMORY: + start = hvm_mmio_first_byte(p); + end = hvm_mmio_last_byte(p); + + if ( rangeset_contains_range(r, start, end) ) + return s; + + break; + + case XEN_DMOP_IO_RANGE_PCI: + if ( rangeset_contains_singleton(r, addr >> 32) ) + { + p->type = IOREQ_TYPE_PCI_CONFIG; + p->addr = addr; + return s; + } + + break; + } + } + + return NULL; +} + +static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p) +{ + struct domain *d = current->domain; + struct hvm_ioreq_page *iorp; + buffered_iopage_t *pg; + buf_ioreq_t bp = { .data = p->data, + .addr = p->addr, + .type = p->type, + .dir = p->dir }; + /* Timeoffset sends 64b data, but no address. Use two consecutive slots. 
*/ + int qw = 0; + + /* Ensure buffered_iopage fits in a page */ + BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE); + + iorp = &s->bufioreq; + pg = iorp->va; + + if ( !pg ) + return IOREQ_IO_UNHANDLED; + + /* + * Return 0 for the cases we can't deal with: + * - 'addr' is only a 20-bit field, so we cannot address beyond 1MB + * - we cannot buffer accesses to guest memory buffers, as the guest + * may expect the memory buffer to be synchronously accessed + * - the count field is usually used with data_is_ptr and since we don't + * support data_is_ptr we do not waste space for the count field either + */ + if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) ) + return 0; + + switch ( p->size ) + { + case 1: + bp.size = 0; + break; + case 2: + bp.size = 1; + break; + case 4: + bp.size = 2; + break; + case 8: + bp.size = 3; + qw = 1; + break; + default: + gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size); + return IOREQ_IO_UNHANDLED; + } + + spin_lock(&s->bufioreq_lock); + + if ( (pg->ptrs.write_pointer - pg->ptrs.read_pointer) >= + (IOREQ_BUFFER_SLOT_NUM - qw) ) + { + /* The queue is full: send the iopacket through the normal path. */ + spin_unlock(&s->bufioreq_lock); + return IOREQ_IO_UNHANDLED; + } + + pg->buf_ioreq[pg->ptrs.write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp; + + if ( qw ) + { + bp.data = p->data >> 32; + pg->buf_ioreq[(pg->ptrs.write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp; + } + + /* Make the ioreq_t visible /before/ write_pointer. */ + smp_wmb(); + pg->ptrs.write_pointer += qw ? 2 : 1; + + /* Canonicalize read/write pointers to prevent their overflow. 
*/ + while ( (s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_ATOMIC) && + qw++ < IOREQ_BUFFER_SLOT_NUM && + pg->ptrs.read_pointer >= IOREQ_BUFFER_SLOT_NUM ) + { + union bufioreq_pointers old = pg->ptrs, new; + unsigned int n = old.read_pointer / IOREQ_BUFFER_SLOT_NUM; + + new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM; + new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM; + cmpxchg(&pg->ptrs.full, old.full, new.full); + } + + notify_via_xen_event_channel(d, s->bufioreq_evtchn); + spin_unlock(&s->bufioreq_lock); + + return IOREQ_IO_HANDLED; +} + +int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p, + bool buffered) +{ + struct vcpu *curr = current; + struct domain *d = curr->domain; + struct hvm_ioreq_vcpu *sv; + + ASSERT(s); + + if ( buffered ) + return hvm_send_buffered_ioreq(s, proto_p); + + if ( unlikely(!vcpu_start_shutdown_deferral(curr)) ) + return IOREQ_IO_RETRY; + + list_for_each_entry ( sv, + &s->ioreq_vcpu_list, + list_entry ) + { + if ( sv->vcpu == curr ) + { + evtchn_port_t port = sv->ioreq_evtchn; + ioreq_t *p = get_ioreq(s, curr); + + if ( unlikely(p->state != STATE_IOREQ_NONE) ) + { + gprintk(XENLOG_ERR, "device model set bad IO state %d\n", + p->state); + break; + } + + if ( unlikely(p->vp_eport != port) ) + { + gprintk(XENLOG_ERR, "device model set bad event channel %d\n", + p->vp_eport); + break; + } + + proto_p->state = STATE_IOREQ_NONE; + proto_p->vp_eport = port; + *p = *proto_p; + + prepare_wait_on_xen_event_channel(port); + + /* + * Following happens /after/ blocking and setting up ioreq + * contents. prepare_wait_on_xen_event_channel() is an implicit + * barrier. 
+ */ + p->state = STATE_IOREQ_READY; + notify_via_xen_event_channel(d, port); + + sv->pending = true; + return IOREQ_IO_RETRY; + } + } + + return IOREQ_IO_UNHANDLED; +} + +unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered) +{ + struct domain *d = current->domain; + struct hvm_ioreq_server *s; + unsigned int id, failed = 0; + + FOR_EACH_IOREQ_SERVER(d, id, s) + { + if ( !s->enabled ) + continue; + + if ( hvm_send_ioreq(s, p, buffered) == IOREQ_IO_UNHANDLED ) + failed++; + } + + return failed; +} + +void hvm_ioreq_init(struct domain *d) +{ + spin_lock_init(&d->arch.hvm.ioreq_server.lock); + + arch_hvm_ioreq_init(d); +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/hvm/ioreq.h +++ b/xen/include/asm-x86/hvm/ioreq.h @@ -XXX,XX +XXX,XX @@ #ifndef __ASM_X86_HVM_IOREQ_H__ #define __ASM_X86_HVM_IOREQ_H__ -bool hvm_io_pending(struct vcpu *v); -bool handle_hvm_io_completion(struct vcpu *v); -bool is_ioreq_server_page(struct domain *d, const struct page_info *page); +#include <asm/hvm/emulate.h> +#include <asm/hvm/hvm.h> +#include <asm/hvm/vmx/vmx.h> -int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, - ioservid_t *id); -int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id); -int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, - unsigned long *ioreq_gfn, - unsigned long *bufioreq_gfn, - evtchn_port_t *bufioreq_port); -int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id, - unsigned long idx, mfn_t *mfn); -int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, - uint32_t type, uint64_t start, - uint64_t end); -int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, - uint32_t type, uint64_t start, - uint64_t end); int hvm_map_mem_type_to_ioreq_server(struct domain 
*d, ioservid_t id, uint32_t type, uint32_t flags); -int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, - bool enabled); - -int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v); -void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v); -void hvm_destroy_all_ioreq_servers(struct domain *d); - -struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d, - ioreq_t *p); -int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p, - bool buffered); -unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered); - -void hvm_ioreq_init(struct domain *d); int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s); diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/xen/include/xen/ioreq.h @@ -XXX,XX +XXX,XX @@ +/* + * ioreq.h: Hardware virtual machine assist interface definitions. + * + * Copyright (c) 2016 Citrix Systems Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; If not, see <http://www.gnu.org/licenses/>. 
+ */ + +#ifndef __IOREQ_H__ +#define __IOREQ_H__ + +#include <xen/sched.h> + +#include <asm/hvm/ioreq.h> + +#define GET_IOREQ_SERVER(d, id) \ + (d)->arch.hvm.ioreq_server.server[id] + +static inline struct hvm_ioreq_server *get_ioreq_server(const struct domain *d, + unsigned int id) +{ + if ( id >= MAX_NR_IOREQ_SERVERS ) + return NULL; + + return GET_IOREQ_SERVER(d, id); +} + +bool hvm_io_pending(struct vcpu *v); +bool handle_hvm_io_completion(struct vcpu *v); +bool is_ioreq_server_page(struct domain *d, const struct page_info *page); + +int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, + ioservid_t *id); +int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id); +int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, + unsigned long *ioreq_gfn, + unsigned long *bufioreq_gfn, + evtchn_port_t *bufioreq_port); +int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id, + unsigned long idx, mfn_t *mfn); +int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, + uint32_t type, uint64_t start, + uint64_t end); +int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, + uint32_t type, uint64_t start, + uint64_t end); +int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, + bool enabled); + +int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v); +void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v); +void hvm_destroy_all_ioreq_servers(struct domain *d); + +struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d, + ioreq_t *p); +int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p, + bool buffered); +unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered); + +void hvm_ioreq_init(struct domain *d); + +#endif /* __IOREQ_H__ */ + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ -- 2.7.4
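Note on the buffered path in hvm_send_buffered_ioreq() above: the ring uses free-running read/write pointers, treats the queue as full one slot early when a 64-bit access needs two consecutive slots, and periodically rebases both pointers so they do not overflow. What follows is a minimal single-threaded sketch of that bookkeeping only. All names (sketch_ring, SKETCH_SLOT_NUM, ...) are hypothetical, the slot count is a placeholder rather than Xen's IOREQ_BUFFER_SLOT_NUM, and the lock, the smp_wmb() and the cmpxchg() of the real code are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slot count for the sketch; not Xen's IOREQ_BUFFER_SLOT_NUM. */
#define SKETCH_SLOT_NUM 4u

struct sketch_ring {
    uint32_t read_pointer;           /* free-running, advanced by the consumer */
    uint32_t write_pointer;          /* free-running, advanced by the producer */
    uint32_t slot[SKETCH_SLOT_NUM];  /* each slot carries 32 bits of data */
};

/*
 * Enqueue one slot (qw == false) or two consecutive slots (qw == true).
 * Returns false when there is not enough room, in which case the caller
 * would fall back to the normal (synchronous) path.
 */
static bool sketch_ring_put(struct sketch_ring *r, uint64_t data, bool qw)
{
    uint32_t used = r->write_pointer - r->read_pointer;

    if ( used >= SKETCH_SLOT_NUM - (qw ? 1u : 0u) )
        return false;

    /* Low 32 bits go into the first slot... */
    r->slot[r->write_pointer % SKETCH_SLOT_NUM] = (uint32_t)data;

    /* ...and the high 32 bits into the following slot for 64-bit data. */
    if ( qw )
        r->slot[(r->write_pointer + 1) % SKETCH_SLOT_NUM] =
            (uint32_t)(data >> 32);

    r->write_pointer += qw ? 2 : 1;
    return true;
}

/* Rebase both pointers by a whole number of ring laps. */
static void sketch_ring_canonicalize(struct sketch_ring *r)
{
    if ( r->read_pointer >= SKETCH_SLOT_NUM )
    {
        uint32_t n = r->read_pointer / SKETCH_SLOT_NUM;

        r->read_pointer -= n * SKETCH_SLOT_NUM;
        r->write_pointer -= n * SKETCH_SLOT_NUM;
    }
}
```

Subtracting the same multiple of the slot count from both pointers keeps both the fill level and the slot indices unchanged, which is why the rebasing is invisible to producer and consumer.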
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> The IOREQ is a common feature now and this helper will be used on Arm as is. Move it to include/xen/ioreq.h Although PIO handling on Arm is not introduced with the current series (it will be implemented when we add support for vPCI), technically the PIOs exist on Arm (however they are accessed the same way as MMIO) and it would be better not to diverge now. Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" Changes RFC -> V1: - new patch, was split from: "[RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common" --- --- xen/arch/x86/hvm/vmx/realmode.c | 1 + xen/include/asm-x86/hvm/vcpu.h | 7 ------- xen/include/xen/ioreq.h | 7 +++++++ 3 files changed, 8 insertions(+), 7 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/vmx/realmode.c +++ b/xen/arch/x86/hvm/vmx/realmode.c @@ -XXX,XX +XXX,XX @@ */ #include <xen/init.h> +#include <xen/ioreq.h> #include <xen/lib.h> #include <xen/sched.h> #include <xen/paging.h> diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/hvm/vcpu.h +++ b/xen/include/asm-x86/hvm/vcpu.h @@ -XXX,XX +XXX,XX @@ struct hvm_vcpu_io { const struct g2m_ioport *g2m_ioport; }; -static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq) -{ - return ioreq->state == STATE_IOREQ_READY && - !ioreq->data_is_ptr && - (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE); -} - struct nestedvcpu { bool_t nv_guestmode; /* vcpu in guestmode? 
*/ void *nv_vvmcx; /* l1 guest virtual VMCB/VMCS */ diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/ioreq.h +++ b/xen/include/xen/ioreq.h @@ -XXX,XX +XXX,XX @@ static inline struct hvm_ioreq_server *get_ioreq_server(const struct domain *d, return GET_IOREQ_SERVER(d, id); } +static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq) +{ + return ioreq->state == STATE_IOREQ_READY && + !ioreq->data_is_ptr && + (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE); +} + bool hvm_io_pending(struct vcpu *v); bool handle_hvm_io_completion(struct vcpu *v); bool is_ioreq_server_page(struct domain *d, const struct page_info *page); -- 2.7.4
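For reviewers who want to convince themselves which requests need completion: hvm_ioreq_needs_completion() says "yes" for everything in the READY state except requests whose data is a pointer and PIO writes. A standalone sketch of the same predicate follows. The struct and the enum values are stand-ins invented for the sketch (the real ioreq_t and the STATE_/IOREQ_ constants come from Xen's public headers), so only the logic should be compared.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in constants; the numeric values are placeholders for the sketch. */
enum sketch_state { SK_STATE_NONE, SK_STATE_READY };
enum sketch_type  { SK_TYPE_PIO, SK_TYPE_MMIO };
enum sketch_dir   { SK_DIR_WRITE, SK_DIR_READ };

/* Stand-in for the few ioreq_t fields the predicate looks at. */
struct sketch_ioreq {
    enum sketch_state state;
    enum sketch_type type;
    enum sketch_dir dir;
    bool data_is_ptr;
};

/* Same boolean structure as the helper moved by the patch. */
static bool sketch_needs_completion(const struct sketch_ioreq *p)
{
    return p->state == SK_STATE_READY &&
           !p->data_is_ptr &&
           (p->type != SK_TYPE_PIO || p->dir != SK_DIR_WRITE);
}
```

A PIO write carries no result back to the guest, so there is nothing to complete; a PIO read or any MMIO access without data_is_ptr does need the completion step.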
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> The IOREQ is a common feature now and Arm will have its own implementation. But the name of the function is pretty generic and can be confusing on Arm (we already have a try_handle_mmio()). To avoid renaming the function globally (it is used for a varying set of purposes on x86) and still get a non-confusing variant on Arm, provide an alias ioreq_handle_complete_mmio() to be used in common and Arm code. Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" Changes RFC -> V1: - new patch --- --- xen/common/ioreq.c | 2 +- xen/include/asm-x86/hvm/ioreq.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/common/ioreq.c +++ b/xen/common/ioreq.c @@ -XXX,XX +XXX,XX @@ bool handle_hvm_io_completion(struct vcpu *v) break; case HVMIO_mmio_completion: - return handle_mmio(); + return ioreq_handle_complete_mmio(); case HVMIO_pio_completion: return handle_pio(vio->io_req.addr, vio->io_req.size, diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/hvm/ioreq.h +++ b/xen/include/asm-x86/hvm/ioreq.h @@ -XXX,XX +XXX,XX @@ void arch_hvm_ioreq_destroy(struct domain *d); #define IOREQ_IO_UNHANDLED X86EMUL_UNHANDLEABLE #define IOREQ_IO_RETRY X86EMUL_RETRY +#define ioreq_handle_complete_mmio handle_mmio + #endif /* __ASM_X86_HVM_IOREQ_H__ */ /* -- 2.7.4
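The aliasing approach in the patch above can be illustrated in isolation. In this sketch the "arch header" part maps the common name onto an existing function with an object-like macro, and the "common code" part only ever uses the common name. The body of handle_mmio() and the call counter are made up for the illustration; only the #define mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* --- what an arch header would provide (stand-in implementation) --- */
static int sketch_calls;            /* hypothetical counter, for the test only */

static bool handle_mmio(void)
{
    sketch_calls++;
    return true;
}

/* The alias: common code uses this name, x86 maps it onto handle_mmio(). */
#define ioreq_handle_complete_mmio handle_mmio

/* --- what common code would do --- */
static bool sketch_complete_mmio(void)
{
    return ioreq_handle_complete_mmio();
}
```

Because the macro is a plain token substitution there is no extra call or wrapper on x86, and another architecture is free to define ioreq_handle_complete_mmio as a real function with different behaviour.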
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> The IOREQ is a common feature now and these helpers will be used on Arm as is. Move them to include/xen/ioreq.h Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" Changes RFC -> V1: - new patch --- --- xen/arch/x86/hvm/intercept.c | 1 + xen/include/asm-x86/hvm/io.h | 16 ---------------- xen/include/xen/ioreq.h | 16 ++++++++++++++++ 3 files changed, 17 insertions(+), 16 deletions(-) diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/intercept.c +++ b/xen/arch/x86/hvm/intercept.c @@ -XXX,XX +XXX,XX @@ * this program; If not, see <http://www.gnu.org/licenses/>. */ +#include <xen/ioreq.h> #include <xen/types.h> #include <xen/sched.h> #include <asm/regs.h> diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/hvm/io.h +++ b/xen/include/asm-x86/hvm/io.h @@ -XXX,XX +XXX,XX @@ struct hvm_mmio_ops { hvm_mmio_write_t write; }; -static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p) -{ - return unlikely(p->df) ? - p->addr - (p->count - 1ul) * p->size : - p->addr; -} - -static inline paddr_t hvm_mmio_last_byte(const ioreq_t *p) -{ - unsigned long size = p->size; - - return unlikely(p->df) ? 
- p->addr + size - 1: - p->addr + (p->count * size) - 1; -} - typedef int (*portio_action_t)( int dir, unsigned int port, unsigned int bytes, uint32_t *val); diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/ioreq.h +++ b/xen/include/xen/ioreq.h @@ -XXX,XX +XXX,XX @@ static inline struct hvm_ioreq_server *get_ioreq_server(const struct domain *d, return GET_IOREQ_SERVER(d, id); } +static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p) +{ + return unlikely(p->df) ? + p->addr - (p->count - 1ul) * p->size : + p->addr; +} + +static inline paddr_t hvm_mmio_last_byte(const ioreq_t *p) +{ + unsigned long size = p->size; + + return unlikely(p->df) ? + p->addr + size - 1: + p->addr + (p->count * size) - 1; +} + static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq) { return ioreq->state == STATE_IOREQ_READY && -- 2.7.4
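A worked example may help when reading hvm_mmio_first_byte()/hvm_mmio_last_byte(): for a repeated access, addr is the address of the first element touched, and the direction flag (df) decides whether the remaining count - 1 elements lie above or below it. The sketch below re-implements the two helpers on a stand-in struct (sketch_ioreq is hypothetical and carries only the fields used here) so the arithmetic can be checked on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the ioreq_t fields used by the two helpers. */
struct sketch_ioreq {
    uint64_t addr;    /* address of the first element accessed */
    uint32_t count;   /* number of repetitions */
    uint32_t size;    /* bytes per element */
    bool df;          /* true: addresses decrease between repetitions */
};

/* Lowest byte address touched by the whole access. */
static uint64_t sketch_first_byte(const struct sketch_ioreq *p)
{
    return p->df ? p->addr - (uint64_t)(p->count - 1) * p->size
                 : p->addr;
}

/* Highest byte address touched by the whole access. */
static uint64_t sketch_last_byte(const struct sketch_ioreq *p)
{
    return p->df ? p->addr + p->size - 1
                 : p->addr + (uint64_t)p->count * p->size - 1;
}
```

With four 4-byte elements, an ascending access starting at 0x1000 and a descending access starting at 0x100c cover the same 16 bytes, 0x1000 to 0x100f, which is the range that is then matched against the server's MMIO rangeset.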
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> The IOREQ is a common feature now and these structs will be used on Arm as is. Move them to xen/ioreq.h Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" Changes RFC -> V1: - new patch --- --- xen/include/asm-x86/hvm/domain.h | 34 ---------------------------------- xen/include/xen/ioreq.h | 34 ++++++++++++++++++++++++++++++++++ 2 files changed, 34 insertions(+), 34 deletions(-) diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/hvm/domain.h +++ b/xen/include/asm-x86/hvm/domain.h @@ -XXX,XX +XXX,XX @@ #include <public/hvm/dm_op.h> -struct hvm_ioreq_page { - gfn_t gfn; - struct page_info *page; - void *va; -}; - -struct hvm_ioreq_vcpu { - struct list_head list_entry; - struct vcpu *vcpu; - evtchn_port_t ioreq_evtchn; - bool pending; -}; - -#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1) -#define MAX_NR_IO_RANGES 256 - -struct hvm_ioreq_server { - struct domain *target, *emulator; - - /* Lock to serialize toolstack modifications */ - spinlock_t lock; - - struct hvm_ioreq_page ioreq; - struct list_head ioreq_vcpu_list; - struct hvm_ioreq_page bufioreq; - - /* Lock to serialize access to buffered ioreq ring */ - spinlock_t bufioreq_lock; - evtchn_port_t bufioreq_evtchn; - struct rangeset *range[NR_IO_RANGE_TYPES]; - bool enabled; - uint8_t bufioreq_handling; -}; - #ifdef CONFIG_MEM_SHARING struct mem_sharing_domain { diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/ioreq.h +++ b/xen/include/xen/ioreq.h @@ -XXX,XX +XXX,XX @@ #include <asm/hvm/ioreq.h> +struct hvm_ioreq_page { + gfn_t gfn; + struct page_info *page; + void *va; +}; + +struct hvm_ioreq_vcpu { + struct 
list_head list_entry; + struct vcpu *vcpu; + evtchn_port_t ioreq_evtchn; + bool pending; +}; + +#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1) +#define MAX_NR_IO_RANGES 256 + +struct hvm_ioreq_server { + struct domain *target, *emulator; + + /* Lock to serialize toolstack modifications */ + spinlock_t lock; + + struct hvm_ioreq_page ioreq; + struct list_head ioreq_vcpu_list; + struct hvm_ioreq_page bufioreq; + + /* Lock to serialize access to buffered ioreq ring */ + spinlock_t bufioreq_lock; + evtchn_port_t bufioreq_evtchn; + struct rangeset *range[NR_IO_RANGE_TYPES]; + bool enabled; + uint8_t bufioreq_handling; +}; + #define GET_IOREQ_SERVER(d, id) \ (d)->arch.hvm.ioreq_server.server[id] -- 2.7.4
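To show how the range[NR_IO_RANGE_TYPES] member of struct hvm_ioreq_server is meant to be used, here is a rough sketch of one range set per I/O type, with the same error convention as the map call in the series (-EINVAL for a bad type or start > end, -EEXIST on overlap) and a containment check as done when selecting a server. Everything here is a simplification with made-up names: the fixed-size array stands in for Xen's rangeset, the -ENOSPC case is an artifact of that array, and unlike a real rangeset adjacent ranges are not merged.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type indices; the last entry gives the array size. */
enum { SK_RANGE_PORT, SK_RANGE_MEMORY, SK_RANGE_PCI, SK_NR_RANGE_TYPES };

#define SK_MAX_RANGES 4u   /* placeholder limit for the sketch */

struct sk_range { uint64_t start, end; };        /* inclusive bounds */

struct sk_rangeset {
    struct sk_range r[SK_MAX_RANGES];
    unsigned int nr;
};

struct sk_server {
    struct sk_rangeset range[SK_NR_RANGE_TYPES]; /* one set per I/O type */
    bool enabled;
};

/* Register [start, end] for the given type, refusing overlaps. */
static int sk_map_range(struct sk_server *s, unsigned int type,
                        uint64_t start, uint64_t end)
{
    struct sk_rangeset *set;
    unsigned int i;

    if ( start > end || type >= SK_NR_RANGE_TYPES )
        return -EINVAL;

    set = &s->range[type];

    for ( i = 0; i < set->nr; i++ )
        if ( start <= set->r[i].end && end >= set->r[i].start )
            return -EEXIST;

    if ( set->nr == SK_MAX_RANGES )
        return -ENOSPC;

    set->r[set->nr].start = start;
    set->r[set->nr].end = end;
    set->nr++;

    return 0;
}

/* True if an enabled server has a stored range fully covering the access. */
static bool sk_claims(const struct sk_server *s, unsigned int type,
                      uint64_t start, uint64_t end)
{
    const struct sk_rangeset *set;
    unsigned int i;

    if ( !s->enabled || type >= SK_NR_RANGE_TYPES )
        return false;

    set = &s->range[type];

    for ( i = 0; i < set->nr; i++ )
        if ( start >= set->r[i].start && end <= set->r[i].end )
            return true;

    return false;
}
```

Keeping one set per type means a port number and an MMIO address with the same numeric value never collide, and an access that straddles the end of a registered range is not claimed by that server.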
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> As a lot of x86 code can be re-used on Arm later on, this patch splits devicemodel support into common and arch specific parts. Also update XSM code a bit to let DM op be used on Arm. This support is going to be used on Arm to be able to run a device emulator outside of the Xen hypervisor. Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" Changes RFC -> V1: - update XSM, related changes were pulled from: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features --- --- xen/arch/x86/hvm/dm.c | 287 +++----------------------------------------- xen/common/Makefile | 1 + xen/common/dm.c | 287 ++++++++++++++++++++++++++++++++++++++++++++ xen/include/xen/hypercall.h | 12 ++ xen/include/xsm/dummy.h | 4 +- xen/include/xsm/xsm.h | 6 +- xen/xsm/dummy.c | 2 +- xen/xsm/flask/hooks.c | 5 +- 8 files changed, 327 insertions(+), 277 deletions(-) create mode 100644 xen/common/dm.c diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -XXX,XX +XXX,XX @@ #include <public/hvm/hvm_op.h> -struct dmop_args { - domid_t domid; - unsigned int nr_bufs; - /* Reserve enough buf elements for all current hypercalls.
*/ - struct xen_dm_op_buf buf[2]; -}; - static bool _raw_copy_from_guest_buf_offset(void *dst, const struct dmop_args *args, unsigned int buf_idx, @@ -XXX,XX +XXX,XX @@ static int inject_event(struct domain *d, return 0; } -static int dm_op(const struct dmop_args *op_args) +int arch_dm_op(struct xen_dm_op *op, struct domain *d, + const struct dmop_args *op_args, bool *const_op) { - struct domain *d; - struct xen_dm_op op; - bool const_op = true; long rc; - size_t offset; - - static const uint8_t op_size[] = { - [XEN_DMOP_create_ioreq_server] = sizeof(struct xen_dm_op_create_ioreq_server), - [XEN_DMOP_get_ioreq_server_info] = sizeof(struct xen_dm_op_get_ioreq_server_info), - [XEN_DMOP_map_io_range_to_ioreq_server] = sizeof(struct xen_dm_op_ioreq_server_range), - [XEN_DMOP_unmap_io_range_from_ioreq_server] = sizeof(struct xen_dm_op_ioreq_server_range), - [XEN_DMOP_set_ioreq_server_state] = sizeof(struct xen_dm_op_set_ioreq_server_state), - [XEN_DMOP_destroy_ioreq_server] = sizeof(struct xen_dm_op_destroy_ioreq_server), - [XEN_DMOP_track_dirty_vram] = sizeof(struct xen_dm_op_track_dirty_vram), - [XEN_DMOP_set_pci_intx_level] = sizeof(struct xen_dm_op_set_pci_intx_level), - [XEN_DMOP_set_isa_irq_level] = sizeof(struct xen_dm_op_set_isa_irq_level), - [XEN_DMOP_set_pci_link_route] = sizeof(struct xen_dm_op_set_pci_link_route), - [XEN_DMOP_modified_memory] = sizeof(struct xen_dm_op_modified_memory), - [XEN_DMOP_set_mem_type] = sizeof(struct xen_dm_op_set_mem_type), - [XEN_DMOP_inject_event] = sizeof(struct xen_dm_op_inject_event), - [XEN_DMOP_inject_msi] = sizeof(struct xen_dm_op_inject_msi), - [XEN_DMOP_map_mem_type_to_ioreq_server] = sizeof(struct xen_dm_op_map_mem_type_to_ioreq_server), - [XEN_DMOP_remote_shutdown] = sizeof(struct xen_dm_op_remote_shutdown), - [XEN_DMOP_relocate_memory] = sizeof(struct xen_dm_op_relocate_memory), - [XEN_DMOP_pin_memory_cacheattr] = sizeof(struct xen_dm_op_pin_memory_cacheattr), - }; - - rc = rcu_lock_remote_domain_by_id(op_args->domid, 
&d); - if ( rc ) - return rc; - - if ( !is_hvm_domain(d) ) - goto out; - - rc = xsm_dm_op(XSM_DM_PRIV, d); - if ( rc ) - goto out; - - offset = offsetof(struct xen_dm_op, u); - - rc = -EFAULT; - if ( op_args->buf[0].size < offset ) - goto out; - - if ( copy_from_guest_offset((void *)&op, op_args->buf[0].h, 0, offset) ) - goto out; - - if ( op.op >= ARRAY_SIZE(op_size) ) - { - rc = -EOPNOTSUPP; - goto out; - } - - op.op = array_index_nospec(op.op, ARRAY_SIZE(op_size)); - - if ( op_args->buf[0].size < offset + op_size[op.op] ) - goto out; - - if ( copy_from_guest_offset((void *)&op.u, op_args->buf[0].h, offset, - op_size[op.op]) ) - goto out; - - rc = -EINVAL; - if ( op.pad ) - goto out; - - switch ( op.op ) - { - case XEN_DMOP_create_ioreq_server: - { - struct xen_dm_op_create_ioreq_server *data = - &op.u.create_ioreq_server; - - const_op = false; - - rc = -EINVAL; - if ( data->pad[0] || data->pad[1] || data->pad[2] ) - break; - - rc = hvm_create_ioreq_server(d, data->handle_bufioreq, - &data->id); - break; - } - case XEN_DMOP_get_ioreq_server_info: + switch ( op->op ) { - struct xen_dm_op_get_ioreq_server_info *data = - &op.u.get_ioreq_server_info; - const uint16_t valid_flags = XEN_DMOP_no_gfns; - - const_op = false; - - rc = -EINVAL; - if ( data->flags & ~valid_flags ) - break; - - rc = hvm_get_ioreq_server_info(d, data->id, - (data->flags & XEN_DMOP_no_gfns) ? - NULL : &data->ioreq_gfn, - (data->flags & XEN_DMOP_no_gfns) ? 
- NULL : &data->bufioreq_gfn, - &data->bufioreq_port); - break; - } - - case XEN_DMOP_map_io_range_to_ioreq_server: - { - const struct xen_dm_op_ioreq_server_range *data = - &op.u.map_io_range_to_ioreq_server; - - rc = -EINVAL; - if ( data->pad ) - break; - - rc = hvm_map_io_range_to_ioreq_server(d, data->id, data->type, - data->start, data->end); - break; - } - - case XEN_DMOP_unmap_io_range_from_ioreq_server: - { - const struct xen_dm_op_ioreq_server_range *data = - &op.u.unmap_io_range_from_ioreq_server; - - rc = -EINVAL; - if ( data->pad ) - break; - - rc = hvm_unmap_io_range_from_ioreq_server(d, data->id, data->type, - data->start, data->end); - break; - } - case XEN_DMOP_map_mem_type_to_ioreq_server: { struct xen_dm_op_map_mem_type_to_ioreq_server *data = - &op.u.map_mem_type_to_ioreq_server; + &op->u.map_mem_type_to_ioreq_server; unsigned long first_gfn = data->opaque; - const_op = false; + *const_op = false; rc = -EOPNOTSUPP; if ( !hap_enabled(d) ) @@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args) break; } - case XEN_DMOP_set_ioreq_server_state: - { - const struct xen_dm_op_set_ioreq_server_state *data = - &op.u.set_ioreq_server_state; - - rc = -EINVAL; - if ( data->pad ) - break; - - rc = hvm_set_ioreq_server_state(d, data->id, !!data->enabled); - break; - } - - case XEN_DMOP_destroy_ioreq_server: - { - const struct xen_dm_op_destroy_ioreq_server *data = - &op.u.destroy_ioreq_server; - - rc = -EINVAL; - if ( data->pad ) - break; - - rc = hvm_destroy_ioreq_server(d, data->id); - break; - } - case XEN_DMOP_track_dirty_vram: { const struct xen_dm_op_track_dirty_vram *data = - &op.u.track_dirty_vram; + &op->u.track_dirty_vram; rc = -EINVAL; if ( data->pad ) @@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args) case XEN_DMOP_set_pci_intx_level: { const struct xen_dm_op_set_pci_intx_level *data = - &op.u.set_pci_intx_level; + &op->u.set_pci_intx_level; rc = set_pci_intx_level(d, data->domain, data->bus, data->device, 
data->intx, @@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args) case XEN_DMOP_set_isa_irq_level: { const struct xen_dm_op_set_isa_irq_level *data = - &op.u.set_isa_irq_level; + &op->u.set_isa_irq_level; rc = set_isa_irq_level(d, data->isa_irq, data->level); break; @@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args) case XEN_DMOP_set_pci_link_route: { const struct xen_dm_op_set_pci_link_route *data = - &op.u.set_pci_link_route; + &op->u.set_pci_link_route; rc = hvm_set_pci_link_route(d, data->link, data->isa_irq); break; @@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args) case XEN_DMOP_modified_memory: { struct xen_dm_op_modified_memory *data = - &op.u.modified_memory; + &op->u.modified_memory; rc = modified_memory(d, op_args, data); - const_op = !rc; + *const_op = !rc; break; } case XEN_DMOP_set_mem_type: { struct xen_dm_op_set_mem_type *data = - &op.u.set_mem_type; + &op->u.set_mem_type; - const_op = false; + *const_op = false; rc = -EINVAL; if ( data->pad ) @@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args) case XEN_DMOP_inject_event: { const struct xen_dm_op_inject_event *data = - &op.u.inject_event; + &op->u.inject_event; rc = -EINVAL; if ( data->pad0 || data->pad1 ) @@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args) case XEN_DMOP_inject_msi: { const struct xen_dm_op_inject_msi *data = - &op.u.inject_msi; + &op->u.inject_msi; rc = -EINVAL; if ( data->pad ) @@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args) case XEN_DMOP_remote_shutdown: { const struct xen_dm_op_remote_shutdown *data = - &op.u.remote_shutdown; + &op->u.remote_shutdown; domain_shutdown(d, data->reason); rc = 0; @@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args) case XEN_DMOP_relocate_memory: { - struct xen_dm_op_relocate_memory *data = &op.u.relocate_memory; + struct xen_dm_op_relocate_memory *data = &op->u.relocate_memory; struct xen_add_to_physmap xatp 
= { .domid = op_args->domid, .size = data->size, @@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args) data->size -= rc; data->src_gfn += rc; data->dst_gfn += rc; - const_op = false; + *const_op = false; rc = -ERESTART; } break; @@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args) case XEN_DMOP_pin_memory_cacheattr: { const struct xen_dm_op_pin_memory_cacheattr *data = - &op.u.pin_memory_cacheattr; + &op->u.pin_memory_cacheattr; if ( data->pad ) { @@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args) break; } - if ( (!rc || rc == -ERESTART) && - !const_op && copy_to_guest_offset(op_args->buf[0].h, offset, - (void *)&op.u, op_size[op.op]) ) - rc = -EFAULT; - - out: - rcu_unlock_domain(d); - - return rc; -} - -CHECK_dm_op_create_ioreq_server; -CHECK_dm_op_get_ioreq_server_info; -CHECK_dm_op_ioreq_server_range; -CHECK_dm_op_set_ioreq_server_state; -CHECK_dm_op_destroy_ioreq_server; -CHECK_dm_op_track_dirty_vram; -CHECK_dm_op_set_pci_intx_level; -CHECK_dm_op_set_isa_irq_level; -CHECK_dm_op_set_pci_link_route; -CHECK_dm_op_modified_memory; -CHECK_dm_op_set_mem_type; -CHECK_dm_op_inject_event; -CHECK_dm_op_inject_msi; -CHECK_dm_op_remote_shutdown; -CHECK_dm_op_relocate_memory; -CHECK_dm_op_pin_memory_cacheattr; - -int compat_dm_op(domid_t domid, - unsigned int nr_bufs, - XEN_GUEST_HANDLE_PARAM(void) bufs) -{ - struct dmop_args args; - unsigned int i; - int rc; - - if ( nr_bufs > ARRAY_SIZE(args.buf) ) - return -E2BIG; - - args.domid = domid; - args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1); - - for ( i = 0; i < args.nr_bufs; i++ ) - { - struct compat_dm_op_buf cmp; - - if ( copy_from_guest_offset(&cmp, bufs, i, 1) ) - return -EFAULT; - -#define XLAT_dm_op_buf_HNDL_h(_d_, _s_) \ - guest_from_compat_handle((_d_)->h, (_s_)->h) - - XLAT_dm_op_buf(&args.buf[i], &cmp); - -#undef XLAT_dm_op_buf_HNDL_h - } - - rc = dm_op(&args); - - if ( rc == -ERESTART ) - rc = 
hypercall_create_continuation(__HYPERVISOR_dm_op, "iih", - domid, nr_bufs, bufs); - - return rc; -} - -long do_dm_op(domid_t domid, - unsigned int nr_bufs, - XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs) -{ - struct dmop_args args; - int rc; - - if ( nr_bufs > ARRAY_SIZE(args.buf) ) - return -E2BIG; - - args.domid = domid; - args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1); - - if ( copy_from_guest_offset(&args.buf[0], bufs, 0, args.nr_bufs) ) - return -EFAULT; - - rc = dm_op(&args); - - if ( rc == -ERESTART ) - rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih", - domid, nr_bufs, bufs); - return rc; } diff --git a/xen/common/Makefile b/xen/common/Makefile index XXXXXXX..XXXXXXX 100644 --- a/xen/common/Makefile +++ b/xen/common/Makefile @@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_CORE_PARKING) += core_parking.o obj-y += cpu.o obj-$(CONFIG_DEBUG_TRACE) += debugtrace.o obj-$(CONFIG_HAS_DEVICE_TREE) += device_tree.o +obj-$(CONFIG_IOREQ_SERVER) += dm.o obj-y += domctl.o obj-y += domain.o obj-y += event_2l.o diff --git a/xen/common/dm.c b/xen/common/dm.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/xen/common/dm.c @@ -XXX,XX +XXX,XX @@ +/* + * Copyright (c) 2016 Citrix Systems Inc. + * Copyright (c) 2019 Arm ltd. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; If not, see <http://www.gnu.org/licenses/>. 
+ */ + +#include <xen/guest_access.h> +#include <xen/hypercall.h> +#include <xen/ioreq.h> +#include <xen/nospec.h> + +static int dm_op(const struct dmop_args *op_args) +{ + struct domain *d; + struct xen_dm_op op; + long rc; + bool const_op = true; + const size_t offset = offsetof(struct xen_dm_op, u); + + static const uint8_t op_size[] = { + [XEN_DMOP_create_ioreq_server] = sizeof(struct xen_dm_op_create_ioreq_server), + [XEN_DMOP_get_ioreq_server_info] = sizeof(struct xen_dm_op_get_ioreq_server_info), + [XEN_DMOP_map_io_range_to_ioreq_server] = sizeof(struct xen_dm_op_ioreq_server_range), + [XEN_DMOP_unmap_io_range_from_ioreq_server] = sizeof(struct xen_dm_op_ioreq_server_range), + [XEN_DMOP_set_ioreq_server_state] = sizeof(struct xen_dm_op_set_ioreq_server_state), + [XEN_DMOP_destroy_ioreq_server] = sizeof(struct xen_dm_op_destroy_ioreq_server), + [XEN_DMOP_track_dirty_vram] = sizeof(struct xen_dm_op_track_dirty_vram), + [XEN_DMOP_set_pci_intx_level] = sizeof(struct xen_dm_op_set_pci_intx_level), + [XEN_DMOP_set_isa_irq_level] = sizeof(struct xen_dm_op_set_isa_irq_level), + [XEN_DMOP_set_pci_link_route] = sizeof(struct xen_dm_op_set_pci_link_route), + [XEN_DMOP_modified_memory] = sizeof(struct xen_dm_op_modified_memory), + [XEN_DMOP_set_mem_type] = sizeof(struct xen_dm_op_set_mem_type), + [XEN_DMOP_inject_event] = sizeof(struct xen_dm_op_inject_event), + [XEN_DMOP_inject_msi] = sizeof(struct xen_dm_op_inject_msi), + [XEN_DMOP_map_mem_type_to_ioreq_server] = sizeof(struct xen_dm_op_map_mem_type_to_ioreq_server), + [XEN_DMOP_remote_shutdown] = sizeof(struct xen_dm_op_remote_shutdown), + [XEN_DMOP_relocate_memory] = sizeof(struct xen_dm_op_relocate_memory), + [XEN_DMOP_pin_memory_cacheattr] = sizeof(struct xen_dm_op_pin_memory_cacheattr), + }; + + rc = rcu_lock_remote_domain_by_id(op_args->domid, &d); + if ( rc ) + return rc; + + if ( !is_hvm_domain(d) ) + goto out; + + rc = xsm_dm_op(XSM_DM_PRIV, d); + if ( rc ) + goto out; + + rc = -EFAULT; + if ( 
op_args->buf[0].size < offset ) + goto out; + + if ( copy_from_guest_offset((void *)&op, op_args->buf[0].h, 0, offset) ) + goto out; + + if ( op.op >= ARRAY_SIZE(op_size) ) + { + rc = -EOPNOTSUPP; + goto out; + } + + op.op = array_index_nospec(op.op, ARRAY_SIZE(op_size)); + + if ( op_args->buf[0].size < offset + op_size[op.op] ) + goto out; + + if ( copy_from_guest_offset((void *)&op.u, op_args->buf[0].h, offset, + op_size[op.op]) ) + goto out; + + rc = -EINVAL; + if ( op.pad ) + goto out; + + switch ( op.op ) + { + case XEN_DMOP_create_ioreq_server: + { + struct xen_dm_op_create_ioreq_server *data = + &op.u.create_ioreq_server; + + const_op = false; + + rc = -EINVAL; + if ( data->pad[0] || data->pad[1] || data->pad[2] ) + break; + + rc = hvm_create_ioreq_server(d, data->handle_bufioreq, + &data->id); + break; + } + + case XEN_DMOP_get_ioreq_server_info: + { + struct xen_dm_op_get_ioreq_server_info *data = + &op.u.get_ioreq_server_info; + const uint16_t valid_flags = XEN_DMOP_no_gfns; + + const_op = false; + + rc = -EINVAL; + if ( data->flags & ~valid_flags ) + break; + + rc = hvm_get_ioreq_server_info(d, data->id, + (data->flags & XEN_DMOP_no_gfns) ? + NULL : (unsigned long *)&data->ioreq_gfn, + (data->flags & XEN_DMOP_no_gfns) ? 
+ NULL : (unsigned long *)&data->bufioreq_gfn, + &data->bufioreq_port); + break; + } + + case XEN_DMOP_map_io_range_to_ioreq_server: + { + const struct xen_dm_op_ioreq_server_range *data = + &op.u.map_io_range_to_ioreq_server; + + rc = -EINVAL; + if ( data->pad ) + break; + + rc = hvm_map_io_range_to_ioreq_server(d, data->id, data->type, + data->start, data->end); + break; + } + + case XEN_DMOP_unmap_io_range_from_ioreq_server: + { + const struct xen_dm_op_ioreq_server_range *data = + &op.u.unmap_io_range_from_ioreq_server; + + rc = -EINVAL; + if ( data->pad ) + break; + + rc = hvm_unmap_io_range_from_ioreq_server(d, data->id, data->type, + data->start, data->end); + break; + } + + case XEN_DMOP_set_ioreq_server_state: + { + const struct xen_dm_op_set_ioreq_server_state *data = + &op.u.set_ioreq_server_state; + + rc = -EINVAL; + if ( data->pad ) + break; + + rc = hvm_set_ioreq_server_state(d, data->id, !!data->enabled); + break; + } + + case XEN_DMOP_destroy_ioreq_server: + { + const struct xen_dm_op_destroy_ioreq_server *data = + &op.u.destroy_ioreq_server; + + rc = -EINVAL; + if ( data->pad ) + break; + + rc = hvm_destroy_ioreq_server(d, data->id); + break; + } + + default: + rc = arch_dm_op(&op, d, op_args, &const_op); + } + + if ( (!rc || rc == -ERESTART) && + !const_op && copy_to_guest_offset(op_args->buf[0].h, offset, + (void *)&op.u, op_size[op.op]) ) + rc = -EFAULT; + + out: + rcu_unlock_domain(d); + + return rc; +} + +#ifdef CONFIG_COMPAT +CHECK_dm_op_create_ioreq_server; +CHECK_dm_op_get_ioreq_server_info; +CHECK_dm_op_ioreq_server_range; +CHECK_dm_op_set_ioreq_server_state; +CHECK_dm_op_destroy_ioreq_server; +CHECK_dm_op_track_dirty_vram; +CHECK_dm_op_set_pci_intx_level; +CHECK_dm_op_set_isa_irq_level; +CHECK_dm_op_set_pci_link_route; +CHECK_dm_op_modified_memory; +CHECK_dm_op_set_mem_type; +CHECK_dm_op_inject_event; +CHECK_dm_op_inject_msi; +CHECK_dm_op_remote_shutdown; +CHECK_dm_op_relocate_memory; +CHECK_dm_op_pin_memory_cacheattr; + +int 
compat_dm_op(domid_t domid, + unsigned int nr_bufs, + XEN_GUEST_HANDLE_PARAM(void) bufs) +{ + struct dmop_args args; + unsigned int i; + int rc; + + if ( nr_bufs > ARRAY_SIZE(args.buf) ) + return -E2BIG; + + args.domid = domid; + args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1); + + for ( i = 0; i < args.nr_bufs; i++ ) + { + struct compat_dm_op_buf cmp; + + if ( copy_from_guest_offset(&cmp, bufs, i, 1) ) + return -EFAULT; + +#define XLAT_dm_op_buf_HNDL_h(_d_, _s_) \ + guest_from_compat_handle((_d_)->h, (_s_)->h) + + XLAT_dm_op_buf(&args.buf[i], &cmp); + +#undef XLAT_dm_op_buf_HNDL_h + } + + rc = dm_op(&args); + + if ( rc == -ERESTART ) + rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih", + domid, nr_bufs, bufs); + + return rc; +} +#endif + +long do_dm_op(domid_t domid, + unsigned int nr_bufs, + XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs) +{ + struct dmop_args args; + int rc; + + if ( nr_bufs > ARRAY_SIZE(args.buf) ) + return -E2BIG; + + args.domid = domid; + args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1); + + if ( copy_from_guest_offset(&args.buf[0], bufs, 0, args.nr_bufs) ) + return -EFAULT; + + rc = dm_op(&args); + + if ( rc == -ERESTART ) + rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih", + domid, nr_bufs, bufs); + + return rc; +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/include/xen/hypercall.h b/xen/include/xen/hypercall.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/hypercall.h +++ b/xen/include/xen/hypercall.h @@ -XXX,XX +XXX,XX @@ do_dm_op( unsigned int nr_bufs, XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs); +struct dmop_args { + domid_t domid; + unsigned int nr_bufs; + /* Reserve enough buf elements for all current hypercalls. 
*/ + struct xen_dm_op_buf buf[2]; +}; + +int arch_dm_op(struct xen_dm_op *op, + struct domain *d, + const struct dmop_args *op_args, + bool *const_op); + #ifdef CONFIG_HYPFS extern long do_hypfs_op( diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xsm/dummy.h +++ b/xen/include/xsm/dummy.h @@ -XXX,XX +XXX,XX @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int } } +#endif /* CONFIG_X86 */ + static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG struct domain *d) { XSM_ASSERT_ACTION(XSM_DM_PRIV); return xsm_default_action(action, current->domain, d); } -#endif /* CONFIG_X86 */ - #ifdef CONFIG_ARGO static XSM_INLINE int xsm_argo_enable(const struct domain *d) { diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xsm/xsm.h +++ b/xen/include/xsm/xsm.h @@ -XXX,XX +XXX,XX @@ struct xsm_operations { int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow); int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow); int (*pmu_op) (struct domain *d, unsigned int op); - int (*dm_op) (struct domain *d); #endif + int (*dm_op) (struct domain *d); int (*xen_version) (uint32_t cmd); int (*domain_resource_map) (struct domain *d); #ifdef CONFIG_ARGO @@ -XXX,XX +XXX,XX @@ static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, unsigned int return xsm_ops->pmu_op(d, op); } +#endif /* CONFIG_X86 */ + static inline int xsm_dm_op(xsm_default_t def, struct domain *d) { return xsm_ops->dm_op(d); } -#endif /* CONFIG_X86 */ - static inline int xsm_xen_version (xsm_default_t def, uint32_t op) { return xsm_ops->xen_version(op); diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c index XXXXXXX..XXXXXXX 100644 --- a/xen/xsm/dummy.c +++ b/xen/xsm/dummy.c @@ -XXX,XX +XXX,XX @@ void __init xsm_fixup_ops (struct xsm_operations *ops) set_to_dummy_if_null(ops, ioport_permission); set_to_dummy_if_null(ops, 
ioport_mapping); set_to_dummy_if_null(ops, pmu_op); - set_to_dummy_if_null(ops, dm_op); #endif + set_to_dummy_if_null(ops, dm_op); set_to_dummy_if_null(ops, xen_version); set_to_dummy_if_null(ops, domain_resource_map); #ifdef CONFIG_ARGO diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c index XXXXXXX..XXXXXXX 100644 --- a/xen/xsm/flask/hooks.c +++ b/xen/xsm/flask/hooks.c @@ -XXX,XX +XXX,XX @@ static int flask_pmu_op (struct domain *d, unsigned int op) return -EPERM; } } +#endif /* CONFIG_X86 */ static int flask_dm_op(struct domain *d) { return current_has_perm(d, SECCLASS_HVM, HVM__DM); } -#endif /* CONFIG_X86 */ - static int flask_xen_version (uint32_t op) { u32 dsid = domain_sid(current->domain); @@ -XXX,XX +XXX,XX @@ static struct xsm_operations flask_ops = { .ioport_permission = flask_ioport_permission, .ioport_mapping = flask_ioport_mapping, .pmu_op = flask_pmu_op, - .dm_op = flask_dm_op, #endif + .dm_op = flask_dm_op, .xen_version = flask_xen_version, .domain_resource_map = flask_domain_resource_map, #ifdef CONFIG_ARGO -- 2.7.4
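For readers following the split, the control flow between the common dm_op() and the per-arch arch_dm_op() can be sketched in isolation. The block below is a minimal standalone illustration under stated assumptions, not Xen code: every demo_* identifier, the operation ids and the id value 42 are hypothetical, and the domain lookup, the XSM check and the guest copies done by the real code are left out. It only mirrors three points of the patch above: the common switch handles the operations it knows, the default case defers to the arch hook, and the const_op flag decides whether the payload is copied back to the caller.

```c
/* Standalone sketch; all demo_* identifiers are hypothetical, not Xen APIs. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum demo_op_id {
    DEMO_OP_CREATE,     /* handled by common code, writes a result back */
    DEMO_OP_DESTROY,    /* handled by common code, nothing to write back */
    DEMO_OP_ARCH_ONLY,  /* unknown to common code, left to the arch hook */
    DEMO_OP_MAX
};

struct demo_op {
    uint32_t op;
    uint32_t pad;
    union {
        struct { uint32_t id; } create;
        struct { uint32_t id; } destroy;
        struct { uint32_t value; } arch_only;
    } u;
};

/* Arch hook: plays the role that arch_dm_op() has in the patch. */
int demo_arch_op(struct demo_op *op, bool *const_op)
{
    switch ( op->op )
    {
    case DEMO_OP_ARCH_ONLY:
        op->u.arch_only.value += 1;
        *const_op = false;          /* payload changed, copy it back */
        return 0;

    default:
        return -EOPNOTSUPP;
    }
}

/* Common entry point: plays the role that dm_op() has in the patch. */
int demo_common_op(const struct demo_op *in, struct demo_op *out)
{
    struct demo_op op;
    bool const_op = true;
    int rc;

    assert(in != NULL && out != NULL);
    op = *in;

    /* Reject out-of-range operation ids and non-zero padding up front. */
    if ( op.op >= DEMO_OP_MAX )
        return -EOPNOTSUPP;
    if ( op.pad )
        return -EINVAL;

    switch ( op.op )
    {
    case DEMO_OP_CREATE:
        op.u.create.id = 42;        /* made-up id for the illustration */
        const_op = false;
        rc = 0;
        break;

    case DEMO_OP_DESTROY:
        rc = 0;
        break;

    default:
        /* Anything the common code does not know goes to the arch hook. */
        rc = demo_arch_op(&op, &const_op);
        break;
    }

    /* Copy the payload back only on success and only if it was modified. */
    if ( !rc && !const_op )
        out->u = op.u;

    return rc;
}
```

Passing const_op by pointer is what lets an arch-specific handler request the copy-back without the common code knowing anything about the operation; an arch with no operations of its own can simply return -EOPNOTSUPP from the hook.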
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As the x86 implementation of XENMEM_resource_ioreq_server can be
re-used on Arm later on, this patch makes it common and removes
arch_acquire_resource() as unneeded.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - no changes
---
---
 xen/arch/x86/mm.c        | 44 --------------------------------------------
 xen/common/memory.c      | 45 +++++++++++++++++++++++++++++++++++++++++++--
 xen/include/asm-arm/mm.h |  8 --------
 xen/include/asm-x86/mm.h |  4 ----
 4 files changed, 43 insertions(+), 58 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -XXX,XX +XXX,XX @@ int xenmem_add_to_physmap_one(
     return rc;
 }

-int arch_acquire_resource(struct domain *d, unsigned int type,
-                          unsigned int id, unsigned long frame,
-                          unsigned int nr_frames, xen_pfn_t mfn_list[])
-{
-    int rc;
-
-    switch ( type )
-    {
-#ifdef CONFIG_HVM
-    case XENMEM_resource_ioreq_server:
-    {
-        ioservid_t ioservid = id;
-        unsigned int i;
-
-        rc = -EINVAL;
-        if ( !is_hvm_domain(d) )
-            break;
-
-        if ( id != (unsigned int)ioservid )
-            break;
-
-        rc = 0;
-        for ( i = 0; i < nr_frames; i++ )
-        {
-            mfn_t mfn;
-
-            rc = hvm_get_ioreq_server_frame(d, id, frame + i, &mfn);
-            if ( rc )
-                break;
-
-            mfn_list[i] = mfn_x(mfn);
-        }
-        break;
-    }
-#endif
-
-    default:
-        rc = -EOPNOTSUPP;
-        break;
-    }
-
-    return rc;
-}
-
 long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     int rc;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -XXX,XX +XXX,XX @@
 #include <public/memory.h>
 #include <xsm/xsm.h>

+#ifdef CONFIG_IOREQ_SERVER
+#include <xen/ioreq.h>
+#endif
+
 #ifdef CONFIG_X86
 #include <asm/guest.h>
 #endif
@@ -XXX,XX +XXX,XX @@ static int acquire_grant_table(struct domain *d, unsigned int id,
     return 0;
 }

+#ifdef CONFIG_IOREQ_SERVER
+static int acquire_ioreq_server(struct domain *d,
+                                unsigned int id,
+                                unsigned long frame,
+                                unsigned int nr_frames,
+                                xen_pfn_t mfn_list[])
+{
+    ioservid_t ioservid = id;
+    unsigned int i;
+    int rc;
+
+    if ( !is_hvm_domain(d) )
+        return -EINVAL;
+
+    if ( id != (unsigned int)ioservid )
+        return -EINVAL;
+
+    for ( i = 0; i < nr_frames; i++ )
+    {
+        mfn_t mfn;
+
+        rc = hvm_get_ioreq_server_frame(d, id, frame + i, &mfn);
+        if ( rc )
+            return rc;
+
+        mfn_list[i] = mfn_x(mfn);
+    }
+
+    return 0;
+}
+#endif
+
 static int acquire_resource(
     XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg)
 {
@@ -XXX,XX +XXX,XX @@ static int acquire_resource(
                                  mfn_list);
         break;

+#ifdef CONFIG_IOREQ_SERVER
+    case XENMEM_resource_ioreq_server:
+        rc = acquire_ioreq_server(d, xmar.id, xmar.frame, xmar.nr_frames,
+                                  mfn_list);
+        break;
+#endif
     default:
-        rc = arch_acquire_resource(d, xmar.type, xmar.id, xmar.frame,
-                                   xmar.nr_frames, mfn_list);
+        rc = -EOPNOTSUPP;
         break;
     }

diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -XXX,XX +XXX,XX @@ static inline void put_page_and_type(struct page_info *page)

 void clear_and_clean_page(struct page_info *page);

-static inline
-int arch_acquire_resource(struct domain *d, unsigned int type, unsigned int id,
-                          unsigned long frame, unsigned int nr_frames,
-                          xen_pfn_t mfn_list[])
-{
-    return -EOPNOTSUPP;
-}
-
 unsigned int arch_get_dma_bitsize(void);

 #endif /*  __ARCH_ARM_MM__ */
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -XXX,XX +XXX,XX @@ static inline bool arch_mfn_in_directmap(unsigned long mfn)
     return mfn <= (virt_to_mfn(eva - 1) + 1);
 }

-int arch_acquire_resource(struct domain *d, unsigned int type,
-                          unsigned int id, unsigned long frame,
-                          unsigned int nr_frames, xen_pfn_t mfn_list[]);
-
 #endif /* __ASM_X86_MM_H__ */
--
2.7.4
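The narrowing check in acquire_ioreq_server(), `id != (unsigned int)ioservid`, may look odd at first sight. A minimal standalone sketch of the idea follows, assuming the server id type is 16 bits wide while the resource id arrives as an unsigned int. All demo_* names are hypothetical, and demo_get_frame() with its two-frame limit is only a stand-in for hvm_get_ioreq_server_frame(). The sketch shows that an id which does not survive the round trip through the narrower type is rejected before any frame is looked up, and that the first failing frame lookup aborts the loop with its error code.

```c
/* Standalone sketch; all demo_* identifiers are hypothetical, not Xen APIs. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the server id type is narrower than unsigned int. */
typedef uint16_t demo_servid_t;

/*
 * Stand-in for the per-server frame lookup. Pretend each server exposes
 * exactly two frames and derive a fake frame number from id and index.
 */
static int demo_get_frame(demo_servid_t id, unsigned long idx, uint64_t *mfn)
{
    if ( idx >= 2 )
        return -EINVAL;

    *mfn = 0x1000u * (id + 1u) + idx;

    return 0;
}

int demo_acquire(unsigned int id, unsigned long frame,
                 unsigned int nr_frames, uint64_t mfn_list[])
{
    demo_servid_t servid = (demo_servid_t)id;
    unsigned int i;

    assert(mfn_list != NULL);

    /* Reject ids that were truncated by the conversion above. */
    if ( id != (unsigned int)servid )
        return -EINVAL;

    for ( i = 0; i < nr_frames; i++ )
    {
        uint64_t mfn;
        int rc = demo_get_frame(servid, frame + i, &mfn);

        if ( rc )
            return rc;      /* stop at the first frame that fails */

        mfn_list[i] = mfn;
    }

    return 0;
}
```

Without the comparison, an id such as 0x10001 would silently alias server 1 after truncation. Note that, as in the patch, entries filled in before a failing lookup are left in the list, so the caller has to treat the list as invalid on error.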
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch adds basic IOREQ/DM support on Arm. The subsequent
patches will improve functionality, add the remaining bits and
address several TODOs.

Please note, the "PIO handling" TODO is expected to be left
unaddressed for the current series. It is not a big issue for now
while Xen doesn't have support for vPCI on Arm. On Arm64, PIO
accesses are only used for the PCI IO BAR and we would probably want
to expose them to the emulator as PIO accesses to make a DM completely
arch-agnostic. So "PIO handling" should be implemented when we add
support for vPCI.

Please note, at the moment the build on Arm32 is broken (see the
cmpxchg usage in hvm_send_buffered_ioreq()) due to the lack of
cmpxchg_64 support on Arm32. There is a patch under review to
address this issue:
https://patchwork.kernel.org/patch/11715559/

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - was split into:
     - arm/ioreq: Introduce arch specific bits for IOREQ/DM features
     - xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
   - update patch description
   - update asm-arm/hvm/ioreq.h according to the newly introduced arch functions:
     - arch_hvm_destroy_ioreq_server()
     - arch_handle_hvm_io_completion()
   - update arch files to include xen/ioreq.h
   - remove HVMOP plumbing
   - rewrite the logic to properly handle the case when hvm_send_ioreq() returns IO_RETRY
   - add logic to properly handle the handle_hvm_io_completion() return value
   - rename handle_mmio() to ioreq_handle_complete_mmio()
   - move paging_mark_pfn_dirty() to asm-arm/paging.h
   - remove forward declaration for hvm_ioreq_server in asm-arm/paging.h
   - move try_fwd_ioserv() to ioreq.c, provide stubs if !CONFIG_IOREQ_SERVER
   - do not remove #ifdef CONFIG_IOREQ_SERVER in memory.c for guarding xen/ioreq.h
   - use gdprintk in try_fwd_ioserv(), remove unneeded prints
   - update list of #include-s
   - move has_vpci() to asm-arm/domain.h
   - add a comment (TODO) to the not yet implemented handle_pio()
   - remove hvm_mmio_first(last)_byte() and hvm_ioreq_(page/vcpu/server) structs
     from the arch files, they were already moved to the common code
   - remove set_foreign_p2m_entry() changes, they will be properly implemented
     in the follow-up patch
   - select IOREQ_SERVER for Arm instead of Arm64 in Kconfig
   - remove x86's realmode and other unneeded stubs from xen/ioreq.h
   - clarify ioreq_t p.df usage in try_fwd_ioserv()
   - set ioreq_t p.count to 1 in try_fwd_ioserv()
---
---
 xen/arch/arm/Kconfig            |   1 +
 xen/arch/arm/Makefile           |   2 +
 xen/arch/arm/dm.c               |  33 ++++++++++
 xen/arch/arm/domain.c           |   9 +++
 xen/arch/arm/io.c               |  11 +++-
 xen/arch/arm/ioreq.c            | 142 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/arm/traps.c            |  32 +++++++--
 xen/include/asm-arm/domain.h    |  46 +++++++++++++
 xen/include/asm-arm/hvm/ioreq.h | 108 ++++++++++++++++++++++++++++++
 xen/include/asm-arm/mmio.h      |   1 +
 xen/include/asm-arm/paging.h    |   4 ++
 11 files changed, 384 insertions(+), 5 deletions(-)
 create mode 100644 xen/arch/arm/dm.c
 create mode 100644 xen/arch/arm/ioreq.c
 create mode 100644 xen/include/asm-arm/hvm/ioreq.h

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -XXX,XX +XXX,XX @@ config ARM
 	select HAS_PASSTHROUGH
 	select HAS_PDX
 	select IOMMU_FORCE_PT_SHARE
+	select IOREQ_SERVER

 config ARCH_DEFCONFIG
 	string
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -XXX,XX +XXX,XX @@ obj-y += cpuerrata.o
 obj-y += cpufeature.o
 obj-y += decode.o
 obj-y += device.o
+obj-$(CONFIG_IOREQ_SERVER) += dm.o
 obj-y += domain.o
 obj-y += domain_build.init.o
 obj-y += domctl.o
@@ -XXX,XX +XXX,XX @@ obj-y += guest_atomics.o
 obj-y += guest_walk.o
 obj-y += hvm.o
 obj-y += io.o
+obj-$(CONFIG_IOREQ_SERVER) += ioreq.o obj-y += irq.o obj-y += kernel.init.o obj-$(CONFIG_LIVEPATCH) += livepatch.o diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/xen/arch/arm/dm.c @@ -XXX,XX +XXX,XX @@ +/* + * Copyright (c) 2019 Arm ltd. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; If not, see <http://www.gnu.org/licenses/>. + */ + +#include <xen/hypercall.h> + +int arch_dm_op(struct xen_dm_op *op, struct domain *d, + const struct dmop_args *op_args, bool *const_op) +{ + return -EOPNOTSUPP; +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -XXX,XX +XXX,XX @@ #include <xen/grant_table.h> #include <xen/hypercall.h> #include <xen/init.h> +#include <xen/ioreq.h> #include <xen/lib.h> #include <xen/livepatch.h> #include <xen/sched.h> @@ -XXX,XX +XXX,XX @@ int arch_domain_create(struct domain *d, ASSERT(config != NULL); +#ifdef CONFIG_IOREQ_SERVER + hvm_ioreq_init(d); +#endif + /* p2m_init relies on some value initialized by the IOMMU subsystem */ if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 ) goto fail; @@ -XXX,XX +XXX,XX @@ int domain_relinquish_resources(struct domain *d) if (ret ) return ret; +#ifdef CONFIG_IOREQ_SERVER + 
hvm_destroy_all_ioreq_servers(d); +#endif + PROGRESS(xen): ret = relinquish_memory(d, &d->xenpage_list); if ( ret ) diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/io.c +++ b/xen/arch/arm/io.c @@ -XXX,XX +XXX,XX @@ * GNU General Public License for more details. */ +#include <xen/ioreq.h> #include <xen/lib.h> #include <xen/spinlock.h> #include <xen/sched.h> @@ -XXX,XX +XXX,XX @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs, handler = find_mmio_handler(v->domain, info.gpa); if ( !handler ) - return IO_UNHANDLED; + { + int rc; + + rc = try_fwd_ioserv(regs, v, &info); + if ( rc == IO_HANDLED ) + return handle_ioserv(regs, v); + + return rc; + } /* All the instructions used on emulated MMIO region should be valid */ if ( !dabt.valid ) diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/xen/arch/arm/ioreq.c @@ -XXX,XX +XXX,XX @@ +/* + * arm/ioreq.c: hardware virtual machine I/O emulation + * + * Copyright (c) 2019 Arm ltd. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; If not, see <http://www.gnu.org/licenses/>. 
+ */
+
+#include <xen/domain.h>
+#include <xen/ioreq.h>
+
+#include <public/hvm/ioreq.h>
+
+#include <asm/traps.h>
+
+enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v)
+{
+    const union hsr hsr = { .bits = regs->hsr };
+    const struct hsr_dabt dabt = hsr.dabt;
+    /* Code is similar to handle_read */
+    uint8_t size = (1 << dabt.size) * 8;
+    register_t r = v->arch.hvm.hvm_io.io_req.data;
+
+    /* We are done with the IO */
+    v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE;
+
+    /* XXX: Do we need to take care of write here ? */
+    if ( dabt.write )
+        return IO_HANDLED;
+
+    /*
+     * Sign extend if required.
+     * Note that we expect the read handler to have zeroed the bits
+     * outside the requested access size.
+     */
+    if ( dabt.sign && (r & (1UL << (size - 1))) )
+    {
+        /*
+         * We are relying on register_t using the same as
+         * an unsigned long in order to keep the 32-bit assembly
+         * code smaller.
+         */
+        BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long));
+        r |= (~0UL) << size;
+    }
+
+    set_user_reg(regs, dabt.reg, r);
+
+    return IO_HANDLED;
+}
+
+enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
+                             struct vcpu *v, mmio_info_t *info)
+{
+    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
+    ioreq_t p = {
+        .type = IOREQ_TYPE_COPY,
+        .addr = info->gpa,
+        .size = 1 << info->dabt.size,
+        .count = 1,
+        .dir = !info->dabt.write,
+        /*
+         * On x86, df is used by 'rep' instruction to tell the direction
+         * to iterate (forward or backward).
+         * On Arm, all the accesses to MMIO region will do a single
+         * memory access. So for now, we can safely always set to 0.
+         */
+        .df = 0,
+        .data = get_user_reg(regs, info->dabt.reg),
+        .state = STATE_IOREQ_READY,
+    };
+    struct hvm_ioreq_server *s = NULL;
+    enum io_state rc;
+
+    switch ( vio->io_req.state )
+    {
+    case STATE_IOREQ_NONE:
+        break;
+
+    case STATE_IORESP_READY:
+        return IO_HANDLED;
+
+    default:
+        gdprintk(XENLOG_ERR, "wrong state %u\n", vio->io_req.state);
+        return IO_ABORT;
+    }
+
+    s = hvm_select_ioreq_server(v->domain, &p);
+    if ( !s )
+        return IO_UNHANDLED;
+
+    if ( !info->dabt.valid )
+        return IO_ABORT;
+
+    vio->io_req = p;
+
+    rc = hvm_send_ioreq(s, &p, 0);
+    if ( rc != IO_RETRY || v->domain->is_shutting_down )
+        vio->io_req.state = STATE_IOREQ_NONE;
+    else if ( !hvm_ioreq_needs_completion(&vio->io_req) )
+        rc = IO_HANDLED;
+    else
+        vio->io_completion = HVMIO_mmio_completion;
+
+    return rc;
+}
+
+bool ioreq_handle_complete_mmio(void)
+{
+    struct vcpu *v = current;
+    struct cpu_user_regs *regs = guest_cpu_user_regs();
+    const union hsr hsr = { .bits = regs->hsr };
+    paddr_t addr = v->arch.hvm.hvm_io.io_req.addr;
+
+    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
+    {
+        advance_pc(regs, hsr);
+        return true;
+    }
+
+    return false;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -XXX,XX +XXX,XX @@
 #include <xen/hypercall.h>
 #include <xen/init.h>
 #include <xen/iocap.h>
+#include <xen/ioreq.h>
 #include <xen/irq.h>
 #include <xen/lib.h>
 #include <xen/mem_access.h>
@@ -XXX,XX +XXX,XX @@ static arm_hypercall_t arm_hypercall_table[] = {
 #ifdef CONFIG_HYPFS
     HYPERCALL(hypfs_op, 5),
 #endif
+#ifdef CONFIG_IOREQ_SERVER
+    HYPERCALL(dm_op, 3),
+#endif
 };
 
 #ifndef NDEBUG
@@ -XXX,XX +XXX,XX @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs,
             case IO_HANDLED:
                 advance_pc(regs, hsr);
                 return;
+            case IO_RETRY:
+                /* finish later */
+                return;
             case IO_UNHANDLED:
                 /* IO unhandled, try another way to handle it. */
                 break;
+            default:
+                ASSERT_UNREACHABLE();
             }
         }
 
@@ -XXX,XX +XXX,XX @@ static void check_for_pcpu_work(void)
  * Process pending work for the vCPU. Any call should be fast or
  * implement preemption.
  */
-static void check_for_vcpu_work(void)
+static bool check_for_vcpu_work(void)
 {
     struct vcpu *v = current;
 
+#ifdef CONFIG_IOREQ_SERVER
+    bool handled;
+
+    local_irq_enable();
+    handled = handle_hvm_io_completion(v);
+    local_irq_disable();
+
+    if ( !handled )
+        return true;
+#endif
+
     if ( likely(!v->arch.need_flush_to_ram) )
-        return;
+        return false;
 
     /*
      * Give a chance for the pCPU to process work before handling the vCPU
@@ -XXX,XX +XXX,XX @@ static void check_for_vcpu_work(void)
     local_irq_enable();
     p2m_flush_vm(v);
     local_irq_disable();
+
+    return false;
 }
 
 /*
@@ -XXX,XX +XXX,XX @@ void leave_hypervisor_to_guest(void)
 {
     local_irq_disable();
 
-    check_for_vcpu_work();
-    check_for_pcpu_work();
+    do
+    {
+        check_for_pcpu_work();
+    } while ( check_for_vcpu_work() );
 
     vgic_sync_to_lrs();
 
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -XXX,XX +XXX,XX @@
 #include <asm/vgic.h>
 #include <asm/vpl011.h>
 #include <public/hvm/params.h>
+#include <public/hvm/dm_op.h>
+#include <public/hvm/ioreq.h>
+
+#define MAX_NR_IOREQ_SERVERS 8
 
 struct hvm_domain
 {
     uint64_t params[HVM_NR_PARAMS];
+
+    /* Guest page range used for non-default ioreq servers */
+    struct {
+        unsigned long base;
+        unsigned long mask;
+        unsigned long legacy_mask; /* indexed by HVM param number */
+    } ioreq_gfn;
+
+    /* Lock protects all other values in the sub-struct and the default */
+    struct {
+        spinlock_t lock;
+        struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
+    } ioreq_server;
 };
 
 #ifdef CONFIG_ARM_64
@@ -XXX,XX +XXX,XX @@ struct arch_domain
 #endif
 } __cacheline_aligned;
 
+enum hvm_io_completion {
+    HVMIO_no_completion,
+    HVMIO_mmio_completion,
+    HVMIO_pio_completion
+};
+
+struct hvm_vcpu_io {
+    /* I/O request in flight to device model. */
+    enum hvm_io_completion io_completion;
+    ioreq_t io_req;
+
+    /*
+     * HVM emulation:
+     *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
+     *  The latter is known to be an MMIO frame (not RAM).
+     *  This translation is only valid for accesses as per @mmio_access.
+     */
+    struct npfec mmio_access;
+    unsigned long mmio_gla;
+    unsigned long mmio_gpfn;
+};
+
 struct arch_vcpu
 {
     struct {
@@ -XXX,XX +XXX,XX @@ struct arch_vcpu
      */
     bool need_flush_to_ram;
 
+    struct hvm_vcpu
+    {
+        struct hvm_vcpu_io hvm_io;
+    } hvm;
+
 } __cacheline_aligned;
 
 void vcpu_show_execution_state(struct vcpu *);
@@ -XXX,XX +XXX,XX @@ static inline void arch_vcpu_block(struct vcpu *v) {}
 
 #define arch_vm_assist_valid_mask(d) (1UL << VMASST_TYPE_runstate_update_flag)
 
+#define has_vpci(d) ({ (void)(d); false; })
+
 #endif /* __ASM_DOMAIN_H__ */
 
 /*
diff --git a/xen/include/asm-arm/hvm/ioreq.h b/xen/include/asm-arm/hvm/ioreq.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/xen/include/asm-arm/hvm/ioreq.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * hvm.h: Hardware virtual machine assist interface definitions.
+ *
+ * Copyright (c) 2016 Citrix Systems Inc.
+ * Copyright (c) 2019 Arm ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __ASM_ARM_HVM_IOREQ_H__
+#define __ASM_ARM_HVM_IOREQ_H__
+
+#include <public/hvm/ioreq.h>
+#include <public/hvm/dm_op.h>
+
+#ifdef CONFIG_IOREQ_SERVER
+enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v);
+enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
+                             struct vcpu *v, mmio_info_t *info);
+#else
+static inline enum io_state handle_ioserv(struct cpu_user_regs *regs,
+                                          struct vcpu *v)
+{
+    return IO_UNHANDLED;
+}
+
+static inline enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
+                                           struct vcpu *v, mmio_info_t *info)
+{
+    return IO_UNHANDLED;
+}
+#endif
+
+bool ioreq_handle_complete_mmio(void);
+
+static inline bool handle_pio(uint16_t port, unsigned int size, int dir)
+{
+    /*
+     * TODO: For Arm64, the main user will be PCI. So this should be
+     * implemented when we add support for vPCI.
+     */
+    BUG();
+    return true;
+}
+
+static inline int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s)
+{
+    return 0;
+}
+
+static inline void msix_write_completion(struct vcpu *v)
+{
+}
+
+static inline bool arch_handle_hvm_io_completion(
+    enum hvm_io_completion io_completion)
+{
+    ASSERT_UNREACHABLE();
+    /* Keep release builds well defined when the ASSERT compiles out */
+    return true;
+}
+
+static inline int hvm_get_ioreq_server_range_type(struct domain *d,
+                                                  ioreq_t *p,
+                                                  uint8_t *type,
+                                                  uint64_t *addr)
+{
+    if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
+        return -EINVAL;
+
+    *type = (p->type == IOREQ_TYPE_PIO) ?
+             XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
+    *addr = p->addr;
+
+    return 0;
+}
+
+static inline void arch_hvm_ioreq_init(struct domain *d)
+{
+}
+
+static inline void arch_hvm_ioreq_destroy(struct domain *d)
+{
+}
+
+#define IOREQ_IO_HANDLED IO_HANDLED
+#define IOREQ_IO_UNHANDLED IO_UNHANDLED
+#define IOREQ_IO_RETRY IO_RETRY
+
+#endif /* __ASM_ARM_HVM_IOREQ_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-arm/mmio.h b/xen/include/asm-arm/mmio.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/asm-arm/mmio.h
+++ b/xen/include/asm-arm/mmio.h
@@ -XXX,XX +XXX,XX @@ enum io_state
     IO_ABORT, /* The IO was handled by the helper and led to an abort. */
     IO_HANDLED, /* The IO was successfully handled by the helper. */
     IO_UNHANDLED, /* The IO was not handled by the helper. */
+    IO_RETRY, /* Retry the emulation for some reason */
 };
 
 typedef int (*mmio_read_t)(struct vcpu *v, mmio_info_t *info,
diff --git a/xen/include/asm-arm/paging.h b/xen/include/asm-arm/paging.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/asm-arm/paging.h
+++ b/xen/include/asm-arm/paging.h
@@ -XXX,XX +XXX,XX @@
 #define paging_mode_translate(d) (1)
 #define paging_mode_external(d) (1)
 
+static inline void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn)
+{
+}
+
 #endif /* XEN_PAGING_H */
 
 /*
--
2.7.4
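
The read-completion path in handle_ioserv() sign-extends the value returned by the emulator when the faulting instruction was a signed load. The snippet below is a minimal standalone sketch of that one step, not Xen code: the function name and parameters are hypothetical, size_log2 stands in for dabt.size and sign for dabt.sign. It additionally guards the full-width case, because shifting a 64-bit value by 64 is undefined in C.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the sign-extension step in handle_ioserv().
 * size_log2: 0 = byte, 1 = halfword, 2 = word, 3 = doubleword.
 * The caller is expected to have zeroed the bits above the access size.
 */
static uint64_t sign_extend_mmio_read(uint64_t data, unsigned int size_log2,
                                      bool sign)
{
    unsigned int bits = (1u << size_log2) * 8;

    /* Unsigned loads and full-width reads have nothing to extend. */
    if ( !sign || bits >= 64 )
        return data;

    /* Replicate the top bit of the access into all upper bits. */
    if ( data & (UINT64_C(1) << (bits - 1)) )
        data |= ~UINT64_C(0) << bits;

    return data;
}
```

For example, a signed byte load that reads 0x80 from the emulator would leave 0xffffffffffffff80 in the destination register, while the same value read by an unsigned load stays 0x80.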
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch implements reference counting of foreign entries in
set_foreign_p2m_entry() on Arm. This is mandatory if we want to run
the emulator (IOREQ server) in a domain other than dom0, as we can't
trust it to do the right thing if it is not running in dom0.
So we need to grab a reference on the page to avoid it disappearing.

It was tested with the IOREQ feature to confirm that all the pages given
to this function belong to a domain, so we can use the same approach
as for XENMAPSPACE_gmfn_foreign handling in xenmem_add_to_physmap_one().

This involves adding an extra parameter for the foreign domain to
set_foreign_p2m_entry().

Also remove the restriction for the hardware domain in the common code
if we run on Arm.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch, was split from:
     "[RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features"
   - rewrite the logic to properly handle the reference in
     set_foreign_p2m_entry() instead of treating foreign entries as p2m_ram_rw
---
---
 xen/arch/arm/p2m.c        | 16 ++++++++++++++++
 xen/arch/x86/mm/p2m.c     |  5 +++--
 xen/common/memory.c       |  4 +++-
 xen/include/asm-arm/p2m.h | 11 ++---------
 xen/include/asm-x86/p2m.h |  3 ++-
 5 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -XXX,XX +XXX,XX @@ int guest_physmap_remove_page(struct domain *d, gfn_t gfn, mfn_t mfn,
     return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
 }
 
+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn, mfn_t mfn)
+{
+    struct page_info *page = mfn_to_page(mfn);
+    int rc;
+
+    if ( !get_page(page, fd) )
+        return -EINVAL;
+
+    rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_map_foreign_rw);
+    if ( rc )
+        put_page(page);
+
+    return rc;
+}
+
 static struct page_info *p2m_allocate_root(void)
 {
     struct page_info *page;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -XXX,XX +XXX,XX @@ static int set_typed_p2m_entry(struct domain *d, unsigned long gfn_l,
 }
 
 /* Set foreign mfn in the given guest's p2m table. */
-int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn, mfn_t mfn)
 {
     return set_typed_p2m_entry(d, gfn, mfn, PAGE_ORDER_4K, p2m_map_foreign,
                                p2m_get_hostp2m(d)->default_access);
@@ -XXX,XX +XXX,XX @@ int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
      * will update the m2p table which will result in mfn -> gpfn of dom0
      * and not fgfn of domU.
      */
-    rc = set_foreign_p2m_entry(tdom, gpfn, mfn);
+    rc = set_foreign_p2m_entry(tdom, fdom, gpfn, mfn);
     if ( rc )
         gdprintk(XENLOG_WARNING, "set_foreign_p2m_entry failed. "
                  "gpfn:%lx mfn:%lx fgfn:%lx td:%d fd:%d\n",
diff --git a/xen/common/memory.c b/xen/common/memory.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -XXX,XX +XXX,XX @@ static int acquire_resource(
         xen_pfn_t gfn_list[ARRAY_SIZE(mfn_list)];
         unsigned int i;
 
+#ifndef CONFIG_ARM
         /*
          * FIXME: Until foreign pages inserted into the P2M are properly
          * reference counted, it is unsafe to allow mapping of
@@ -XXX,XX +XXX,XX @@ static int acquire_resource(
          */
         if ( !is_hardware_domain(currd) )
             return -EACCES;
+#endif
 
         if ( copy_from_guest(gfn_list, xmar.frame_list, xmar.nr_frames) )
             rc = -EFAULT;
 
         for ( i = 0; !rc && i < xmar.nr_frames; i++ )
         {
-            rc = set_foreign_p2m_entry(currd, gfn_list[i],
+            rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
                                        _mfn(mfn_list[i]));
             /* rc should be -EIO for any iteration other than the first */
             if ( rc && i )
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -XXX,XX +XXX,XX @@ static inline gfn_t gfn_next_boundary(gfn_t gfn, unsigned int order)
     return gfn_add(gfn, 1UL << order);
 }
 
-static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
-                                        mfn_t mfn)
-{
-    /*
-     * NOTE: If this is implemented then proper reference counting of
-     * foreign entries will need to be implemented.
-     */
-    return -EOPNOTSUPP;
-}
+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn, mfn_t mfn);
 
 /*
  * A vCPU has cache enabled only when the MMU is enabled and data cache
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -XXX,XX +XXX,XX @@ int p2m_is_logdirty_range(struct p2m_domain *, unsigned long start,
                           unsigned long end);
 
 /* Set foreign entry in the p2m table (for priv-mapping) */
-int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn, mfn_t mfn);
 
 /* Set mmio addresses in the p2m table (for pass-through) */
 int set_mmio_p2m_entry(struct domain *d, gfn_t gfn, mfn_t mfn,
--
2.7.4
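
The reference handling described in this patch follows a common pattern: take a reference, attempt the mapping, and drop the reference again if the mapping fails, while still reporting the failure to the caller. The following is a minimal sketch of that pattern with toy types; struct toy_page and all toy_* helpers are hypothetical stand-ins and not part of Xen.

```c
#include <errno.h>

/* Hypothetical stand-in for Xen's struct page_info. */
struct toy_page {
    int refcount;
};

/* Toy get_page(): always succeeds here, the real one can fail. */
static int toy_get_page(struct toy_page *pg)
{
    pg->refcount++;
    return 1;
}

static void toy_put_page(struct toy_page *pg)
{
    pg->refcount--;
}

/* Pretend mapping step: fails with -ENOMEM when 'fail' is non-zero. */
static int toy_map(int fail)
{
    return fail ? -ENOMEM : 0;
}

static int toy_set_foreign_entry(struct toy_page *pg, int fail)
{
    int rc;

    if ( !toy_get_page(pg) )
        return -EINVAL;

    rc = toy_map(fail);
    if ( rc )
        toy_put_page(pg);   /* undo the reference on failure */

    return rc;              /* the caller must see the mapping error */
}
```

With this shape a failed mapping leaves the reference count where it started, and a caller such as the loop in acquire_resource() can stop on the first non-zero return value.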
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch introduces a helper whose main purpose is to check
if a domain is using IOREQ server(s).

On Arm the benefit is to avoid calling handle_hvm_io_completion()
(which implies iterating over all possible IOREQ servers anyway)
on every return in leave_hypervisor_to_guest() if there are no active
servers for the particular domain.

This involves adding an extra per-domain variable to store the count
of servers in use.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch
---
---
 xen/arch/arm/traps.c             | 15 +++++++++------
 xen/common/ioreq.c               |  9 ++++++++-
 xen/include/asm-arm/domain.h     |  1 +
 xen/include/asm-x86/hvm/domain.h |  1 +
 xen/include/xen/ioreq.h          |  5 +++++
 5 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -XXX,XX +XXX,XX @@ static bool check_for_vcpu_work(void)
     struct vcpu *v = current;
 
 #ifdef CONFIG_IOREQ_SERVER
-    bool handled;
+    if ( hvm_domain_has_ioreq_server(v->domain) )
+    {
+        bool handled;
 
-    local_irq_enable();
-    handled = handle_hvm_io_completion(v);
-    local_irq_disable();
+        local_irq_enable();
+        handled = handle_hvm_io_completion(v);
+        local_irq_disable();
 
-    if ( !handled )
-        return true;
+        if ( !handled )
+            return true;
+    }
 #endif
 
     if ( likely(!v->arch.need_flush_to_ram) )
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -XXX,XX +XXX,XX @@ static void set_ioreq_server(struct domain *d, unsigned int id,
                              struct hvm_ioreq_server *s)
 {
     ASSERT(id < MAX_NR_IOREQ_SERVERS);
-    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
+    ASSERT((!s && d->arch.hvm.ioreq_server.server[id]) ||
+           (s && !d->arch.hvm.ioreq_server.server[id]));
 
     d->arch.hvm.ioreq_server.server[id] = s;
+
+    if ( s )
+        d->arch.hvm.ioreq_server.nr_servers++;
+    else
+        d->arch.hvm.ioreq_server.nr_servers--;
 }
 
 /*
@@ -XXX,XX +XXX,XX @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
 void hvm_ioreq_init(struct domain *d)
 {
     spin_lock_init(&d->arch.hvm.ioreq_server.lock);
+    d->arch.hvm.ioreq_server.nr_servers = 0;
 
     arch_hvm_ioreq_init(d);
 }
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -XXX,XX +XXX,XX @@ struct hvm_domain
     struct {
         spinlock_t lock;
         struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
+        unsigned int nr_servers;
     } ioreq_server;
 };
 
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -XXX,XX +XXX,XX @@ struct hvm_domain {
     struct {
         spinlock_t lock;
         struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
+        unsigned int nr_servers;
     } ioreq_server;
 
     /* Cached CF8 for guest PCI config cycles */
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -XXX,XX +XXX,XX @@ struct hvm_ioreq_server {
     uint8_t bufioreq_handling;
 };
 
+static inline bool hvm_domain_has_ioreq_server(const struct domain *d)
+{
+    return (d->arch.hvm.ioreq_server.nr_servers > 0);
+}
+
 #define GET_IOREQ_SERVER(d, id) \
     (d)->arch.hvm.ioreq_server.server[id]
 
--
2.7.4
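
The bookkeeping added here can be illustrated outside of Xen. The sketch below assumes a toy domain structure (struct toy_srv_domain and the toy_* functions are hypothetical names) and shows how a counter kept in step with the slot array turns "is any server registered?" into a single comparison instead of a scan over all slots. The real code updates the array under the ioreq_server lock, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SERVERS 8   /* mirrors MAX_NR_IOREQ_SERVERS in the series */

struct toy_server {
    int id;
};

struct toy_srv_domain {
    struct toy_server *server[TOY_MAX_SERVERS];
    unsigned int nr_servers;
};

static void toy_set_server(struct toy_srv_domain *d, unsigned int id,
                           struct toy_server *s)
{
    assert(id < TOY_MAX_SERVERS);
    /* Either fill an empty slot or clear an occupied one, nothing else. */
    assert((s != NULL) != (d->server[id] != NULL));

    d->server[id] = s;

    if ( s )
        d->nr_servers++;
    else
        d->nr_servers--;
}

/* O(1) check that replaces iterating over every slot. */
static bool toy_has_server(const struct toy_srv_domain *d)
{
    return d->nr_servers > 0;
}
```

The stricter assertion matters for the counter: allowing a NULL store into an already empty slot would decrement nr_servers below zero.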
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch adds the ability for the device emulator to notify the other
end (some entity running in the guest) using an SPI and implements the
Arm specific bits for it. The proposed interface allows the emulator to
set the logical level of one of a domain's IRQ lines.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Please note, I left the interface untouched since there is still
an open discussion about what interface to use and what
information to pass to the hypervisor. The question is whether
we should abstract away the state of the line or not.

Changes RFC -> V1:
   - check incoming parameters in arch_dm_op()
   - add explicit padding to struct xen_dm_op_set_irq_level
---
---
 tools/libs/devicemodel/core.c                   | 18 +++++++++++++
 tools/libs/devicemodel/include/xendevicemodel.h |  4 +++
 tools/libs/devicemodel/libxendevicemodel.map    |  1 +
 xen/arch/arm/dm.c                               | 36 ++++++++++++++++++++++++-
 xen/common/dm.c                                 |  1 +
 xen/include/public/hvm/dm_op.h                  | 15 +++++++++++
 6 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
index XXXXXXX..XXXXXXX 100644
--- a/tools/libs/devicemodel/core.c
+++ b/tools/libs/devicemodel/core.c
@@ -XXX,XX +XXX,XX @@ int xendevicemodel_set_isa_irq_level(
     return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
 }
 
+int xendevicemodel_set_irq_level(
+    xendevicemodel_handle *dmod, domid_t domid, uint32_t irq,
+    unsigned int level)
+{
+    struct xen_dm_op op;
+    struct xen_dm_op_set_irq_level *data;
+
+    memset(&op, 0, sizeof(op));
+
+    op.op = XEN_DMOP_set_irq_level;
+    data = &op.u.set_irq_level;
+
+    data->irq = irq;
+    data->level = level;
+
+    return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op));
+}
+
 int xendevicemodel_set_pci_link_route(
     xendevicemodel_handle *dmod, domid_t domid, uint8_t link, uint8_t irq)
 {
diff --git a/tools/libs/devicemodel/include/xendevicemodel.h b/tools/libs/devicemodel/include/xendevicemodel.h
index XXXXXXX..XXXXXXX 100644
--- a/tools/libs/devicemodel/include/xendevicemodel.h
+++ b/tools/libs/devicemodel/include/xendevicemodel.h
@@ -XXX,XX +XXX,XX @@ int xendevicemodel_set_isa_irq_level(
     xendevicemodel_handle *dmod, domid_t domid, uint8_t irq,
     unsigned int level);
 
+int xendevicemodel_set_irq_level(
+    xendevicemodel_handle *dmod, domid_t domid, unsigned int irq,
+    unsigned int level);
+
 /**
  * This function maps a PCI INTx line to a an IRQ line.
  *
diff --git a/tools/libs/devicemodel/libxendevicemodel.map b/tools/libs/devicemodel/libxendevicemodel.map
index XXXXXXX..XXXXXXX 100644
--- a/tools/libs/devicemodel/libxendevicemodel.map
+++ b/tools/libs/devicemodel/libxendevicemodel.map
@@ -XXX,XX +XXX,XX @@ VERS_1.2 {
     global:
         xendevicemodel_relocate_memory;
         xendevicemodel_pin_memory_cacheattr;
+        xendevicemodel_set_irq_level;
 } VERS_1.1;
 
 VERS_1.3 {
diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/dm.c
+++ b/xen/arch/arm/dm.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include <xen/hypercall.h>
+#include <asm/vgic.h>
 
 int arch_dm_op(struct xen_dm_op *op, struct domain *d,
                const struct dmop_args *op_args, bool *const_op)
 {
-    return -EOPNOTSUPP;
+    int rc;
+
+    switch ( op->op )
+    {
+    case XEN_DMOP_set_irq_level:
+    {
+        const struct xen_dm_op_set_irq_level *data =
+            &op->u.set_irq_level;
+
+        /* Only SPIs are supported */
+        if ( (data->irq < NR_LOCAL_IRQS) || (data->irq >= vgic_num_irqs(d)) )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+        if ( data->level != 0 && data->level != 1 )
+        {
+            rc = -EINVAL;
+            break;
+        }
+
+
+        vgic_inject_irq(d, NULL, data->irq, data->level);
+        rc = 0;
+        break;
+    }
+
+    default:
+        rc = -EOPNOTSUPP;
+        break;
+    }
+
+    return rc;
 }
 
 /*
diff --git a/xen/common/dm.c b/xen/common/dm.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/common/dm.c
+++ b/xen/common/dm.c
@@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args)
         [XEN_DMOP_remote_shutdown] = sizeof(struct xen_dm_op_remote_shutdown),
         [XEN_DMOP_relocate_memory] = sizeof(struct xen_dm_op_relocate_memory),
         [XEN_DMOP_pin_memory_cacheattr] = sizeof(struct xen_dm_op_pin_memory_cacheattr),
+        [XEN_DMOP_set_irq_level] = sizeof(struct xen_dm_op_set_irq_level),
     };
 
     rc = rcu_lock_remote_domain_by_id(op_args->domid, &d);
diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -XXX,XX +XXX,XX @@ struct xen_dm_op_pin_memory_cacheattr {
     uint32_t pad;
 };
 
+/*
+ * XEN_DMOP_set_irq_level: Set the logical level of one of a domain's
+ *                         IRQ lines.
+ * XXX Handle PPIs.
+ */
+#define XEN_DMOP_set_irq_level 19
+
+struct xen_dm_op_set_irq_level {
+    uint32_t irq;
+    /* IN - Level: 0 -> deasserted, 1 -> asserted */
+    uint8_t level;
+    uint8_t pad[3];
+};
+
 struct xen_dm_op {
     uint32_t op;
     uint32_t pad;
@@ -XXX,XX +XXX,XX @@ struct xen_dm_op {
         struct xen_dm_op_track_dirty_vram track_dirty_vram;
         struct xen_dm_op_set_pci_intx_level set_pci_intx_level;
         struct xen_dm_op_set_isa_irq_level set_isa_irq_level;
+        struct xen_dm_op_set_irq_level set_irq_level;
         struct xen_dm_op_set_pci_link_route set_pci_link_route;
         struct xen_dm_op_modified_memory modified_memory;
         struct xen_dm_op_set_mem_type set_mem_type;
--
2.7.4
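
The parameter checks in arch_dm_op() reject anything that is not an SPI within the domain's vGIC range, and any level other than 0 or 1. A minimal standalone sketch of the same checks follows; toy_check_set_irq_level() and TOY_NR_LOCAL_IRQS are hypothetical names, and nr_irqs stands in for the value vgic_num_irqs(d) would return.

```c
#include <errno.h>
#include <stdint.h>

/* On the GIC, INTIDs 0..31 are SGIs and PPIs; SPIs start at 32. */
#define TOY_NR_LOCAL_IRQS 32u

/*
 * Hypothetical model of the XEN_DMOP_set_irq_level validation.
 * nr_irqs is the total number of interrupts of the virtual GIC.
 */
static int toy_check_set_irq_level(uint32_t irq, uint8_t level,
                                   uint32_t nr_irqs)
{
    /* Only SPIs that the domain's vGIC actually has are accepted. */
    if ( irq < TOY_NR_LOCAL_IRQS || irq >= nr_irqs )
        return -EINVAL;

    /* The line is either deasserted (0) or asserted (1). */
    if ( level != 0 && level != 1 )
        return -EINVAL;

    return 0;
}
```

With a 64-interrupt vGIC, IRQ 33 at level 1 passes, while IRQ 31 (a PPI), IRQ 64 (out of range) and level 2 are all rejected with -EINVAL.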
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As IOREQ is a common feature now and we also need to
invalidate the qemu mapcache on XENMEM_decrease_reservation
on Arm, this patch moves this handling to the common code
and moves the per-domain qemu_mapcache_invalidate variable
out of the arch sub-struct.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - move send_invalidate_req() to the common code
   - update patch subject/description
   - move qemu_mapcache_invalidate out of the arch sub-struct,
     update checks
   - remove #if defined(CONFIG_ARM64) from the common code
---
---
 xen/arch/arm/traps.c             |  6 ++++++
 xen/arch/x86/hvm/hypercall.c     |  9 ++++-----
 xen/arch/x86/hvm/io.c            | 14 --------------
 xen/common/ioreq.c               | 14 ++++++++++++++
 xen/common/memory.c              |  5 +++++
 xen/include/asm-x86/hvm/domain.h |  1 -
 xen/include/asm-x86/hvm/io.h     |  1 -
 xen/include/xen/ioreq.h          |  2 ++
 xen/include/xen/sched.h          |  2 ++
 9 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -XXX,XX +XXX,XX @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,
     /* Ensure the hypercall trap instruction is re-executed. */
     if ( current->hcall_preempted )
         regs->pc -= 4; /* re-execute 'hvc #XEN_HYPERCALL_TAG' */
+
+#ifdef CONFIG_IOREQ_SERVER
+    if ( unlikely(current->domain->qemu_mapcache_invalidate) &&
+         test_and_clear_bool(current->domain->qemu_mapcache_invalidate) )
+        send_invalidate_req();
+#endif
 }
 
 void arch_hypercall_tasklet_result(struct vcpu *v, long res)
diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -XXX,XX +XXX,XX @@
  *
  * Copyright (c) 2017 Citrix Systems Ltd.
  */
+
 #include <xen/lib.h>
 #include <xen/hypercall.h>
+#include <xen/ioreq.h>
 #include <xen/nospec.h>
 
 #include <asm/hvm/emulate.h>
@@ -XXX,XX +XXX,XX @@ static long hvm_memory_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     else
         rc = compat_memory_op(cmd, arg);
 
-    if ( (cmd & MEMOP_CMD_MASK) == XENMEM_decrease_reservation )
-        curr->domain->arch.hvm.qemu_mapcache_invalidate = true;
-
     return rc;
 }
 
@@ -XXX,XX +XXX,XX @@ int hvm_hypercall(struct cpu_user_regs *regs)
     if ( curr->hcall_preempted )
         return HVM_HCALL_preempted;
 
-    if ( unlikely(currd->arch.hvm.qemu_mapcache_invalidate) &&
-         test_and_clear_bool(currd->arch.hvm.qemu_mapcache_invalidate) )
+    if ( unlikely(currd->qemu_mapcache_invalidate) &&
+         test_and_clear_bool(currd->qemu_mapcache_invalidate) )
         send_invalidate_req();
 
     return HVM_HCALL_completed;
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -XXX,XX +XXX,XX @@ void send_timeoffset_req(unsigned long timeoff)
         gprintk(XENLOG_ERR, "Unsuccessful timeoffset update\n");
 }
 
-/* Ask ioemu mapcache to invalidate mappings. */
-void send_invalidate_req(void)
-{
-    ioreq_t p = {
-        .type = IOREQ_TYPE_INVALIDATE,
-        .size = 4,
-        .dir = IOREQ_WRITE,
-        .data = ~0UL, /* flush all */
-    };
-
-    if ( hvm_broadcast_ioreq(&p, false) != 0 )
-        gprintk(XENLOG_ERR, "Unsuccessful map-cache invalidate\n");
-}
-
 bool hvm_emulate_one_insn(hvm_emulate_validate_t *validate, const char *descr)
 {
     struct hvm_emulate_ctxt ctxt;
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -XXX,XX +XXX,XX @@
 #include <public/hvm/ioreq.h>
 #include <public/hvm/params.h>
 
+/* Ask ioemu mapcache to invalidate mappings. */
+void send_invalidate_req(void)
+{
+    ioreq_t p = {
+        .type = IOREQ_TYPE_INVALIDATE,
+        .size = 4,
+        .dir = IOREQ_WRITE,
+        .data = ~0UL, /* flush all */
+    };
+
+    if ( hvm_broadcast_ioreq(&p, false) != 0 )
+        gprintk(XENLOG_ERR, "Unsuccessful map-cache invalidate\n");
+}
+
 static void set_ioreq_server(struct domain *d, unsigned int id,
                              struct hvm_ioreq_server *s)
 {
diff --git a/xen/common/memory.c b/xen/common/memory.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -XXX,XX +XXX,XX @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+#ifdef CONFIG_IOREQ_SERVER
+    if ( op == XENMEM_decrease_reservation )
+        curr_d->qemu_mapcache_invalidate = true;
+#endif
+
     return rc;
 }
 
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -XXX,XX +XXX,XX @@ struct hvm_domain {
 
     struct viridian_domain *viridian;
 
-    bool_t qemu_mapcache_invalidate;
     bool_t is_s3_suspended;
 
     /*
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -XXX,XX +XXX,XX @@ bool relocate_portio_handler(
     unsigned int size);
 
 void send_timeoffset_req(unsigned long timeoff);
-void send_invalidate_req(void);
 bool handle_mmio_with_translation(unsigned long gla, unsigned long gpfn,
                                   struct npfec);
 bool handle_pio(uint16_t port, unsigned int size, int dir);
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -XXX,XX +XXX,XX @@ static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
            (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
 }
 
+void send_invalidate_req(void);
+
 bool hvm_io_pending(struct vcpu *v);
 bool handle_hvm_io_completion(struct vcpu *v);
 bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -XXX,XX +XXX,XX @@ struct domain
     /* Argo interdomain communication support */
     struct argo_domain *argo;
 #endif
+
+    bool_t qemu_mapcache_invalidate;
 };
 
 static inline struct page_list_head *page_to_list(
--
2.7.4
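
The invalidation is deferred: the memory-op path only sets a flag, and the hypercall exit path tests and clears it before broadcasting a single request. The sketch below models that with C11 atomics as an assumption about the intent, since Xen uses its own test_and_clear_bool() helper rather than <stdatomic.h>; struct toy_mc_domain and the toy_* functions are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_mc_domain {
    atomic_bool mapcache_invalidate;  /* set when an invalidate is pending */
    unsigned int invalidates_sent;    /* counts simulated broadcasts */
};

/* Memory-op path: only records that an invalidate is needed. */
static void toy_note_decrease_reservation(struct toy_mc_domain *d)
{
    atomic_store(&d->mapcache_invalidate, true);
}

/*
 * Hypercall exit path: a cheap load first, then an atomic exchange so
 * that only one caller wins when several vCPUs race on the same flag.
 * Returns true if this caller sent the (simulated) invalidate request.
 */
static bool toy_hypercall_exit(struct toy_mc_domain *d)
{
    if ( atomic_load(&d->mapcache_invalidate) &&
         atomic_exchange(&d->mapcache_invalidate, false) )
    {
        d->invalidates_sent++;
        return true;
    }

    return false;
}
```

Several decrease-reservation calls within one hypercall therefore collapse into a single invalidate request, and an exit with no pending flag costs only one load.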
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The cmpxchg() in hvm_send_buffered_ioreq() operates on memory shared
with the emulator. In order to be on the safe side we need to switch
to guest_cmpxchg64() to prevent a domain from DoSing Xen on Arm.

CC: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this patch depends on the following patch, which is
under review:
https://patchwork.kernel.org/patch/11715559/

Changes RFC -> V1:
   - new patch
---
---
 xen/common/ioreq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -XXX,XX +XXX,XX @@
 #include <xen/trace.h>
 #include <xen/vpci.h>
 
+#include <asm/guest_atomics.h>
+
 #include <public/hvm/dm_op.h>
 #include <public/hvm/ioreq.h>
 #include <public/hvm/params.h>
@@ -XXX,XX +XXX,XX @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
 
         new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
         new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
-        cmpxchg(&pg->ptrs.full, old.full, new.full);
+        guest_cmpxchg64(d, &pg->ptrs.full, old.full, new.full);
     }
 
     notify_via_xen_event_channel(d, s->bufioreq_evtchn);
--
2.7.4
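
The compare-and-exchange in this hunk rewinds the read and write pointers of the buffered ring together, and gives up if the emulator changed them in the meantime. The sketch below shows only that idea with C11 atomics. It is not the Xen implementation and does not model what guest_cmpxchg64() adds on Arm. The packing of the two 32-bit pointers into one 64-bit word, the ring size of 511 slots and all toy_* names are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SLOT_NUM 511u   /* assumed ring size, for illustration only */

/* Read pointer in the low half, write pointer in the high half. */
static uint64_t toy_pack(uint32_t rd, uint32_t wr)
{
    return ((uint64_t)wr << 32) | rd;
}

static uint32_t toy_read_ptr(uint64_t full)
{
    return (uint32_t)full;
}

static uint32_t toy_write_ptr(uint64_t full)
{
    return (uint32_t)(full >> 32);
}

/*
 * Try once to pull both pointers back by a whole number of ring sizes.
 * Returns false, writing nothing, if the other side of the shared page
 * modified the pointers between the load and the exchange.
 */
static bool toy_canonicalize(_Atomic uint64_t *ptrs)
{
    uint64_t old = atomic_load(ptrs);
    uint32_t n = toy_read_ptr(old) / TOY_SLOT_NUM;
    uint64_t new_val = toy_pack(toy_read_ptr(old) - n * TOY_SLOT_NUM,
                                toy_write_ptr(old) - n * TOY_SLOT_NUM);

    return atomic_compare_exchange_strong(ptrs, &old, new_val);
}
```

Because both pointers live in one 64-bit word, they can only change together, so the distance between them (the number of queued requests) is preserved by the rewind.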
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> This patch creates specific device node in the Guest device-tree with allocated MMIO range and SPI interrupt if specific 'virtio' property is present in domain config. Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" Changes RFC -> V1: - was squashed with: "[RFC PATCH V1 09/12] libxl: Handle virtio-mmio irq in more correct way" "[RFC PATCH V1 11/12] libxl: Insert "dma-coherent" property into virtio-mmio device node" "[RFC PATCH V1 12/12] libxl: Fix duplicate memory node in DT" - move VirtIO MMIO #define-s to xen/include/public/arch-arm.h --- --- tools/libxl/libxl_arm.c | 58 +++++++++++++++++++++++++++++++++++++++++-- tools/libxl/libxl_types.idl | 1 + tools/xl/xl_parse.c | 1 + xen/include/public/arch-arm.h | 5 ++++ 4 files changed, 63 insertions(+), 2 deletions(-) diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c index XXXXXXX..XXXXXXX 100644 --- a/tools/libxl/libxl_arm.c +++ b/tools/libxl/libxl_arm.c @@ -XXX,XX +XXX,XX @@ int libxl__arch_domain_prepare_config(libxl__gc *gc, { uint32_t nr_spis = 0; unsigned int i; - uint32_t vuart_irq; - bool vuart_enabled = false; + uint32_t vuart_irq, virtio_irq; + bool vuart_enabled = false, virtio_enabled = false; /* * If pl011 vuart is enabled then increment the nr_spis to allow allocation @@ -XXX,XX +XXX,XX @@ int libxl__arch_domain_prepare_config(libxl__gc *gc, vuart_enabled = true; } + /* + * XXX: Handle properly virtio + * A proper solution would be the toolstack to allocate the interrupts + * used by each virtio backend and let the backend now which one is used + */ + if (libxl_defbool_val(d_config->b_info.arch_arm.virtio)) { + nr_spis += (GUEST_VIRTIO_MMIO_SPI - 32) + 1; + virtio_irq = GUEST_VIRTIO_MMIO_SPI; + virtio_enabled = true; + } + for (i = 0; i < 
d_config->b_info.num_irqs; i++) { uint32_t irq = d_config->b_info.irqs[i]; uint32_t spi; @@ -XXX,XX +XXX,XX @@ int libxl__arch_domain_prepare_config(libxl__gc *gc, return ERROR_FAIL; } + /* The same check as for vpl011 */ + if (virtio_enabled && irq == virtio_irq) { + LOG(ERROR, "Physical IRQ %u conflicting with virtio SPI\n", irq); + return ERROR_FAIL; + } + if (irq < 32) continue; @@ -XXX,XX +XXX,XX @@ static int make_vpl011_uart_node(libxl__gc *gc, void *fdt, return 0; } +static int make_virtio_mmio_node(libxl__gc *gc, void *fdt, + uint64_t base, uint32_t irq) +{ + int res; + gic_interrupt intr; + /* Placeholder for virtio@ + a 64-bit number + \0 */ + char buf[24]; + + snprintf(buf, sizeof(buf), "virtio@%"PRIx64, base); + res = fdt_begin_node(fdt, buf); + if (res) return res; + + res = fdt_property_compat(gc, fdt, 1, "virtio,mmio"); + if (res) return res; + + res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS, + 1, base, GUEST_VIRTIO_MMIO_SIZE); + if (res) return res; + + set_interrupt(intr, irq, 0xf, DT_IRQ_TYPE_EDGE_RISING); + res = fdt_property_interrupts(gc, fdt, &intr, 1); + if (res) return res; + + res = fdt_property(fdt, "dma-coherent", NULL, 0); + if (res) return res; + + res = fdt_end_node(fdt); + if (res) return res; + + return 0; + +} + static const struct arch_info *get_arch_info(libxl__gc *gc, const struct xc_dom_image *dom) { @@ -XXX,XX +XXX,XX @@ next_resize: if (info->tee == LIBXL_TEE_TYPE_OPTEE) FDT( make_optee_node(gc, fdt) ); + if (libxl_defbool_val(info->arch_arm.virtio)) + FDT( make_virtio_mmio_node(gc, fdt, GUEST_VIRTIO_MMIO_BASE, GUEST_VIRTIO_MMIO_SPI) ); + if (pfdt) FDT( copy_partial_fdt(gc, fdt, pfdt) ); @@ -XXX,XX +XXX,XX @@ void libxl__arch_domain_build_info_setdefault(libxl__gc *gc, { /* ACPI is disabled by default */ libxl_defbool_setdefault(&b_info->acpi, false); + libxl_defbool_setdefault(&b_info->arch_arm.virtio, false); if (b_info->type != LIBXL_DOMAIN_TYPE_PV) return; diff --git 
a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index XXXXXXX..XXXXXXX 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -XXX,XX +XXX,XX @@ libxl_domain_build_info = Struct("domain_build_info",[ ("arch_arm", Struct(None, [("gic_version", libxl_gic_version), + ("virtio", libxl_defbool), ("vuart", libxl_vuart_type), ])), # Alternate p2m is not bound to any architecture or guest type, as it is diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c index XXXXXXX..XXXXXXX 100644 --- a/tools/xl/xl_parse.c +++ b/tools/xl/xl_parse.c @@ -XXX,XX +XXX,XX @@ skip_usbdev: } xlu_cfg_get_defbool(config, "dm_restrict", &b_info->dm_restrict, 0); + xlu_cfg_get_defbool(config, "virtio", &b_info->arch_arm.virtio, 0); if (c_info->type == LIBXL_DOMAIN_TYPE_HVM) { if (!xlu_cfg_get_string (config, "vga", &buf, 0)) { diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/public/arch-arm.h +++ b/xen/include/public/arch-arm.h @@ -XXX,XX +XXX,XX @@ typedef uint64_t xen_callback_t; #define PSCI_cpu_on 2 #define PSCI_migrate 3 +/* VirtIO MMIO definitions */ +#define GUEST_VIRTIO_MMIO_BASE xen_mk_ullong(0x02000000) +#define GUEST_VIRTIO_MMIO_SIZE xen_mk_ullong(0x200) +#define GUEST_VIRTIO_MMIO_SPI 33 + #endif #ifndef __ASSEMBLY__ -- 2.7.4
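To make the arithmetic in this patch easier to check, here is a small stand-alone sketch of the two calculations it performs: the device-tree node name and the number of SPIs requested for the guest. The three GUEST_VIRTIO_MMIO_* values are copied from the patch; the helper names virtio_node_name() and spis_needed_for() are hypothetical and exist only for illustration, and the libfdt calls from the patch are deliberately left out.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from the patch's additions to xen/include/public/arch-arm.h. */
#define GUEST_VIRTIO_MMIO_BASE 0x02000000ULL
#define GUEST_VIRTIO_MMIO_SIZE 0x200ULL
#define GUEST_VIRTIO_MMIO_SPI  33u

/*
 * Hypothetical helper: format the node name the same way
 * make_virtio_mmio_node() does, i.e. "virtio@" plus the base in hex.
 * A 24-byte buffer fits "virtio@", 16 hex digits and the trailing NUL.
 */
static int virtio_node_name(char *buf, size_t len, uint64_t base)
{
    return snprintf(buf, len, "virtio@%" PRIx64, base);
}

/*
 * Hypothetical helper: SPIs are numbered from 32, so covering interrupt
 * `irq` needs (irq - 32) + 1 SPIs, which is the amount the patch adds
 * to nr_spis.
 */
static uint32_t spis_needed_for(uint32_t irq)
{
    return (irq - 32u) + 1u;
}
```

With the values from the patch, the node is named after base 0x02000000 and two SPIs (32 and 33) are requested so that SPI 33 is usable by the virtio device.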
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch adds basic support for configuring and assisting the virtio-disk
backend (emulator), which is intended to run outside of Qemu and could be run
in any domain.

Xenstore was chosen as a communication interface for the emulator running
in a non-toolstack domain to be able to get configuration either by reading
Xenstore directly or by receiving command line parameters (an updated 'xl devd'
running in the same domain would read Xenstore beforehand and call the backend
executable with the required arguments).

An example of domain configuration (two disks are assigned to the guest,
the latter is in readonly mode):

vdisk = [ 'backend=DomD, disks=rw:/dev/mmcblk0p3;ro:/dev/mmcblk1p3' ]

Where per-disk Xenstore entries are:
- filename and readonly flag (configured via "vdisk" property)
- base and irq (allocated dynamically)

Besides handling the 'visible' params described in the configuration file,
the patch also allocates virtio-mmio specific ones for each device and
writes them into Xenstore. The virtio-mmio params (irq and base) are
unique per guest domain; they are allocated at domain creation time
and passed through to the emulator. Each VirtIO device has at least
one pair of these params.

TODO:
1. An extra "virtio" property could be removed.
2. Update documentation.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
Changes RFC -> V1:
   - no changes

Please note, there is a real concern about VirtIO interrupts allocation.
I am just copying here what Stefano said in the RFC thread:

So, if we end up allocating let's say 6 virtio interrupts for a domain,
the chance of a clash with a physical interrupt of a passthrough device is real.
I am not entirely sure how to solve it, but these are a few ideas: - choosing virtio interrupts that are less likely to conflict (maybe > 1000) - make the virtio irq (optionally) configurable so that a user could override the default irq and specify one that doesn't conflict - implementing support for virq != pirq (even the xl interface doesn't allow to specify the virq number for passthrough devices, see "irqs") --- --- tools/libxl/Makefile | 4 +- tools/libxl/libxl_arm.c | 56 ++++++++++++++--- tools/libxl/libxl_create.c | 1 + tools/libxl/libxl_internal.h | 1 + tools/libxl/libxl_types.idl | 15 +++++ tools/libxl/libxl_types_internal.idl | 1 + tools/libxl/libxl_virtio_disk.c | 109 +++++++++++++++++++++++++++++++++ tools/xl/Makefile | 2 +- tools/xl/xl.h | 3 + tools/xl/xl_cmdtable.c | 15 +++++ tools/xl/xl_parse.c | 115 +++++++++++++++++++++++++++++++++++ tools/xl/xl_virtio_disk.c | 46 ++++++++++++++ 12 files changed, 356 insertions(+), 12 deletions(-) create mode 100644 tools/libxl/libxl_virtio_disk.c create mode 100644 tools/xl/xl_virtio_disk.c diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile index XXXXXXX..XXXXXXX 100644 --- a/tools/libxl/Makefile +++ b/tools/libxl/Makefile @@ -XXX,XX +XXX,XX @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \ libxl_vtpm.o libxl_nic.o libxl_disk.o libxl_console.o \ libxl_cpupool.o libxl_mem.o libxl_sched.o libxl_tmem.o \ libxl_9pfs.o libxl_domain.o libxl_vdispl.o \ - libxl_pvcalls.o libxl_vsnd.o libxl_vkb.o $(LIBXL_OBJS-y) + libxl_pvcalls.o libxl_vsnd.o libxl_vkb.o \ + libxl_virtio_disk.o $(LIBXL_OBJS-y) + LIBXL_OBJS += libxl_genid.o LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c index XXXXXXX..XXXXXXX 100644 --- a/tools/libxl/libxl_arm.c +++ b/tools/libxl/libxl_arm.c @@ -XXX,XX +XXX,XX @@ #include <assert.h> #include <xen/device_tree_defs.h> +#ifndef container_of +#define container_of(ptr, type, member) ({ \ + typeof( 
((type *)0)->member ) *__mptr = (ptr); \ + (type *)( (char *)__mptr - offsetof(type,member) );}) +#endif + static const char *gicv_to_string(libxl_gic_version gic_version) { switch (gic_version) { @@ -XXX,XX +XXX,XX @@ int libxl__arch_domain_prepare_config(libxl__gc *gc, vuart_enabled = true; } - /* - * XXX: Handle properly virtio - * A proper solution would be the toolstack to allocate the interrupts - * used by each virtio backend and let the backend now which one is used - */ if (libxl_defbool_val(d_config->b_info.arch_arm.virtio)) { - nr_spis += (GUEST_VIRTIO_MMIO_SPI - 32) + 1; + uint64_t virtio_base; + libxl_device_virtio_disk *virtio_disk; + + virtio_base = GUEST_VIRTIO_MMIO_BASE; virtio_irq = GUEST_VIRTIO_MMIO_SPI; + + if (!d_config->num_virtio_disks) { + LOG(ERROR, "Virtio is enabled, but no Virtio devices present\n"); + return ERROR_FAIL; + } + virtio_disk = &d_config->virtio_disks[0]; + + for (i = 0; i < virtio_disk->num_disks; i++) { + virtio_disk->disks[i].base = virtio_base; + virtio_disk->disks[i].irq = virtio_irq; + + LOG(DEBUG, "Allocate Virtio MMIO params: IRQ %u BASE 0x%"PRIx64, + virtio_irq, virtio_base); + + virtio_irq ++; + virtio_base += GUEST_VIRTIO_MMIO_SIZE; + } + virtio_irq --; + + nr_spis += (virtio_irq - 32) + 1; virtio_enabled = true; } @@ -XXX,XX +XXX,XX @@ int libxl__arch_domain_prepare_config(libxl__gc *gc, } /* The same check as for vpl011 */ - if (virtio_enabled && irq == virtio_irq) { - LOG(ERROR, "Physical IRQ %u conflicting with virtio SPI\n", irq); + if (virtio_enabled && + (irq >= GUEST_VIRTIO_MMIO_SPI && irq <= virtio_irq)) { + LOG(ERROR, "Physical IRQ %u conflicting with Virtio IRQ range\n", irq); return ERROR_FAIL; } @@ -XXX,XX +XXX,XX @@ next_resize: if (info->tee == LIBXL_TEE_TYPE_OPTEE) FDT( make_optee_node(gc, fdt) ); - if (libxl_defbool_val(info->arch_arm.virtio)) - FDT( make_virtio_mmio_node(gc, fdt, GUEST_VIRTIO_MMIO_BASE, GUEST_VIRTIO_MMIO_SPI) ); + if (libxl_defbool_val(info->arch_arm.virtio)) { + 
libxl_domain_config *d_config = + container_of(info, libxl_domain_config, b_info); + libxl_device_virtio_disk *virtio_disk = &d_config->virtio_disks[0]; + unsigned int i; + + for (i = 0; i < virtio_disk->num_disks; i++) { + uint64_t base = virtio_disk->disks[i].base; + uint32_t irq = virtio_disk->disks[i].irq; + + FDT( make_virtio_mmio_node(gc, fdt, base, irq) ); + } + } if (pfdt) FDT( copy_partial_fdt(gc, fdt, pfdt) ); diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index XXXXXXX..XXXXXXX 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -XXX,XX +XXX,XX @@ const libxl__device_type *device_type_tbl[] = { &libxl__dtdev_devtype, &libxl__vdispl_devtype, &libxl__vsnd_devtype, + &libxl__virtio_disk_devtype, NULL }; diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index XXXXXXX..XXXXXXX 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -XXX,XX +XXX,XX @@ extern const libxl__device_type libxl__vdispl_devtype; extern const libxl__device_type libxl__p9_devtype; extern const libxl__device_type libxl__pvcallsif_devtype; extern const libxl__device_type libxl__vsnd_devtype; +extern const libxl__device_type libxl__virtio_disk_devtype; extern const libxl__device_type *device_type_tbl[]; diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index XXXXXXX..XXXXXXX 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -XXX,XX +XXX,XX @@ libxl_device_vsnd = Struct("device_vsnd", [ ("pcms", Array(libxl_vsnd_pcm, "num_vsnd_pcms")) ]) +libxl_virtio_disk_param = Struct("virtio_disk_param", [ + ("filename", string), + ("readonly", bool), + ("irq", uint32), + ("base", uint64), + ]) + +libxl_device_virtio_disk = Struct("device_virtio_disk", [ + ("backend_domid", libxl_domid), + ("backend_domname", string), + ("devid", libxl_devid), + ("disks", Array(libxl_virtio_disk_param, "num_disks")), + ]) + libxl_domain_config = Struct("domain_config", [ 
("c_info", libxl_domain_create_info), ("b_info", libxl_domain_build_info), @@ -XXX,XX +XXX,XX @@ libxl_domain_config = Struct("domain_config", [ ("pvcallsifs", Array(libxl_device_pvcallsif, "num_pvcallsifs")), ("vdispls", Array(libxl_device_vdispl, "num_vdispls")), ("vsnds", Array(libxl_device_vsnd, "num_vsnds")), + ("virtio_disks", Array(libxl_device_virtio_disk, "num_virtio_disks")), # a channel manifests as a console with a name, # see docs/misc/channels.txt ("channels", Array(libxl_device_channel, "num_channels")), diff --git a/tools/libxl/libxl_types_internal.idl b/tools/libxl/libxl_types_internal.idl index XXXXXXX..XXXXXXX 100644 --- a/tools/libxl/libxl_types_internal.idl +++ b/tools/libxl/libxl_types_internal.idl @@ -XXX,XX +XXX,XX @@ libxl__device_kind = Enumeration("device_kind", [ (14, "PVCALLS"), (15, "VSND"), (16, "VINPUT"), + (17, "VIRTIO_DISK"), ]) libxl__console_backend = Enumeration("console_backend", [ diff --git a/tools/libxl/libxl_virtio_disk.c b/tools/libxl/libxl_virtio_disk.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/tools/libxl/libxl_virtio_disk.c @@ -XXX,XX +XXX,XX @@ +/* + * Copyright (C) 2020 EPAM Systems Inc. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as published + * by the Free Software Foundation; version 2.1 only. with the special + * exception on linking described in file LICENSE. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. 
+ */ + +#include "libxl_internal.h" + +static int libxl__device_virtio_disk_setdefault(libxl__gc *gc, uint32_t domid, + libxl_device_virtio_disk *virtio_disk, + bool hotplug) +{ + return libxl__resolve_domid(gc, virtio_disk->backend_domname, + &virtio_disk->backend_domid); +} + +static int libxl__virtio_disk_from_xenstore(libxl__gc *gc, const char *libxl_path, + libxl_devid devid, + libxl_device_virtio_disk *virtio_disk) +{ + const char *be_path; + int rc; + + virtio_disk->devid = devid; + rc = libxl__xs_read_mandatory(gc, XBT_NULL, + GCSPRINTF("%s/backend", libxl_path), + &be_path); + if (rc) return rc; + + rc = libxl__backendpath_parse_domid(gc, be_path, &virtio_disk->backend_domid); + if (rc) return rc; + + return 0; +} + +static void libxl__update_config_virtio_disk(libxl__gc *gc, + libxl_device_virtio_disk *dst, + libxl_device_virtio_disk *src) +{ + dst->devid = src->devid; +} + +static int libxl_device_virtio_disk_compare(libxl_device_virtio_disk *d1, + libxl_device_virtio_disk *d2) +{ + return COMPARE_DEVID(d1, d2); +} + +static void libxl__device_virtio_disk_add(libxl__egc *egc, uint32_t domid, + libxl_device_virtio_disk *virtio_disk, + libxl__ao_device *aodev) +{ + libxl__device_add_async(egc, domid, &libxl__virtio_disk_devtype, virtio_disk, aodev); +} + +static int libxl__set_xenstore_virtio_disk(libxl__gc *gc, uint32_t domid, + libxl_device_virtio_disk *virtio_disk, + flexarray_t *back, flexarray_t *front, + flexarray_t *ro_front) +{ + int rc; + unsigned int i; + + for (i = 0; i < virtio_disk->num_disks; i++) { + rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/filename", i), + GCSPRINTF("%s", virtio_disk->disks[i].filename)); + if (rc) return rc; + + rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/readonly", i), + GCSPRINTF("%d", virtio_disk->disks[i].readonly)); + if (rc) return rc; + + rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/base", i), + GCSPRINTF("%lu", virtio_disk->disks[i].base)); + if (rc) return rc; + + rc = 
flexarray_append_pair(ro_front, GCSPRINTF("%d/irq", i), + GCSPRINTF("%u", virtio_disk->disks[i].irq)); + if (rc) return rc; + } + + return 0; +} + +static LIBXL_DEFINE_UPDATE_DEVID(virtio_disk) +static LIBXL_DEFINE_DEVICE_FROM_TYPE(virtio_disk) +static LIBXL_DEFINE_DEVICES_ADD(virtio_disk) + +DEFINE_DEVICE_TYPE_STRUCT(virtio_disk, VIRTIO_DISK, + .update_config = (device_update_config_fn_t) libxl__update_config_virtio_disk, + .from_xenstore = (device_from_xenstore_fn_t) libxl__virtio_disk_from_xenstore, + .set_xenstore_config = (device_set_xenstore_config_fn_t) libxl__set_xenstore_virtio_disk +); + +/* + * Local variables: + * mode: C + * c-basic-offset: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/tools/xl/Makefile b/tools/xl/Makefile index XXXXXXX..XXXXXXX 100644 --- a/tools/xl/Makefile +++ b/tools/xl/Makefile @@ -XXX,XX +XXX,XX @@ XL_OBJS += xl_vtpm.o xl_block.o xl_nic.o xl_usb.o XL_OBJS += xl_sched.o xl_pci.o xl_vcpu.o xl_cdrom.o xl_mem.o XL_OBJS += xl_info.o xl_console.o xl_misc.o XL_OBJS += xl_vmcontrol.o xl_saverestore.o xl_migrate.o -XL_OBJS += xl_vdispl.o xl_vsnd.o xl_vkb.o +XL_OBJS += xl_vdispl.o xl_vsnd.o xl_vkb.o xl_virtio_disk.o $(XL_OBJS): CFLAGS += $(CFLAGS_libxentoollog) $(XL_OBJS): CFLAGS += $(CFLAGS_XL) diff --git a/tools/xl/xl.h b/tools/xl/xl.h index XXXXXXX..XXXXXXX 100644 --- a/tools/xl/xl.h +++ b/tools/xl/xl.h @@ -XXX,XX +XXX,XX @@ int main_vsnddetach(int argc, char **argv); int main_vkbattach(int argc, char **argv); int main_vkblist(int argc, char **argv); int main_vkbdetach(int argc, char **argv); +int main_virtio_diskattach(int argc, char **argv); +int main_virtio_disklist(int argc, char **argv); +int main_virtio_diskdetach(int argc, char **argv); int main_usbctrl_attach(int argc, char **argv); int main_usbctrl_detach(int argc, char **argv); int main_usbdev_attach(int argc, char **argv); diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c index XXXXXXX..XXXXXXX 100644 --- a/tools/xl/xl_cmdtable.c +++ 
b/tools/xl/xl_cmdtable.c @@ -XXX,XX +XXX,XX @@ struct cmd_spec cmd_table[] = { "Destroy a domain's virtual sound device", "<Domain> <DevId>", }, + { "virtio-disk-attach", + &main_virtio_diskattach, 1, 1, + "Create a new virtio block device", + " TBD\n" + }, + { "virtio-disk-list", + &main_virtio_disklist, 0, 0, + "List virtio block devices for a domain", + "<Domain(s)>", + }, + { "virtio-disk-detach", + &main_virtio_diskdetach, 0, 1, + "Destroy a domain's virtio block device", + "<Domain> <DevId>", + }, { "uptime", &main_uptime, 0, 0, "Print uptime for all/some domains", diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c index XXXXXXX..XXXXXXX 100644 --- a/tools/xl/xl_parse.c +++ b/tools/xl/xl_parse.c @@ -XXX,XX +XXX,XX @@ out: if (rc) exit(EXIT_FAILURE); } +#define MAX_VIRTIO_DISKS 4 + +static int parse_virtio_disk_config(libxl_device_virtio_disk *virtio_disk, char *token) +{ + char *oparg; + libxl_string_list disks = NULL; + int i, rc; + + if (MATCH_OPTION("backend", token, oparg)) { + virtio_disk->backend_domname = strdup(oparg); + } else if (MATCH_OPTION("disks", token, oparg)) { + split_string_into_string_list(oparg, ";", &disks); + + virtio_disk->num_disks = libxl_string_list_length(&disks); + if (virtio_disk->num_disks > MAX_VIRTIO_DISKS) { + fprintf(stderr, "vdisk: currently only %d disks are supported", + MAX_VIRTIO_DISKS); + return 1; + } + virtio_disk->disks = xcalloc(virtio_disk->num_disks, + sizeof(*virtio_disk->disks)); + + for(i = 0; i < virtio_disk->num_disks; i++) { + char *disk_opt; + + rc = split_string_into_pair(disks[i], ":", &disk_opt, + &virtio_disk->disks[i].filename); + if (rc) { + fprintf(stderr, "vdisk: failed to split \"%s\" into pair\n", + disks[i]); + goto out; + } + + if (!strcmp(disk_opt, "ro")) + virtio_disk->disks[i].readonly = 1; + else if (!strcmp(disk_opt, "rw")) + virtio_disk->disks[i].readonly = 0; + else { + fprintf(stderr, "vdisk: failed to parse \"%s\" disk option\n", + disk_opt); + rc = 1; + } + free(disk_opt); + + if 
(rc) goto out; + } + } else { + fprintf(stderr, "Unknown string \"%s\" in vdisk spec\n", token); + rc = 1; goto out; + } + + rc = 0; + +out: + libxl_string_list_dispose(&disks); + return rc; +} + +static void parse_virtio_disk_list(const XLU_Config *config, + libxl_domain_config *d_config) +{ + XLU_ConfigList *virtio_disks; + const char *item; + char *buf = NULL; + int rc; + + if (!xlu_cfg_get_list (config, "vdisk", &virtio_disks, 0, 0)) { + libxl_domain_build_info *b_info = &d_config->b_info; + int entry = 0; + + /* XXX Remove an extra property */ + libxl_defbool_setdefault(&b_info->arch_arm.virtio, false); + if (!libxl_defbool_val(b_info->arch_arm.virtio)) { + fprintf(stderr, "Virtio device requires Virtio property to be set\n"); + exit(EXIT_FAILURE); + } + + while ((item = xlu_cfg_get_listitem(virtio_disks, entry)) != NULL) { + libxl_device_virtio_disk *virtio_disk; + char *p; + + virtio_disk = ARRAY_EXTEND_INIT(d_config->virtio_disks, + d_config->num_virtio_disks, + libxl_device_virtio_disk_init); + + buf = strdup(item); + + p = strtok (buf, ","); + while (p != NULL) + { + while (*p == ' ') p++; + + rc = parse_virtio_disk_config(virtio_disk, p); + if (rc) goto out; + + p = strtok (NULL, ","); + } + + entry++; + + if (virtio_disk->num_disks == 0) { + fprintf(stderr, "At least one virtio disk should be specified\n"); + rc = 1; goto out; + } + } + } + + rc = 0; + +out: + free(buf); + if (rc) exit(EXIT_FAILURE); +} + void parse_config_data(const char *config_source, const char *config_data, int config_len, @@ -XXX,XX +XXX,XX @@ skip_usbdev: } parse_vkb_list(config, d_config); + parse_virtio_disk_list(config, d_config); xlu_cfg_get_defbool(config, "xend_suspend_evtchn_compat", &c_info->xend_suspend_evtchn_compat, 0); diff --git a/tools/xl/xl_virtio_disk.c b/tools/xl/xl_virtio_disk.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/tools/xl/xl_virtio_disk.c @@ -XXX,XX +XXX,XX @@ +/* + * Copyright (C) 2020 EPAM Systems Inc. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as published + * by the Free Software Foundation; version 2.1 only. with the special + * exception on linking described in file LICENSE. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + */ + +#include <stdlib.h> + +#include <libxl.h> +#include <libxl_utils.h> +#include <libxlutil.h> + +#include "xl.h" +#include "xl_utils.h" +#include "xl_parse.h" + +int main_virtio_diskattach(int argc, char **argv) +{ + return 0; +} + +int main_virtio_disklist(int argc, char **argv) +{ + return 0; +} + +int main_virtio_diskdetach(int argc, char **argv) +{ + return 0; +} + +/* + * Local variables: + * mode: C + * c-basic-offset: 4 + * indent-tabs-mode: nil + * End: + */ -- 2.7.4
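The "disks=" parsing and the per-disk allocation of base and irq described in this patch can be illustrated with a compact stand-alone sketch. This is an approximation under stated assumptions, not the libxl/xl code: struct disk_sketch, parse_disks() and irq_conflicts() are hypothetical names, the fixed-size filename buffer is a simplification, and the GUEST_VIRTIO_MMIO_* values and MAX_VIRTIO_DISKS are copied from the patches above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Values copied from the patches; everything else here is hypothetical. */
#define GUEST_VIRTIO_MMIO_BASE 0x02000000ULL
#define GUEST_VIRTIO_MMIO_SIZE 0x200ULL
#define GUEST_VIRTIO_MMIO_SPI  33u
#define MAX_VIRTIO_DISKS       4

struct disk_sketch {
    char     filename[64];
    bool     readonly;
    uint32_t irq;
    uint64_t base;
};

/*
 * Parse "rw:/path;ro:/path" and give every disk its own MMIO window and
 * SPI, in order. Returns the number of disks, or -1 on a malformed or
 * oversized specification.
 */
static int parse_disks(const char *spec, struct disk_sketch *out)
{
    int n = 0;

    while (*spec) {
        const char *sep = strchr(spec, ';');
        size_t len = sep ? (size_t)(sep - spec) : strlen(spec);

        if (n == MAX_VIRTIO_DISKS || len <= 3 ||
            len - 3 >= sizeof(out[n].filename))
            return -1;

        if (!strncmp(spec, "ro:", 3))
            out[n].readonly = true;
        else if (!strncmp(spec, "rw:", 3))
            out[n].readonly = false;
        else
            return -1;

        memcpy(out[n].filename, spec + 3, len - 3);
        out[n].filename[len - 3] = '\0';

        /* Consecutive windows and interrupts, one pair per disk. */
        out[n].base = GUEST_VIRTIO_MMIO_BASE + (uint64_t)n * GUEST_VIRTIO_MMIO_SIZE;
        out[n].irq  = GUEST_VIRTIO_MMIO_SPI + (uint32_t)n;
        n++;

        spec += len;
        if (*spec == ';')
            spec++;
    }

    return n;
}

/* True if a physical IRQ falls inside the range handed out to virtio disks. */
static bool irq_conflicts(uint32_t irq, int num_disks)
{
    return num_disks > 0 &&
           irq >= GUEST_VIRTIO_MMIO_SPI &&
           irq <= GUEST_VIRTIO_MMIO_SPI + (uint32_t)num_disks - 1;
}
```

For the example configuration from the commit message, the two disks would end up at 0x02000000 with SPI 33 and at 0x02000200 with SPI 34, which is the range the conflict check in libxl__arch_domain_prepare_config() has to guard.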
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Hello all.

The purpose of this patch series is to add IOREQ/DM support to Xen on Arm.
You can find an initial discussion at [1] and RFC-V3 series at [2]-[5].
Xen on Arm requires some implementation to forward guest MMIO access to a device
model in order to implement a virtio-mmio backend or even a mediator outside of
the hypervisor. As Xen on x86 already contains the required support, this series
tries to make it common and introduce Arm specific bits plus some new functionality.
The patch series is based on Julien's PoC "xen/arm: Add support for Guest IO
forwarding to a device emulator".
Besides splitting the existing IOREQ/DM support and introducing the Arm side, the
series also includes virtio-mmio related changes (last 2 patches for toolstack)
so that reviewers can see what the whole picture could look like and give it a try.

According to the initial/subsequent discussions there are a few open
questions/concerns regarding security and performance in the VirtIO solution:
1. virtio-mmio vs virtio-pci, SPI vs MSI, or even a composition of virtio-mmio + MSI;
   different use-cases require different transports...
2. The virtio backend is able to access all guest memory, so some kind of protection
   is needed: 'virtio-iommu in Xen' vs 'pre-shared-memory & memcpys in guest', etc
   (for the first two Alex has provided valuable input at [6])
3. The interface between the toolstack and the 'out-of-qemu' virtio backend; avoid
   using Xenstore in the virtio backend if possible. Also, there is a desire to make
   the VirtIO backend hypervisor-agnostic.
4. A lot of 'foreign mappings' could lead to memory exhaustion on the host side,
   as we are stealing a page from host memory in order to map the guest page.
   Julien has some idea regarding that.
5.
Julien also has some ideas on how to optimize the IOREQ code:

   5.1 vcpu_ioreq_handle_completion (formerly handle_hvm_io_completion), which is
       called in a hot path on Arm (every time we re-enter the guest):

       Ideally, vcpu_ioreq_handle_completion should be a NOP (at most a few
       instructions) if there is nothing to do (if we don't have I/O forwarded
       to an IOREQ server). Maybe we want to introduce a per-vCPU flag indicating
       whether an I/O has been forwarded to an IOREQ server. This would allow us
       to bypass most of the function if there is nothing to do.

   5.2 The current way to handle MMIO is the following:
       - Pause the vCPU
       - Forward the access to the backend domain
       - Schedule the backend domain
       - Wait for the access to be handled
       - Unpause the vCPU

       The sequence is going to be fairly expensive on Xen.
       It might be possible to optimize the ACK and avoid waiting for the backend
       to handle the access.

Looks like all of them are valid and worth considering, but the first thing
which we need on Arm is a mechanism to forward guest IO to a device emulator,
so let's focus on it in the first place.

***

There are a lot of changes since the RFC series: almost all TODOs were resolved on
Arm, Arm code was improved and hardened, common IOREQ/DM code became really
arch-agnostic (without HVM-ism), the "legacy" mechanism of mapping magic pages for
the IOREQ servers was left x86 specific, etc. Also, the patch that makes DM code
public was reworked to have the top level dm-op handling arch-specific and call
into ioreq_server_dm_op() for otherwise unhandled ops.
But one TODO still remains, which is "PIO handling" on Arm.
The "PIO handling" TODO is expected to be left unaddressed for the current series.
It is not a big issue for now while Xen doesn't have support for vPCI on Arm.
On Arm64 they (PIO accesses) are only used for the PCI IO BAR and we would probably
want to expose them to the emulator as PIO access to make a DM completely
arch-agnostic. So "PIO handling" should be implemented when we add support for vPCI.
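As an illustration of the per-vCPU flag floated in item 5.1, here is a minimal stand-alone sketch. It is only one possible reading of that idea, not code from this series: struct vcpu_sketch, forward_io() and handle_completion() are hypothetical names, and the "slow path" is reduced to a counter so that the effect of the early return is observable.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for Xen's struct vcpu. */
struct vcpu_sketch {
    bool         io_forwarded;   /* set when an I/O goes to an IOREQ server */
    unsigned int slow_path_runs; /* how often the full completion logic ran */
};

/* Called when an access is handed over to an IOREQ server. */
static void forward_io(struct vcpu_sketch *v)
{
    v->io_forwarded = true;
}

/*
 * Called on every return to the guest. With nothing forwarded, the
 * function costs one load and one branch; the expensive completion
 * handling (modelled by the counter) runs only when the flag is set.
 */
static bool handle_completion(struct vcpu_sketch *v)
{
    if (!v->io_forwarded)
        return true;             /* fast path: nothing to complete */

    v->slow_path_runs++;         /* placeholder for the real completion work */
    v->io_forwarded = false;
    return true;
}
```

Whether a plain boolean is sufficient, or whether the flag would need to be tied to the existing ioreq state, is an open design question that this sketch does not try to answer.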
I left the interface untouched in the following patch
"xen/dm: Introduce xendevicemodel_set_irq_level DM op"
since there is still an open discussion about what interface to use and what
information to pass to the hypervisor.

There are patches under review that this series depends on:
https://patchwork.kernel.org/patch/11816689
https://patchwork.kernel.org/patch/11803383

Please note that the IOREQ feature is disabled by default on Arm within the
current series.

***

Patch series [7] was rebased on a recent "staging branch"
(7ba2ab4 x86/p2m: Fix paging_gva_to_gfn() for nested virt) and tested on
Renesas Salvator-X board + H3 ES3.0 SoC (Arm64) with virtio-mmio disk backend [8]
running in driver domain and unmodified Linux Guest running on existing
virtio-blk driver (frontend). No issues were observed. Guest domain 'reboot/destroy'
use-cases work properly. Patch series was only build-tested on x86.

Please note, build-test passed for the following modes:
1. x86: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y (default)
2. x86: #CONFIG_HVM is not set / #CONFIG_IOREQ_SERVER is not set
3. Arm64: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y
4. Arm64: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set (default)
5. Arm32: CONFIG_HVM=y / CONFIG_IOREQ_SERVER=y
6. Arm32: CONFIG_HVM=y / #CONFIG_IOREQ_SERVER is not set (default)

***

Any feedback/help would be highly appreciated.
[1] https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg00825.html [2] https://lists.xenproject.org/archives/html/xen-devel/2020-08/msg00071.html [3] https://lists.xenproject.org/archives/html/xen-devel/2020-09/msg00732.html [4] https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg01077.html [5] https://lists.xenproject.org/archives/html/xen-devel/2020-11/msg02188.html [6] https://lists.xenproject.org/archives/html/xen-devel/2020-11/msg02212.html [7] https://github.com/otyshchenko1/xen/commits/ioreq_4.14_ml5 [8] https://github.com/xen-troops/virtio-disk/commits/ioreq_ml1 Julien Grall (5): xen/ioreq: Make x86's IOREQ related dm-op handling common xen/mm: Make x86's XENMEM_resource_ioreq_server handling common arm/ioreq: Introduce arch specific bits for IOREQ/DM features xen/dm: Introduce xendevicemodel_set_irq_level DM op libxl: Introduce basic virtio-mmio support on Arm Oleksandr Tyshchenko (19): x86/ioreq: Prepare IOREQ feature for making it common x86/ioreq: Add IOREQ_STATUS_* #define-s and update code for moving x86/ioreq: Provide out-of-line wrapper for the handle_mmio() xen/ioreq: Make x86's IOREQ feature common xen/ioreq: Make x86's hvm_ioreq_needs_completion() common xen/ioreq: Make x86's hvm_mmio_first(last)_byte() common xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common xen/ioreq: Move x86's ioreq_server to struct domain xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu xen/ioreq: Remove "hvm" prefixes from involved function names xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg() xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm xen/ioreq: Introduce domain_has_ioreq_server() xen/arm: io: Abstract sign-extension xen/arm: io: Harden sign extension check xen/ioreq: Make x86's send_invalidate_req() common xen/arm: Add mapcache invalidation handling [RFC] libxl: Add support for virtio-disk configuration 
MAINTAINERS | 8 +- tools/include/xendevicemodel.h | 4 + tools/libs/devicemodel/core.c | 18 + tools/libs/devicemodel/libxendevicemodel.map | 1 + tools/libs/light/Makefile | 1 + tools/libs/light/libxl_arm.c | 94 +- tools/libs/light/libxl_create.c | 1 + tools/libs/light/libxl_internal.h | 1 + tools/libs/light/libxl_types.idl | 16 + tools/libs/light/libxl_types_internal.idl | 1 + tools/libs/light/libxl_virtio_disk.c | 109 ++ tools/xl/Makefile | 2 +- tools/xl/xl.h | 3 + tools/xl/xl_cmdtable.c | 15 + tools/xl/xl_parse.c | 116 +++ tools/xl/xl_virtio_disk.c | 46 + xen/arch/arm/Makefile | 2 + xen/arch/arm/dm.c | 174 ++++ xen/arch/arm/domain.c | 9 + xen/arch/arm/io.c | 30 +- xen/arch/arm/ioreq.c | 198 ++++ xen/arch/arm/p2m.c | 51 +- xen/arch/arm/traps.c | 72 +- xen/arch/x86/Kconfig | 1 + xen/arch/x86/hvm/dm.c | 107 +- xen/arch/x86/hvm/emulate.c | 220 ++-- xen/arch/x86/hvm/hvm.c | 14 +- xen/arch/x86/hvm/hypercall.c | 9 +- xen/arch/x86/hvm/intercept.c | 5 +- xen/arch/x86/hvm/io.c | 52 +- xen/arch/x86/hvm/ioreq.c | 1375 ++----------------------- xen/arch/x86/hvm/stdvga.c | 12 +- xen/arch/x86/hvm/svm/nestedsvm.c | 2 +- xen/arch/x86/hvm/vmx/realmode.c | 8 +- xen/arch/x86/hvm/vmx/vvmx.c | 5 +- xen/arch/x86/mm.c | 46 +- xen/arch/x86/mm/p2m.c | 17 +- xen/arch/x86/mm/shadow/common.c | 2 +- xen/common/Kconfig | 3 + xen/common/Makefile | 1 + xen/common/ioreq.c | 1426 ++++++++++++++++++++++++++ xen/common/memory.c | 72 +- xen/include/asm-arm/domain.h | 3 + xen/include/asm-arm/hvm/ioreq.h | 72 ++ xen/include/asm-arm/mm.h | 8 - xen/include/asm-arm/mmio.h | 1 + xen/include/asm-arm/p2m.h | 19 +- xen/include/asm-arm/traps.h | 25 + xen/include/asm-x86/hvm/domain.h | 43 - xen/include/asm-x86/hvm/emulate.h | 2 +- xen/include/asm-x86/hvm/io.h | 17 - xen/include/asm-x86/hvm/ioreq.h | 39 +- xen/include/asm-x86/hvm/vcpu.h | 18 - xen/include/asm-x86/mm.h | 4 - xen/include/asm-x86/p2m.h | 27 +- xen/include/public/arch-arm.h | 5 + xen/include/public/hvm/dm_op.h | 16 + xen/include/xen/dm.h | 39 + 
xen/include/xen/ioreq.h | 140 +++ xen/include/xen/p2m-common.h | 4 + xen/include/xen/sched.h | 34 + xen/include/xsm/dummy.h | 4 +- xen/include/xsm/xsm.h | 6 +- xen/xsm/dummy.c | 2 +- xen/xsm/flask/hooks.c | 5 +- 65 files changed, 3073 insertions(+), 1809 deletions(-) create mode 100644 tools/libs/light/libxl_virtio_disk.c create mode 100644 tools/xl/xl_virtio_disk.c create mode 100644 xen/arch/arm/dm.c create mode 100644 xen/arch/arm/ioreq.c create mode 100644 xen/common/ioreq.c create mode 100644 xen/include/asm-arm/hvm/ioreq.h create mode 100644 xen/include/xen/dm.h create mode 100644 xen/include/xen/ioreq.h -- 2.7.4
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As a lot of x86 code can be re-used on Arm later on, this patch makes some preparation to x86/hvm/ioreq.c before moving to the common code. This way we will get a verbatim copy for the code movement in a subsequent patch.

This patch mostly introduces specific hooks to abstract arch specific materials, taking into account the requirement to leave the "legacy" mechanism of mapping magic pages for the IOREQ servers x86 specific and not expose it to the common code.

These hooks are named according to the more consistent new naming scheme right away (including dropping the "hvm" prefixes and infixes):
- IOREQ server functions should start with "ioreq_server_"
- IOREQ functions should start with "ioreq_"
Other functions will be renamed in subsequent patches.

Also re-order #include-s alphabetically.

This support is going to be used on Arm to be able to run a device emulator outside of the Xen hypervisor.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
CC: Julien Grall <julien.grall@arm.com>
[On Arm only]
Tested-by: Wei Chen <Wei.Chen@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch, was split from: "[RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common"
   - fold the check of p->type into hvm_get_ioreq_server_range_type() and make it return success/failure
   - remove relocate_portio_handler() call from arch_hvm_ioreq_destroy() in arch/x86/hvm/ioreq.c
   - introduce arch_hvm_destroy_ioreq_server()/arch_handle_hvm_io_completion()

Changes V1 -> V2:
   - update patch description
   - make arch functions inline and put them into arch header to achieve a true rename by the subsequent patch
   - return void in arch_hvm_destroy_ioreq_server()
   - return bool in arch_hvm_ioreq_destroy()
   - bring relocate_portio_handler() back to arch_hvm_ioreq_destroy()
   - 
rename IOREQ_IO* to IOREQ_STATUS* - remove *handle* from arch_handle_hvm_io_completion() - re-order #include-s alphabetically - rename hvm_get_ioreq_server_range_type() to hvm_ioreq_server_get_type_addr() and add "const" to several arguments Changes V2 -> V3: - update patch description - name new arch hooks according to the new naming scheme - don't make arch hooks inline, move them ioreq.c - make get_ioreq_server() local again - rework the whole patch taking into the account that "legacy" interface should remain x86 specific (additional arch hooks, etc) - update the code to be able to use hvm_map_mem_type_to_ioreq_server() in the common code (an extra arch hook, etc) - don’t include <asm/hvm/emulate.h> from arch header - add "arch" prefix to hvm_ioreq_server_get_type_addr() - move IOREQ_STATUS_* #define-s introduction to the separate patch - move HANDLE_BUFIOREQ to the arch header - just return relocate_portio_handler() from arch_ioreq_server_destroy_all() - misc adjustments proposed by Jan (adding const, unsigned int instead of uint32_t) Changes V3 -> V4: - add Alex's R-b - update patch description - make arch_ioreq_server_get_type_addr return bool - drop #include <xen/ctype.h> - use two arch hooks in hvm_map_mem_type_to_ioreq_server() to avoid calling p2m_change_entry_type_global() with lock held --- xen/arch/x86/hvm/ioreq.c | 179 ++++++++++++++++++++++++++-------------- xen/include/asm-x86/hvm/ioreq.h | 22 +++++ 2 files changed, 141 insertions(+), 60 deletions(-) diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/ioreq.c +++ b/xen/arch/x86/hvm/ioreq.c @@ -XXX,XX +XXX,XX @@ * this program; If not, see <http://www.gnu.org/licenses/>. 
*/ -#include <xen/ctype.h> +#include <xen/domain.h> +#include <xen/event.h> #include <xen/init.h> +#include <xen/irq.h> #include <xen/lib.h> -#include <xen/trace.h> +#include <xen/paging.h> #include <xen/sched.h> -#include <xen/irq.h> #include <xen/softirq.h> -#include <xen/domain.h> -#include <xen/event.h> -#include <xen/paging.h> +#include <xen/trace.h> #include <xen/vpci.h> #include <asm/hvm/emulate.h> @@ -XXX,XX +XXX,XX @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p) return true; } +bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion) +{ + switch ( io_completion ) + { + case HVMIO_realmode_completion: + { + struct hvm_emulate_ctxt ctxt; + + hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs()); + vmx_realmode_emulate_one(&ctxt); + hvm_emulate_writeback(&ctxt); + + break; + } + + default: + ASSERT_UNREACHABLE(); + break; + } + + return true; +} + bool handle_hvm_io_completion(struct vcpu *v) { struct domain *d = v->domain; @@ -XXX,XX +XXX,XX @@ bool handle_hvm_io_completion(struct vcpu *v) return handle_pio(vio->io_req.addr, vio->io_req.size, vio->io_req.dir); - case HVMIO_realmode_completion: - { - struct hvm_emulate_ctxt ctxt; - - hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs()); - vmx_realmode_emulate_one(&ctxt); - hvm_emulate_writeback(&ctxt); - - break; - } default: - ASSERT_UNREACHABLE(); - break; + return arch_vcpu_ioreq_completion(io_completion); } return true; @@ -XXX,XX +XXX,XX @@ static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s, } } -#define HANDLE_BUFIOREQ(s) \ - ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) - static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s, struct vcpu *v) { @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s) spin_unlock(&s->lock); } -static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s) +int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s) { int rc; @@ -XXX,XX +XXX,XX @@ static int 
hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s) return rc; } -static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s) +void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s) { hvm_unmap_ioreq_gfn(s, true); hvm_unmap_ioreq_gfn(s, false); @@ -XXX,XX +XXX,XX @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, return rc; } +void arch_ioreq_server_enable(struct hvm_ioreq_server *s) +{ + hvm_remove_ioreq_gfn(s, false); + hvm_remove_ioreq_gfn(s, true); +} + static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s) { struct hvm_ioreq_vcpu *sv; @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s) if ( s->enabled ) goto done; - hvm_remove_ioreq_gfn(s, false); - hvm_remove_ioreq_gfn(s, true); + arch_ioreq_server_enable(s); s->enabled = true; @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s) spin_unlock(&s->lock); } +void arch_ioreq_server_disable(struct hvm_ioreq_server *s) +{ + hvm_add_ioreq_gfn(s, true); + hvm_add_ioreq_gfn(s, false); +} + static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s) { spin_lock(&s->lock); @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s) if ( !s->enabled ) goto done; - hvm_add_ioreq_gfn(s, true); - hvm_add_ioreq_gfn(s, false); + arch_ioreq_server_disable(s); s->enabled = false; @@ -XXX,XX +XXX,XX @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, fail_add: hvm_ioreq_server_remove_all_vcpus(s); - hvm_ioreq_server_unmap_pages(s); + arch_ioreq_server_unmap_pages(s); hvm_ioreq_server_free_rangesets(s); @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s) hvm_ioreq_server_remove_all_vcpus(s); /* - * NOTE: It is safe to call both hvm_ioreq_server_unmap_pages() and + * NOTE: It is safe to call both arch_ioreq_server_unmap_pages() and * hvm_ioreq_server_free_pages() in that order. 
* This is because the former will do nothing if the pages * are not mapped, leaving the page to be freed by the latter. @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s) * the page_info pointer to NULL, meaning the latter will do * nothing. */ - hvm_ioreq_server_unmap_pages(s); + arch_ioreq_server_unmap_pages(s); hvm_ioreq_server_free_pages(s); hvm_ioreq_server_free_rangesets(s); @@ -XXX,XX +XXX,XX @@ int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, return rc; } +/* Called when target domain is paused */ +void arch_ioreq_server_destroy(struct hvm_ioreq_server *s) +{ + p2m_set_ioreq_server(s->target, 0, s); +} + int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id) { struct hvm_ioreq_server *s; @@ -XXX,XX +XXX,XX @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id) domain_pause(d); - p2m_set_ioreq_server(d, 0, s); + arch_ioreq_server_destroy(s); hvm_ioreq_server_disable(s); @@ -XXX,XX +XXX,XX @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, if ( ioreq_gfn || bufioreq_gfn ) { - rc = hvm_ioreq_server_map_pages(s); + rc = arch_ioreq_server_map_pages(s); if ( rc ) goto out; } @@ -XXX,XX +XXX,XX @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, return rc; } +/* Called with ioreq_server lock held */ +int arch_ioreq_server_map_mem_type(struct domain *d, + struct hvm_ioreq_server *s, + uint32_t flags) +{ + return p2m_set_ioreq_server(d, flags, s); +} + +void arch_ioreq_server_map_mem_type_completed(struct domain *d, + struct hvm_ioreq_server *s, + uint32_t flags) +{ + if ( flags == 0 ) + { + const struct p2m_domain *p2m = p2m_get_hostp2m(d); + + if ( read_atomic(&p2m->ioreq.entry_count) ) + p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw); + } +} + /* * Map or unmap an ioreq server to specific memory type. 
For now, only * HVMMEM_ioreq_server is supported, and in the future new types can be @@ -XXX,XX +XXX,XX @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, if ( s->emulator != current->domain ) goto out; - rc = p2m_set_ioreq_server(d, flags, s); + rc = arch_ioreq_server_map_mem_type(d, s, flags); out: spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - if ( rc == 0 && flags == 0 ) - { - struct p2m_domain *p2m = p2m_get_hostp2m(d); - - if ( read_atomic(&p2m->ioreq.entry_count) ) - p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw); - } + if ( rc == 0 ) + arch_ioreq_server_map_mem_type_completed(d, s, flags); return rc; } @@ -XXX,XX +XXX,XX @@ void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v) spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); } +bool arch_ioreq_server_destroy_all(struct domain *d) +{ + return relocate_portio_handler(d, 0xcf8, 0xcf8, 4); +} + void hvm_destroy_all_ioreq_servers(struct domain *d) { struct hvm_ioreq_server *s; unsigned int id; - if ( !relocate_portio_handler(d, 0xcf8, 0xcf8, 4) ) + if ( !arch_ioreq_server_destroy_all(d) ) return; spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); @@ -XXX,XX +XXX,XX @@ void hvm_destroy_all_ioreq_servers(struct domain *d) spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); } -struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d, - ioreq_t *p) +bool arch_ioreq_server_get_type_addr(const struct domain *d, + const ioreq_t *p, + uint8_t *type, + uint64_t *addr) { - struct hvm_ioreq_server *s; - uint32_t cf8; - uint8_t type; - uint64_t addr; - unsigned int id; + unsigned int cf8 = d->arch.hvm.pci_cf8; if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO ) - return NULL; - - cf8 = d->arch.hvm.pci_cf8; + return false; if ( p->type == IOREQ_TYPE_PIO && (p->addr & ~3) == 0xcfc && CF8_ENABLED(cf8) ) { - uint32_t x86_fam; + unsigned int x86_fam, reg; pci_sbdf_t sbdf; - unsigned int reg; reg = hvm_pci_decode_addr(cf8, p->addr, 
&sbdf); /* PCI config data cycle */ - type = XEN_DMOP_IO_RANGE_PCI; - addr = ((uint64_t)sbdf.sbdf << 32) | reg; + *type = XEN_DMOP_IO_RANGE_PCI; + *addr = ((uint64_t)sbdf.sbdf << 32) | reg; /* AMD extended configuration space access? */ if ( CF8_ADDR_HI(cf8) && d->arch.cpuid->x86_vendor == X86_VENDOR_AMD && @@ -XXX,XX +XXX,XX @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d, if ( !rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) && (msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) ) - addr |= CF8_ADDR_HI(cf8); + *addr |= CF8_ADDR_HI(cf8); } } else { - type = (p->type == IOREQ_TYPE_PIO) ? - XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY; - addr = p->addr; + *type = (p->type == IOREQ_TYPE_PIO) ? + XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY; + *addr = p->addr; } + return true; +} + +struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d, + ioreq_t *p) +{ + struct hvm_ioreq_server *s; + uint8_t type; + uint64_t addr; + unsigned int id; + + if ( !arch_ioreq_server_get_type_addr(d, p, &type, &addr) ) + return NULL; + FOR_EACH_IOREQ_SERVER(d, id, s) { struct rangeset *r; @@ -XXX,XX +XXX,XX @@ static int hvm_access_cf8( return X86EMUL_UNHANDLEABLE; } +void arch_ioreq_domain_init(struct domain *d) +{ + register_portio_handler(d, 0xcf8, 4, hvm_access_cf8); +} + void hvm_ioreq_init(struct domain *d) { spin_lock_init(&d->arch.hvm.ioreq_server.lock); - register_portio_handler(d, 0xcf8, 4, hvm_access_cf8); + arch_ioreq_domain_init(d); } /* diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/hvm/ioreq.h +++ b/xen/include/asm-x86/hvm/ioreq.h @@ -XXX,XX +XXX,XX @@ #ifndef __ASM_X86_HVM_IOREQ_H__ #define __ASM_X86_HVM_IOREQ_H__ +#define HANDLE_BUFIOREQ(s) \ + ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) + bool hvm_io_pending(struct vcpu *v); bool handle_hvm_io_completion(struct vcpu *v); bool is_ioreq_server_page(struct domain *d, const struct page_info 
*page); @@ -XXX,XX +XXX,XX @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered); void hvm_ioreq_init(struct domain *d); +bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion); +int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s); +void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s); +void arch_ioreq_server_enable(struct hvm_ioreq_server *s); +void arch_ioreq_server_disable(struct hvm_ioreq_server *s); +void arch_ioreq_server_destroy(struct hvm_ioreq_server *s); +int arch_ioreq_server_map_mem_type(struct domain *d, + struct hvm_ioreq_server *s, + uint32_t flags); +void arch_ioreq_server_map_mem_type_completed(struct domain *d, + struct hvm_ioreq_server *s, + uint32_t flags); +bool arch_ioreq_server_destroy_all(struct domain *d); +bool arch_ioreq_server_get_type_addr(const struct domain *d, + const ioreq_t *p, + uint8_t *type, + uint64_t *addr); +void arch_ioreq_domain_init(struct domain *d); + #endif /* __ASM_X86_HVM_IOREQ_H__ */ /* -- 2.7.4
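For readers skimming the diff above, the way hvm_select_ioreq_server() is split into an arch classification hook plus an arch-agnostic lookup can be illustrated with a small standalone sketch. This is not part of the patch and is only an approximation under stated assumptions: `struct req`, `struct server`, the enums, `arch_get_type_addr()` and `select_server()` are simplified, hypothetical stand-ins for ioreq_t, struct hvm_ioreq_server and the real functions, and the PCI config-cycle decoding and rangeset handling are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for ioreq_t and the range types. */
enum req_type { REQ_PIO, REQ_COPY, REQ_OTHER };
enum range_type { RANGE_PORT, RANGE_MEMORY };

struct req {
    enum req_type type;
    uint64_t addr;
};

struct server {
    enum range_type type;
    uint64_t start, end; /* inclusive range claimed by this server */
};

/*
 * Arch hook (illustrative): classify the request and report whether it
 * can be forwarded at all. Returning bool mirrors the V4 change to
 * arch_ioreq_server_get_type_addr().
 */
static bool arch_get_type_addr(const struct req *p, uint8_t *type,
                               uint64_t *addr)
{
    assert(p && type && addr);

    if ( p->type != REQ_PIO && p->type != REQ_COPY )
        return false;

    *type = (p->type == REQ_PIO) ? RANGE_PORT : RANGE_MEMORY;
    *addr = p->addr;

    return true;
}

/* Common code (illustrative): knows nothing about how type/addr are derived. */
static const struct server *select_server(const struct server *srv, size_t nr,
                                          const struct req *p)
{
    uint8_t type;
    uint64_t addr;
    size_t i;

    if ( !arch_get_type_addr(p, &type, &addr) )
        return NULL;

    /* Iterate backwards so that more recently added servers are favoured. */
    for ( i = nr; i != 0; )
    {
        const struct server *s = &srv[--i];

        if ( s->type == type && addr >= s->start && addr <= s->end )
            return s;
    }

    return NULL;
}
```

With this shape, another architecture would only need to supply its own classification function; the lookup loop stays untouched.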
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch continues to make some preparation to x86/hvm/ioreq.c before moving to the common code.

Add IOREQ_STATUS_* #define-s and update the candidates for moving, since X86EMUL_* shouldn't be exposed to the common code in that form.

This support is going to be used on Arm to be able to run a device emulator outside of the Xen hypervisor.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
CC: Julien Grall <julien.grall@arm.com>
[On Arm only]
Tested-by: Wei Chen <Wei.Chen@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V2 -> V3:
   - new patch, was split from [PATCH V2 01/23] x86/ioreq: Prepare IOREQ feature for making it common

Changes V3 -> V4:
   - add Alex's R-b and Jan's A-b
   - add a comment above IOREQ_STATUS_* #define-s
---
 xen/arch/x86/hvm/ioreq.c | 16 ++++++++--------
 xen/include/asm-x86/hvm/ioreq.h | 5 +++++
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/ioreq.c +++ b/xen/arch/x86/hvm/ioreq.c @@ -XXX,XX +XXX,XX @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p) pg = iorp->va; if ( !pg ) - return X86EMUL_UNHANDLEABLE; + return IOREQ_STATUS_UNHANDLED; /* * Return 0 for the cases we can't deal with: @@ -XXX,XX +XXX,XX @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p) break; default: gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size); - return X86EMUL_UNHANDLEABLE; + return IOREQ_STATUS_UNHANDLED; } spin_lock(&s->bufioreq_lock); @@ -XXX,XX +XXX,XX @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p) { /* The queue is full: send the iopacket through the normal path. 
*/ spin_unlock(&s->bufioreq_lock); - return X86EMUL_UNHANDLEABLE; + return IOREQ_STATUS_UNHANDLED; } pg->buf_ioreq[pg->ptrs.write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp; @@ -XXX,XX +XXX,XX @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p) notify_via_xen_event_channel(d, s->bufioreq_evtchn); spin_unlock(&s->bufioreq_lock); - return X86EMUL_OKAY; + return IOREQ_STATUS_HANDLED; } int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p, @@ -XXX,XX +XXX,XX @@ int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p, return hvm_send_buffered_ioreq(s, proto_p); if ( unlikely(!vcpu_start_shutdown_deferral(curr)) ) - return X86EMUL_RETRY; + return IOREQ_STATUS_RETRY; list_for_each_entry ( sv, &s->ioreq_vcpu_list, @@ -XXX,XX +XXX,XX @@ int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p, notify_via_xen_event_channel(d, port); sv->pending = true; - return X86EMUL_RETRY; + return IOREQ_STATUS_RETRY; } } - return X86EMUL_UNHANDLEABLE; + return IOREQ_STATUS_UNHANDLED; } unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered) @@ -XXX,XX +XXX,XX @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered) if ( !s->enabled ) continue; - if ( hvm_send_ioreq(s, p, buffered) == X86EMUL_UNHANDLEABLE ) + if ( hvm_send_ioreq(s, p, buffered) == IOREQ_STATUS_UNHANDLED ) failed++; } diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/hvm/ioreq.h +++ b/xen/include/asm-x86/hvm/ioreq.h @@ -XXX,XX +XXX,XX @@ bool arch_ioreq_server_get_type_addr(const struct domain *d, uint64_t *addr); void arch_ioreq_domain_init(struct domain *d); +/* This correlation must not be altered */ +#define IOREQ_STATUS_HANDLED X86EMUL_OKAY +#define IOREQ_STATUS_UNHANDLED X86EMUL_UNHANDLEABLE +#define IOREQ_STATUS_RETRY X86EMUL_RETRY + #endif /* __ASM_X86_HVM_IOREQ_H__ */ /* -- 2.7.4
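The aliasing introduced by this patch can be tried out in isolation. The sketch below is not taken from the series; it assumes stand-in `EMUL_*` constants (their numeric values are illustrative only), a hypothetical `struct server`, and heavily simplified `send_ioreq()`/`broadcast_ioreq()` helpers. It only shows why code written against IOREQ_STATUS_* keeps working with arch code that compares against its own return codes, which is what the "must not be altered" comment protects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the arch emulator return codes; values are illustrative. */
#define EMUL_OKAY         0
#define EMUL_UNHANDLEABLE 1
#define EMUL_RETRY        3

/* Arch header: neutral names are plain aliases of the arch codes. */
#define IOREQ_STATUS_HANDLED   EMUL_OKAY
#define IOREQ_STATUS_UNHANDLED EMUL_UNHANDLEABLE
#define IOREQ_STATUS_RETRY     EMUL_RETRY

/* Hypothetical, minimal model of an ioreq server. */
struct server {
    bool enabled;
    bool buffer_full;
};

/*
 * "Common" code: only speaks IOREQ_STATUS_*. A buffered request is either
 * queued (handled) or rejected when the ring is full; a synchronous request
 * is modelled as always being sent, so the caller has to retry later.
 */
static int send_ioreq(const struct server *s, bool buffered)
{
    assert(s != NULL);

    if ( buffered )
        return s->buffer_full ? IOREQ_STATUS_UNHANDLED
                              : IOREQ_STATUS_HANDLED;

    return IOREQ_STATUS_RETRY;
}

/* Count the enabled servers which could not take the request. */
static unsigned int broadcast_ioreq(const struct server *srv, size_t nr,
                                    bool buffered)
{
    unsigned int failed = 0;
    size_t i;

    for ( i = 0; i < nr; i++ )
    {
        if ( !srv[i].enabled )
            continue;

        if ( send_ioreq(&srv[i], buffered) == IOREQ_STATUS_UNHANDLED )
            failed++;
    }

    return failed;
}
```

Because the neutral names expand to the arch values, an arch caller can keep comparing the result against its own codes without any translation layer.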
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The IOREQ is about to become a common feature and Arm will have its own implementation.

But the name of the function (handle_mmio()) is pretty generic and can be confusing on Arm (we already have a try_handle_mmio()).

In order not to rename the function (which is used for a varying set of purposes on x86) globally and to get a non-confusing variant on Arm, provide a wrapper arch_ioreq_complete_mmio() to be used in common and Arm code.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
CC: Julien Grall <julien.grall@arm.com>
[On Arm only]
Tested-by: Wei Chen <Wei.Chen@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch

Changes V1 -> V2:
   - remove "handle"
   - add Jan's A-b

Changes V2 -> V3:
   - remove Jan's A-b
   - update patch subject/description
   - use out-of-line function instead of #define
   - put earlier in the series to avoid breakage

Changes V3 -> V4:
   - add Jan's R-b
   - rename ioreq_complete_mmio() to arch_ioreq_complete_mmio()
---
 xen/arch/x86/hvm/ioreq.c | 7 ++++++-
 xen/include/asm-x86/hvm/ioreq.h | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/ioreq.c +++ b/xen/arch/x86/hvm/ioreq.c @@ -XXX,XX +XXX,XX @@ #include <public/hvm/ioreq.h> #include <public/hvm/params.h> +bool arch_ioreq_complete_mmio(void) +{ + return handle_mmio(); +} + static void set_ioreq_server(struct domain *d, unsigned int id, struct hvm_ioreq_server *s) { @@ -XXX,XX +XXX,XX @@ bool handle_hvm_io_completion(struct vcpu *v) break; case HVMIO_mmio_completion: - return handle_mmio(); + return arch_ioreq_complete_mmio(); case HVMIO_pio_completion: return handle_pio(vio->io_req.addr, vio->io_req.size, diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h 
index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/hvm/ioreq.h +++ b/xen/include/asm-x86/hvm/ioreq.h @@ -XXX,XX +XXX,XX @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered); void hvm_ioreq_init(struct domain *d); +bool arch_ioreq_complete_mmio(void); bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion); int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s); void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s); -- 2.7.4
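As a rough illustration of the wrapper idea (again not part of the patch), the sketch below models a completion dispatcher that only calls arch_* entry points, so the generically named handle_mmio() stays private to the arch side. The enum values, the call counters and `handle_completion()` are hypothetical simplifications of enum hvm_io_completion and handle_hvm_io_completion(); PIO completion is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced version of the completion kinds. */
enum io_completion {
    IO_NO_COMPLETION,
    IO_MMIO_COMPLETION,
    IO_ARCH_COMPLETION /* stands in for e.g. realmode completion on x86 */
};

static unsigned int mmio_calls, arch_calls;

/* Arch-internal helper with a generic name; common code never calls it. */
static bool handle_mmio(void)
{
    mmio_calls++;
    return true;
}

/* Out-of-line wrapper: the only MMIO entry point common code knows about. */
static bool arch_ioreq_complete_mmio(void)
{
    return handle_mmio();
}

/* Arch hook for completion kinds the common code does not understand. */
static bool arch_vcpu_ioreq_completion(enum io_completion c)
{
    switch ( c )
    {
    case IO_ARCH_COMPLETION:
        arch_calls++;
        break;

    default:
        assert(!"unexpected completion kind");
        break;
    }

    return true;
}

/* Common dispatcher: everything arch specific goes through arch_* names. */
static bool handle_completion(enum io_completion c)
{
    switch ( c )
    {
    case IO_NO_COMPLETION:
        break;

    case IO_MMIO_COMPLETION:
        return arch_ioreq_complete_mmio();

    default:
        return arch_vcpu_ioreq_completion(c);
    }

    return true;
}
```

An Arm-style implementation could then provide its own arch_ioreq_complete_mmio() body without the dispatcher changing.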
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As a lot of x86 code can be re-used on Arm later on, this patch moves the previously prepared IOREQ support to the common code (the code movement is a verbatim copy).

The "legacy" mechanism of mapping magic pages for the IOREQ servers remains x86 specific and is not exposed to the common code.

The common IOREQ feature is supposed to be built with the IOREQ_SERVER option enabled, which is selected for x86's config HVM for now.

In order to avoid having a gigantic patch here, the subsequent patches will update the remaining bits in the common code step by step:
- Make IOREQ related structs/materials common
- Drop the "hvm" prefixes and infixes
- Remove layering violation by moving corresponding fields out of *arch.hvm* or abstracting away accesses to them

Also include <xen/domain_page.h>, which will be needed on Arm, to avoid touching the common code again when introducing Arm specific bits.

This support is going to be used on Arm to be able to run a device emulator outside of the Xen hypervisor.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>
[On Arm only]
Tested-by: Wei Chen <Wei.Chen@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

***
Please note, this patch depends on the following which is on review:
https://patchwork.kernel.org/patch/11816689/
***

Changes RFC -> V1:
   - was split into three patches:
     - x86/ioreq: Prepare IOREQ feature for making it common
     - xen/ioreq: Make x86's IOREQ feature common
     - xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
   - update MAINTAINERS file
   - do not use a separate subdir for the IOREQ stuff, move it to:
     - xen/common/ioreq.c
     - xen/include/xen/ioreq.h
   - update x86's files to include xen/ioreq.h
   - remove unneeded headers in arch/x86/hvm/ioreq.c
   - re-order the headers alphabetically in common/ioreq.c
   - update common/ioreq.c according to the newly introduced arch functions: arch_hvm_destroy_ioreq_server()/arch_handle_hvm_io_completion()

Changes V1 -> V2:
   - update patch description
   - make everything needed in the previous patch to achieve a true rename here
   - don't include unnecessary headers from asm-x86/hvm/ioreq.h and xen/ioreq.h
   - use __XEN_IOREQ_H__ instead of __IOREQ_H__
   - move get_ioreq_server() to common/ioreq.c

Changes V2 -> V3:
   - update patch description
   - make everything needed in the previous patch to not expose "legacy" interface to the common code here
   - update patch given that the "legacy interface" is x86 specific
   - include <xen/domain_page.h> in common ioreq.c

Changes V3 -> V4:
   - rebase
   - don't include <xen/ioreq.h> from arch header
   - move all arch hook declarations to the common header
---
 MAINTAINERS | 8 +-
 xen/arch/x86/Kconfig | 1 +
 xen/arch/x86/hvm/dm.c | 2 +-
 xen/arch/x86/hvm/emulate.c | 2 +-
 xen/arch/x86/hvm/hvm.c | 2 +-
 xen/arch/x86/hvm/io.c | 2 +-
 xen/arch/x86/hvm/ioreq.c | 1347 ++-------------------------------------
 xen/arch/x86/hvm/stdvga.c | 2 
+- xen/arch/x86/hvm/vmx/vvmx.c | 3 +- xen/arch/x86/mm.c | 2 +- xen/arch/x86/mm/shadow/common.c | 2 +- xen/common/Kconfig | 3 + xen/common/Makefile | 1 + xen/common/ioreq.c | 1290 +++++++++++++++++++++++++++++++++++++ xen/include/asm-x86/hvm/ioreq.h | 59 -- xen/include/xen/ioreq.h | 93 +++ 16 files changed, 1455 insertions(+), 1364 deletions(-) create mode 100644 xen/common/ioreq.c create mode 100644 xen/include/xen/ioreq.h diff --git a/MAINTAINERS b/MAINTAINERS index XXXXXXX..XXXXXXX 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -XXX,XX +XXX,XX @@ X: xen/drivers/passthrough/vtd/ X: xen/drivers/passthrough/device_tree.c F: xen/include/xen/iommu.h +I/O EMULATION (IOREQ) +M: Paul Durrant <paul@xen.org> +S: Supported +F: xen/common/ioreq.c +F: xen/include/xen/ioreq.h +F: xen/include/public/hvm/ioreq.h + KCONFIG M: Doug Goldstein <cardoe@cardoe.com> S: Supported @@ -XXX,XX +XXX,XX @@ F: xen/arch/x86/hvm/ioreq.c F: xen/include/asm-x86/hvm/emulate.h F: xen/include/asm-x86/hvm/io.h F: xen/include/asm-x86/hvm/ioreq.h -F: xen/include/public/hvm/ioreq.h X86 MEMORY MANAGEMENT M: Jan Beulich <jbeulich@suse.com> diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/Kconfig +++ b/xen/arch/x86/Kconfig @@ -XXX,XX +XXX,XX @@ config PV_LINEAR_PT config HVM def_bool !PV_SHIM_EXCLUSIVE + select IOREQ_SERVER prompt "HVM support" ---help--- Interfaces to support HVM domains. 
HVM domains require hardware diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -XXX,XX +XXX,XX @@ #include <xen/event.h> #include <xen/guest_access.h> #include <xen/hypercall.h> +#include <xen/ioreq.h> #include <xen/nospec.h> #include <xen/sched.h> #include <asm/hap.h> #include <asm/hvm/cacheattr.h> -#include <asm/hvm/ioreq.h> #include <asm/shadow.h> #include <xsm/xsm.h> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -XXX,XX +XXX,XX @@ */ #include <xen/init.h> +#include <xen/ioreq.h> #include <xen/lib.h> #include <xen/sched.h> #include <xen/paging.h> @@ -XXX,XX +XXX,XX @@ #include <asm/xstate.h> #include <asm/hvm/emulate.h> #include <asm/hvm/hvm.h> -#include <asm/hvm/ioreq.h> #include <asm/hvm/monitor.h> #include <asm/hvm/trace.h> #include <asm/hvm/support.h> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -XXX,XX +XXX,XX @@ #include <xen/ctype.h> #include <xen/init.h> +#include <xen/ioreq.h> #include <xen/lib.h> #include <xen/trace.h> #include <xen/sched.h> @@ -XXX,XX +XXX,XX @@ #include <asm/hvm/trace.h> #include <asm/hvm/nestedhvm.h> #include <asm/hvm/monitor.h> -#include <asm/hvm/ioreq.h> #include <asm/hvm/viridian.h> #include <asm/hvm/vm_event.h> #include <asm/altp2m.h> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/io.c +++ b/xen/arch/x86/hvm/io.c @@ -XXX,XX +XXX,XX @@ */ #include <xen/init.h> +#include <xen/ioreq.h> #include <xen/mm.h> #include <xen/lib.h> #include <xen/errno.h> @@ -XXX,XX +XXX,XX @@ #include <asm/shadow.h> #include <asm/p2m.h> #include <asm/hvm/hvm.h> -#include <asm/hvm/ioreq.h> #include <asm/hvm/support.h> #include <asm/hvm/vpt.h> #include <asm/hvm/vpic.h> 
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/ioreq.c +++ b/xen/arch/x86/hvm/ioreq.c @@ -XXX,XX +XXX,XX @@ #include <xen/domain.h> #include <xen/event.h> #include <xen/init.h> +#include <xen/ioreq.h> #include <xen/irq.h> #include <xen/lib.h> #include <xen/paging.h> @@ -XXX,XX +XXX,XX @@ #include <asm/hvm/emulate.h> #include <asm/hvm/hvm.h> -#include <asm/hvm/ioreq.h> #include <asm/hvm/vmx/vmx.h> #include <public/hvm/ioreq.h> @@ -XXX,XX +XXX,XX @@ bool arch_ioreq_complete_mmio(void) return handle_mmio(); } -static void set_ioreq_server(struct domain *d, unsigned int id, - struct hvm_ioreq_server *s) -{ - ASSERT(id < MAX_NR_IOREQ_SERVERS); - ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]); - - d->arch.hvm.ioreq_server.server[id] = s; -} - -#define GET_IOREQ_SERVER(d, id) \ - (d)->arch.hvm.ioreq_server.server[id] - -static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d, - unsigned int id) -{ - if ( id >= MAX_NR_IOREQ_SERVERS ) - return NULL; - - return GET_IOREQ_SERVER(d, id); -} - -/* - * Iterate over all possible ioreq servers. - * - * NOTE: The iteration is backwards such that more recently created - * ioreq servers are favoured in hvm_select_ioreq_server(). - * This is a semantic that previously existed when ioreq servers - * were held in a linked list. 
- */ -#define FOR_EACH_IOREQ_SERVER(d, id, s) \ - for ( (id) = MAX_NR_IOREQ_SERVERS; (id) != 0; ) \ - if ( !(s = GET_IOREQ_SERVER(d, --(id))) ) \ - continue; \ - else - -static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v) -{ - shared_iopage_t *p = s->ioreq.va; - - ASSERT((v == current) || !vcpu_runnable(v)); - ASSERT(p != NULL); - - return &p->vcpu_ioreq[v->vcpu_id]; -} - -static struct hvm_ioreq_vcpu *get_pending_vcpu(const struct vcpu *v, - struct hvm_ioreq_server **srvp) -{ - struct domain *d = v->domain; - struct hvm_ioreq_server *s; - unsigned int id; - - FOR_EACH_IOREQ_SERVER(d, id, s) - { - struct hvm_ioreq_vcpu *sv; - - list_for_each_entry ( sv, - &s->ioreq_vcpu_list, - list_entry ) - { - if ( sv->vcpu == v && sv->pending ) - { - if ( srvp ) - *srvp = s; - return sv; - } - } - } - - return NULL; -} - -bool hvm_io_pending(struct vcpu *v) -{ - return get_pending_vcpu(v, NULL); -} - -static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p) -{ - unsigned int prev_state = STATE_IOREQ_NONE; - unsigned int state = p->state; - uint64_t data = ~0; - - smp_rmb(); - - /* - * The only reason we should see this condition be false is when an - * emulator dying races with I/O being requested. 
- */ - while ( likely(state != STATE_IOREQ_NONE) ) - { - if ( unlikely(state < prev_state) ) - { - gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n", - prev_state, state); - sv->pending = false; - domain_crash(sv->vcpu->domain); - return false; /* bail */ - } - - switch ( prev_state = state ) - { - case STATE_IORESP_READY: /* IORESP_READY -> NONE */ - p->state = STATE_IOREQ_NONE; - data = p->data; - break; - - case STATE_IOREQ_READY: /* IOREQ_{READY,INPROCESS} -> IORESP_READY */ - case STATE_IOREQ_INPROCESS: - wait_on_xen_event_channel(sv->ioreq_evtchn, - ({ state = p->state; - smp_rmb(); - state != prev_state; })); - continue; - - default: - gdprintk(XENLOG_ERR, "Weird HVM iorequest state %u\n", state); - sv->pending = false; - domain_crash(sv->vcpu->domain); - return false; /* bail */ - } - - break; - } - - p = &sv->vcpu->arch.hvm.hvm_io.io_req; - if ( hvm_ioreq_needs_completion(p) ) - p->data = data; - - sv->pending = false; - - return true; -} - bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion) { switch ( io_completion ) @@ -XXX,XX +XXX,XX @@ bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion) return true; } -bool handle_hvm_io_completion(struct vcpu *v) -{ - struct domain *d = v->domain; - struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io; - struct hvm_ioreq_server *s; - struct hvm_ioreq_vcpu *sv; - enum hvm_io_completion io_completion; - - if ( has_vpci(d) && vpci_process_pending(v) ) - { - raise_softirq(SCHEDULE_SOFTIRQ); - return false; - } - - sv = get_pending_vcpu(v, &s); - if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) ) - return false; - - vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ? 
- STATE_IORESP_READY : STATE_IOREQ_NONE; - - msix_write_completion(v); - vcpu_end_shutdown_deferral(v); - - io_completion = vio->io_completion; - vio->io_completion = HVMIO_no_completion; - - switch ( io_completion ) - { - case HVMIO_no_completion: - break; - - case HVMIO_mmio_completion: - return arch_ioreq_complete_mmio(); - - case HVMIO_pio_completion: - return handle_pio(vio->io_req.addr, vio->io_req.size, - vio->io_req.dir); - - default: - return arch_vcpu_ioreq_completion(io_completion); - } - - return true; -} - static gfn_t hvm_alloc_legacy_ioreq_gfn(struct hvm_ioreq_server *s) { struct domain *d = s->target; @@ -XXX,XX +XXX,XX @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) return rc; } -static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf) -{ - struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; - struct page_info *page; - - if ( iorp->page ) - { - /* - * If a guest frame has already been mapped (which may happen - * on demand if hvm_get_ioreq_server_info() is called), then - * allocating a page is not permitted. - */ - if ( !gfn_eq(iorp->gfn, INVALID_GFN) ) - return -EPERM; - - return 0; - } - - page = alloc_domheap_page(s->target, MEMF_no_refcount); - - if ( !page ) - return -ENOMEM; - - if ( !get_page_and_type(page, s->target, PGT_writable_page) ) - { - /* - * The domain can't possibly know about this page yet, so failure - * here is a clear indication of something fishy going on. - */ - domain_crash(s->emulator); - return -ENODATA; - } - - iorp->va = __map_domain_page_global(page); - if ( !iorp->va ) - goto fail; - - iorp->page = page; - clear_page(iorp->va); - return 0; - - fail: - put_page_alloc_ref(page); - put_page_and_type(page); - - return -ENOMEM; -} - -static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf) -{ - struct hvm_ioreq_page *iorp = buf ? 
&s->bufioreq : &s->ioreq; - struct page_info *page = iorp->page; - - if ( !page ) - return; - - iorp->page = NULL; - - unmap_domain_page_global(iorp->va); - iorp->va = NULL; - - put_page_alloc_ref(page); - put_page_and_type(page); -} - -bool is_ioreq_server_page(struct domain *d, const struct page_info *page) -{ - const struct hvm_ioreq_server *s; - unsigned int id; - bool found = false; - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - FOR_EACH_IOREQ_SERVER(d, id, s) - { - if ( (s->ioreq.page == page) || (s->bufioreq.page == page) ) - { - found = true; - break; - } - } - - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - - return found; -} - static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) { @@ -XXX,XX +XXX,XX @@ static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) return rc; } -static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s, - struct hvm_ioreq_vcpu *sv) -{ - ASSERT(spin_is_locked(&s->lock)); - - if ( s->ioreq.va != NULL ) - { - ioreq_t *p = get_ioreq(s, sv->vcpu); - - p->vp_eport = sv->ioreq_evtchn; - } -} - -static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s, - struct vcpu *v) -{ - struct hvm_ioreq_vcpu *sv; - int rc; - - sv = xzalloc(struct hvm_ioreq_vcpu); - - rc = -ENOMEM; - if ( !sv ) - goto fail1; - - spin_lock(&s->lock); - - rc = alloc_unbound_xen_event_channel(v->domain, v->vcpu_id, - s->emulator->domain_id, NULL); - if ( rc < 0 ) - goto fail2; - - sv->ioreq_evtchn = rc; - - if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) ) - { - rc = alloc_unbound_xen_event_channel(v->domain, 0, - s->emulator->domain_id, NULL); - if ( rc < 0 ) - goto fail3; - - s->bufioreq_evtchn = rc; - } - - sv->vcpu = v; - - list_add(&sv->list_entry, &s->ioreq_vcpu_list); - - if ( s->enabled ) - hvm_update_ioreq_evtchn(s, sv); - - spin_unlock(&s->lock); - return 0; - - fail3: - free_xen_event_channel(v->domain, sv->ioreq_evtchn); - - fail2: - spin_unlock(&s->lock); - xfree(sv); - - fail1: - return rc; -} - 
-static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, - struct vcpu *v) -{ - struct hvm_ioreq_vcpu *sv; - - spin_lock(&s->lock); - - list_for_each_entry ( sv, - &s->ioreq_vcpu_list, - list_entry ) - { - if ( sv->vcpu != v ) - continue; - - list_del(&sv->list_entry); - - if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) ) - free_xen_event_channel(v->domain, s->bufioreq_evtchn); - - free_xen_event_channel(v->domain, sv->ioreq_evtchn); - - xfree(sv); - break; - } - - spin_unlock(&s->lock); -} - -static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s) -{ - struct hvm_ioreq_vcpu *sv, *next; - - spin_lock(&s->lock); - - list_for_each_entry_safe ( sv, - next, - &s->ioreq_vcpu_list, - list_entry ) - { - struct vcpu *v = sv->vcpu; - - list_del(&sv->list_entry); - - if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) ) - free_xen_event_channel(v->domain, s->bufioreq_evtchn); - - free_xen_event_channel(v->domain, sv->ioreq_evtchn); - - xfree(sv); - } - - spin_unlock(&s->lock); -} - int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s) { int rc; @@ -XXX,XX +XXX,XX @@ void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s) hvm_unmap_ioreq_gfn(s, false); } -static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s) +void arch_ioreq_server_enable(struct hvm_ioreq_server *s) { - int rc; - - rc = hvm_alloc_ioreq_mfn(s, false); - - if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) ) - rc = hvm_alloc_ioreq_mfn(s, true); - - if ( rc ) - hvm_free_ioreq_mfn(s, false); - - return rc; + hvm_remove_ioreq_gfn(s, false); + hvm_remove_ioreq_gfn(s, true); } -static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s) +void arch_ioreq_server_disable(struct hvm_ioreq_server *s) { - hvm_free_ioreq_mfn(s, true); - hvm_free_ioreq_mfn(s, false); + hvm_add_ioreq_gfn(s, true); + hvm_add_ioreq_gfn(s, false); } -static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s) +/* Called when target domain is paused */ +void 
arch_ioreq_server_destroy(struct hvm_ioreq_server *s) { - unsigned int i; - - for ( i = 0; i < NR_IO_RANGE_TYPES; i++ ) - rangeset_destroy(s->range[i]); + p2m_set_ioreq_server(s->target, 0, s); } -static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, - ioservid_t id) +/* Called with ioreq_server lock held */ +int arch_ioreq_server_map_mem_type(struct domain *d, + struct hvm_ioreq_server *s, + uint32_t flags) { - unsigned int i; - int rc; + return p2m_set_ioreq_server(d, flags, s); +} - for ( i = 0; i < NR_IO_RANGE_TYPES; i++ ) +void arch_ioreq_server_map_mem_type_completed(struct domain *d, + struct hvm_ioreq_server *s, + uint32_t flags) +{ + if ( flags == 0 ) { - char *name; - - rc = asprintf(&name, "ioreq_server %d %s", id, - (i == XEN_DMOP_IO_RANGE_PORT) ? "port" : - (i == XEN_DMOP_IO_RANGE_MEMORY) ? "memory" : - (i == XEN_DMOP_IO_RANGE_PCI) ? "pci" : - ""); - if ( rc ) - goto fail; - - s->range[i] = rangeset_new(s->target, name, - RANGESETF_prettyprint_hex); - - xfree(name); - - rc = -ENOMEM; - if ( !s->range[i] ) - goto fail; + const struct p2m_domain *p2m = p2m_get_hostp2m(d); - rangeset_limit(s->range[i], MAX_NR_IO_RANGES); + if ( read_atomic(&p2m->ioreq.entry_count) ) + p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw); } - - return 0; - - fail: - hvm_ioreq_server_free_rangesets(s); - - return rc; } -void arch_ioreq_server_enable(struct hvm_ioreq_server *s) +bool arch_ioreq_server_destroy_all(struct domain *d) { - hvm_remove_ioreq_gfn(s, false); - hvm_remove_ioreq_gfn(s, true); + return relocate_portio_handler(d, 0xcf8, 0xcf8, 4); } -static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s) +bool arch_ioreq_server_get_type_addr(const struct domain *d, + const ioreq_t *p, + uint8_t *type, + uint64_t *addr) { - struct hvm_ioreq_vcpu *sv; - - spin_lock(&s->lock); - - if ( s->enabled ) - goto done; - - arch_ioreq_server_enable(s); + unsigned int cf8 = d->arch.hvm.pci_cf8; - s->enabled = true; + if ( p->type != IOREQ_TYPE_COPY 
&& p->type != IOREQ_TYPE_PIO ) + return false; - list_for_each_entry ( sv, - &s->ioreq_vcpu_list, - list_entry ) - hvm_update_ioreq_evtchn(s, sv); + if ( p->type == IOREQ_TYPE_PIO && + (p->addr & ~3) == 0xcfc && + CF8_ENABLED(cf8) ) + { + unsigned int x86_fam, reg; + pci_sbdf_t sbdf; - done: - spin_unlock(&s->lock); -} + reg = hvm_pci_decode_addr(cf8, p->addr, &sbdf); -void arch_ioreq_server_disable(struct hvm_ioreq_server *s) -{ - hvm_add_ioreq_gfn(s, true); - hvm_add_ioreq_gfn(s, false); -} - -static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s) -{ - spin_lock(&s->lock); - - if ( !s->enabled ) - goto done; - - arch_ioreq_server_disable(s); - - s->enabled = false; - - done: - spin_unlock(&s->lock); -} - -static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, - struct domain *d, int bufioreq_handling, - ioservid_t id) -{ - struct domain *currd = current->domain; - struct vcpu *v; - int rc; - - s->target = d; - - get_knownalive_domain(currd); - s->emulator = currd; - - spin_lock_init(&s->lock); - INIT_LIST_HEAD(&s->ioreq_vcpu_list); - spin_lock_init(&s->bufioreq_lock); - - s->ioreq.gfn = INVALID_GFN; - s->bufioreq.gfn = INVALID_GFN; - - rc = hvm_ioreq_server_alloc_rangesets(s, id); - if ( rc ) - return rc; - - s->bufioreq_handling = bufioreq_handling; - - for_each_vcpu ( d, v ) - { - rc = hvm_ioreq_server_add_vcpu(s, v); - if ( rc ) - goto fail_add; - } - - return 0; - - fail_add: - hvm_ioreq_server_remove_all_vcpus(s); - arch_ioreq_server_unmap_pages(s); - - hvm_ioreq_server_free_rangesets(s); - - put_domain(s->emulator); - return rc; -} - -static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s) -{ - ASSERT(!s->enabled); - hvm_ioreq_server_remove_all_vcpus(s); - - /* - * NOTE: It is safe to call both arch_ioreq_server_unmap_pages() and - * hvm_ioreq_server_free_pages() in that order. - * This is because the former will do nothing if the pages - * are not mapped, leaving the page to be freed by the latter. 
- * However if the pages are mapped then the former will set - * the page_info pointer to NULL, meaning the latter will do - * nothing. - */ - arch_ioreq_server_unmap_pages(s); - hvm_ioreq_server_free_pages(s); - - hvm_ioreq_server_free_rangesets(s); - - put_domain(s->emulator); -} - -int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, - ioservid_t *id) -{ - struct hvm_ioreq_server *s; - unsigned int i; - int rc; - - if ( bufioreq_handling > HVM_IOREQSRV_BUFIOREQ_ATOMIC ) - return -EINVAL; - - s = xzalloc(struct hvm_ioreq_server); - if ( !s ) - return -ENOMEM; - - domain_pause(d); - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - for ( i = 0; i < MAX_NR_IOREQ_SERVERS; i++ ) - { - if ( !GET_IOREQ_SERVER(d, i) ) - break; - } - - rc = -ENOSPC; - if ( i >= MAX_NR_IOREQ_SERVERS ) - goto fail; - - /* - * It is safe to call set_ioreq_server() prior to - * hvm_ioreq_server_init() since the target domain is paused. - */ - set_ioreq_server(d, i, s); - - rc = hvm_ioreq_server_init(s, d, bufioreq_handling, i); - if ( rc ) - { - set_ioreq_server(d, i, NULL); - goto fail; - } - - if ( id ) - *id = i; - - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - domain_unpause(d); - - return 0; - - fail: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - domain_unpause(d); - - xfree(s); - return rc; -} - -/* Called when target domain is paused */ -void arch_ioreq_server_destroy(struct hvm_ioreq_server *s) -{ - p2m_set_ioreq_server(s->target, 0, s); -} - -int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id) -{ - struct hvm_ioreq_server *s; - int rc; - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - s = get_ioreq_server(d, id); - - rc = -ENOENT; - if ( !s ) - goto out; - - rc = -EPERM; - if ( s->emulator != current->domain ) - goto out; - - domain_pause(d); - - arch_ioreq_server_destroy(s); - - hvm_ioreq_server_disable(s); - - /* - * It is safe to call hvm_ioreq_server_deinit() prior to - * set_ioreq_server() since the target 
domain is paused. - */ - hvm_ioreq_server_deinit(s); - set_ioreq_server(d, id, NULL); - - domain_unpause(d); - - xfree(s); - - rc = 0; - - out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - - return rc; -} - -int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, - unsigned long *ioreq_gfn, - unsigned long *bufioreq_gfn, - evtchn_port_t *bufioreq_port) -{ - struct hvm_ioreq_server *s; - int rc; - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - s = get_ioreq_server(d, id); - - rc = -ENOENT; - if ( !s ) - goto out; - - rc = -EPERM; - if ( s->emulator != current->domain ) - goto out; - - if ( ioreq_gfn || bufioreq_gfn ) - { - rc = arch_ioreq_server_map_pages(s); - if ( rc ) - goto out; - } - - if ( ioreq_gfn ) - *ioreq_gfn = gfn_x(s->ioreq.gfn); - - if ( HANDLE_BUFIOREQ(s) ) - { - if ( bufioreq_gfn ) - *bufioreq_gfn = gfn_x(s->bufioreq.gfn); - - if ( bufioreq_port ) - *bufioreq_port = s->bufioreq_evtchn; - } - - rc = 0; - - out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - - return rc; -} - -int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id, - unsigned long idx, mfn_t *mfn) -{ - struct hvm_ioreq_server *s; - int rc; - - ASSERT(is_hvm_domain(d)); - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - s = get_ioreq_server(d, id); - - rc = -ENOENT; - if ( !s ) - goto out; - - rc = -EPERM; - if ( s->emulator != current->domain ) - goto out; - - rc = hvm_ioreq_server_alloc_pages(s); - if ( rc ) - goto out; - - switch ( idx ) - { - case XENMEM_resource_ioreq_server_frame_bufioreq: - rc = -ENOENT; - if ( !HANDLE_BUFIOREQ(s) ) - goto out; - - *mfn = page_to_mfn(s->bufioreq.page); - rc = 0; - break; - - case XENMEM_resource_ioreq_server_frame_ioreq(0): - *mfn = page_to_mfn(s->ioreq.page); - rc = 0; - break; - - default: - rc = -EINVAL; - break; - } - - out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - - return rc; -} - -int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, - uint32_t 
type, uint64_t start, - uint64_t end) -{ - struct hvm_ioreq_server *s; - struct rangeset *r; - int rc; - - if ( start > end ) - return -EINVAL; - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - s = get_ioreq_server(d, id); - - rc = -ENOENT; - if ( !s ) - goto out; - - rc = -EPERM; - if ( s->emulator != current->domain ) - goto out; - - switch ( type ) - { - case XEN_DMOP_IO_RANGE_PORT: - case XEN_DMOP_IO_RANGE_MEMORY: - case XEN_DMOP_IO_RANGE_PCI: - r = s->range[type]; - break; - - default: - r = NULL; - break; - } - - rc = -EINVAL; - if ( !r ) - goto out; - - rc = -EEXIST; - if ( rangeset_overlaps_range(r, start, end) ) - goto out; - - rc = rangeset_add_range(r, start, end); - - out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - - return rc; -} - -int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, - uint32_t type, uint64_t start, - uint64_t end) -{ - struct hvm_ioreq_server *s; - struct rangeset *r; - int rc; - - if ( start > end ) - return -EINVAL; - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - s = get_ioreq_server(d, id); - - rc = -ENOENT; - if ( !s ) - goto out; - - rc = -EPERM; - if ( s->emulator != current->domain ) - goto out; - - switch ( type ) - { - case XEN_DMOP_IO_RANGE_PORT: - case XEN_DMOP_IO_RANGE_MEMORY: - case XEN_DMOP_IO_RANGE_PCI: - r = s->range[type]; - break; - - default: - r = NULL; - break; - } - - rc = -EINVAL; - if ( !r ) - goto out; - - rc = -ENOENT; - if ( !rangeset_contains_range(r, start, end) ) - goto out; - - rc = rangeset_remove_range(r, start, end); - - out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - - return rc; -} - -/* Called with ioreq_server lock held */ -int arch_ioreq_server_map_mem_type(struct domain *d, - struct hvm_ioreq_server *s, - uint32_t flags) -{ - return p2m_set_ioreq_server(d, flags, s); -} - -void arch_ioreq_server_map_mem_type_completed(struct domain *d, - struct hvm_ioreq_server *s, - uint32_t flags) -{ - if ( flags == 0 ) - { - const 
struct p2m_domain *p2m = p2m_get_hostp2m(d); - - if ( read_atomic(&p2m->ioreq.entry_count) ) - p2m_change_entry_type_global(d, p2m_ioreq_server, p2m_ram_rw); - } -} - -/* - * Map or unmap an ioreq server to specific memory type. For now, only - * HVMMEM_ioreq_server is supported, and in the future new types can be - * introduced, e.g. HVMMEM_ioreq_serverX mapped to ioreq server X. And - * currently, only write operations are to be forwarded to an ioreq server. - * Support for the emulation of read operations can be added when an ioreq - * server has such requirement in the future. - */ -int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, - uint32_t type, uint32_t flags) -{ - struct hvm_ioreq_server *s; - int rc; - - if ( type != HVMMEM_ioreq_server ) - return -EINVAL; - - if ( flags & ~XEN_DMOP_IOREQ_MEM_ACCESS_WRITE ) - return -EINVAL; - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - s = get_ioreq_server(d, id); - - rc = -ENOENT; - if ( !s ) - goto out; - - rc = -EPERM; - if ( s->emulator != current->domain ) - goto out; - - rc = arch_ioreq_server_map_mem_type(d, s, flags); - - out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - - if ( rc == 0 ) - arch_ioreq_server_map_mem_type_completed(d, s, flags); - - return rc; -} - -int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, - bool enabled) -{ - struct hvm_ioreq_server *s; - int rc; - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - s = get_ioreq_server(d, id); - - rc = -ENOENT; - if ( !s ) - goto out; - - rc = -EPERM; - if ( s->emulator != current->domain ) - goto out; - - domain_pause(d); - - if ( enabled ) - hvm_ioreq_server_enable(s); - else - hvm_ioreq_server_disable(s); - - domain_unpause(d); - - rc = 0; - - out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - return rc; -} - -int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v) -{ - struct hvm_ioreq_server *s; - unsigned int id; - int rc; - - 
spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - FOR_EACH_IOREQ_SERVER(d, id, s) - { - rc = hvm_ioreq_server_add_vcpu(s, v); - if ( rc ) - goto fail; - } - - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - - return 0; - - fail: - while ( ++id != MAX_NR_IOREQ_SERVERS ) - { - s = GET_IOREQ_SERVER(d, id); - - if ( !s ) - continue; - - hvm_ioreq_server_remove_vcpu(s, v); - } - - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); - - return rc; -} - -void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v) -{ - struct hvm_ioreq_server *s; - unsigned int id; - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - FOR_EACH_IOREQ_SERVER(d, id, s) - hvm_ioreq_server_remove_vcpu(s, v); - - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); -} - -bool arch_ioreq_server_destroy_all(struct domain *d) -{ - return relocate_portio_handler(d, 0xcf8, 0xcf8, 4); -} - -void hvm_destroy_all_ioreq_servers(struct domain *d) -{ - struct hvm_ioreq_server *s; - unsigned int id; - - if ( !arch_ioreq_server_destroy_all(d) ) - return; - - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); - - /* No need to domain_pause() as the domain is being torn down */ - - FOR_EACH_IOREQ_SERVER(d, id, s) - { - hvm_ioreq_server_disable(s); - - /* - * It is safe to call hvm_ioreq_server_deinit() prior to - * set_ioreq_server() since the target domain is being destroyed. 
- */ - hvm_ioreq_server_deinit(s); - set_ioreq_server(d, id, NULL); - - xfree(s); - } - - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); -} - -bool arch_ioreq_server_get_type_addr(const struct domain *d, - const ioreq_t *p, - uint8_t *type, - uint64_t *addr) -{ - unsigned int cf8 = d->arch.hvm.pci_cf8; - - if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO ) - return false; - - if ( p->type == IOREQ_TYPE_PIO && - (p->addr & ~3) == 0xcfc && - CF8_ENABLED(cf8) ) - { - unsigned int x86_fam, reg; - pci_sbdf_t sbdf; - - reg = hvm_pci_decode_addr(cf8, p->addr, &sbdf); - - /* PCI config data cycle */ - *type = XEN_DMOP_IO_RANGE_PCI; - *addr = ((uint64_t)sbdf.sbdf << 32) | reg; - /* AMD extended configuration space access? */ - if ( CF8_ADDR_HI(cf8) && - d->arch.cpuid->x86_vendor == X86_VENDOR_AMD && - (x86_fam = get_cpu_family( - d->arch.cpuid->basic.raw_fms, NULL, NULL)) >= 0x10 && - x86_fam < 0x17 ) - { - uint64_t msr_val; + /* PCI config data cycle */ + *type = XEN_DMOP_IO_RANGE_PCI; + *addr = ((uint64_t)sbdf.sbdf << 32) | reg; + /* AMD extended configuration space access? 
*/ + if ( CF8_ADDR_HI(cf8) && + d->arch.cpuid->x86_vendor == X86_VENDOR_AMD && + (x86_fam = get_cpu_family( + d->arch.cpuid->basic.raw_fms, NULL, NULL)) >= 0x10 && + x86_fam < 0x17 ) + { + uint64_t msr_val; if ( !rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) && (msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT)) ) @@ -XXX,XX +XXX,XX @@ bool arch_ioreq_server_get_type_addr(const struct domain *d, return true; } -struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d, - ioreq_t *p) -{ - struct hvm_ioreq_server *s; - uint8_t type; - uint64_t addr; - unsigned int id; - - if ( !arch_ioreq_server_get_type_addr(d, p, &type, &addr) ) - return NULL; - - FOR_EACH_IOREQ_SERVER(d, id, s) - { - struct rangeset *r; - - if ( !s->enabled ) - continue; - - r = s->range[type]; - - switch ( type ) - { - unsigned long start, end; - - case XEN_DMOP_IO_RANGE_PORT: - start = addr; - end = start + p->size - 1; - if ( rangeset_contains_range(r, start, end) ) - return s; - - break; - - case XEN_DMOP_IO_RANGE_MEMORY: - start = hvm_mmio_first_byte(p); - end = hvm_mmio_last_byte(p); - - if ( rangeset_contains_range(r, start, end) ) - return s; - - break; - - case XEN_DMOP_IO_RANGE_PCI: - if ( rangeset_contains_singleton(r, addr >> 32) ) - { - p->type = IOREQ_TYPE_PCI_CONFIG; - p->addr = addr; - return s; - } - - break; - } - } - - return NULL; -} - -static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p) -{ - struct domain *d = current->domain; - struct hvm_ioreq_page *iorp; - buffered_iopage_t *pg; - buf_ioreq_t bp = { .data = p->data, - .addr = p->addr, - .type = p->type, - .dir = p->dir }; - /* Timeoffset sends 64b data, but no address. Use two consecutive slots. 
*/ - int qw = 0; - - /* Ensure buffered_iopage fits in a page */ - BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE); - - iorp = &s->bufioreq; - pg = iorp->va; - - if ( !pg ) - return IOREQ_STATUS_UNHANDLED; - - /* - * Return 0 for the cases we can't deal with: - * - 'addr' is only a 20-bit field, so we cannot address beyond 1MB - * - we cannot buffer accesses to guest memory buffers, as the guest - * may expect the memory buffer to be synchronously accessed - * - the count field is usually used with data_is_ptr and since we don't - * support data_is_ptr we do not waste space for the count field either - */ - if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) ) - return 0; - - switch ( p->size ) - { - case 1: - bp.size = 0; - break; - case 2: - bp.size = 1; - break; - case 4: - bp.size = 2; - break; - case 8: - bp.size = 3; - qw = 1; - break; - default: - gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size); - return IOREQ_STATUS_UNHANDLED; - } - - spin_lock(&s->bufioreq_lock); - - if ( (pg->ptrs.write_pointer - pg->ptrs.read_pointer) >= - (IOREQ_BUFFER_SLOT_NUM - qw) ) - { - /* The queue is full: send the iopacket through the normal path. */ - spin_unlock(&s->bufioreq_lock); - return IOREQ_STATUS_UNHANDLED; - } - - pg->buf_ioreq[pg->ptrs.write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp; - - if ( qw ) - { - bp.data = p->data >> 32; - pg->buf_ioreq[(pg->ptrs.write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp; - } - - /* Make the ioreq_t visible /before/ write_pointer. */ - smp_wmb(); - pg->ptrs.write_pointer += qw ? 2 : 1; - - /* Canonicalize read/write pointers to prevent their overflow. 
*/ - while ( (s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_ATOMIC) && - qw++ < IOREQ_BUFFER_SLOT_NUM && - pg->ptrs.read_pointer >= IOREQ_BUFFER_SLOT_NUM ) - { - union bufioreq_pointers old = pg->ptrs, new; - unsigned int n = old.read_pointer / IOREQ_BUFFER_SLOT_NUM; - - new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM; - new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM; - cmpxchg(&pg->ptrs.full, old.full, new.full); - } - - notify_via_xen_event_channel(d, s->bufioreq_evtchn); - spin_unlock(&s->bufioreq_lock); - - return IOREQ_STATUS_HANDLED; -} - -int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p, - bool buffered) -{ - struct vcpu *curr = current; - struct domain *d = curr->domain; - struct hvm_ioreq_vcpu *sv; - - ASSERT(s); - - if ( buffered ) - return hvm_send_buffered_ioreq(s, proto_p); - - if ( unlikely(!vcpu_start_shutdown_deferral(curr)) ) - return IOREQ_STATUS_RETRY; - - list_for_each_entry ( sv, - &s->ioreq_vcpu_list, - list_entry ) - { - if ( sv->vcpu == curr ) - { - evtchn_port_t port = sv->ioreq_evtchn; - ioreq_t *p = get_ioreq(s, curr); - - if ( unlikely(p->state != STATE_IOREQ_NONE) ) - { - gprintk(XENLOG_ERR, "device model set bad IO state %d\n", - p->state); - break; - } - - if ( unlikely(p->vp_eport != port) ) - { - gprintk(XENLOG_ERR, "device model set bad event channel %d\n", - p->vp_eport); - break; - } - - proto_p->state = STATE_IOREQ_NONE; - proto_p->vp_eport = port; - *p = *proto_p; - - prepare_wait_on_xen_event_channel(port); - - /* - * Following happens /after/ blocking and setting up ioreq - * contents. prepare_wait_on_xen_event_channel() is an implicit - * barrier. 
- */ - p->state = STATE_IOREQ_READY; - notify_via_xen_event_channel(d, port); - - sv->pending = true; - return IOREQ_STATUS_RETRY; - } - } - - return IOREQ_STATUS_UNHANDLED; -} - -unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered) -{ - struct domain *d = current->domain; - struct hvm_ioreq_server *s; - unsigned int id, failed = 0; - - FOR_EACH_IOREQ_SERVER(d, id, s) - { - if ( !s->enabled ) - continue; - - if ( hvm_send_ioreq(s, p, buffered) == IOREQ_STATUS_UNHANDLED ) - failed++; - } - - return failed; -} - static int hvm_access_cf8( int dir, unsigned int port, unsigned int bytes, uint32_t *val) { @@ -XXX,XX +XXX,XX @@ void arch_ioreq_domain_init(struct domain *d) register_portio_handler(d, 0xcf8, 4, hvm_access_cf8); } -void hvm_ioreq_init(struct domain *d) -{ - spin_lock_init(&d->arch.hvm.ioreq_server.lock); - - arch_ioreq_domain_init(d); -} - /* * Local variables: * mode: C diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/stdvga.c +++ b/xen/arch/x86/hvm/stdvga.c @@ -XXX,XX +XXX,XX @@ * can have side effects. 
  */
 
+#include <xen/ioreq.h>
 #include <xen/types.h>
 #include <xen/sched.h>
 #include <xen/domain_page.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/support.h>
 #include <xen/numa.h>
 #include <xen/paging.h>
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -XXX,XX +XXX,XX @@
  *
  */
 
+#include <xen/ioreq.h>
+
 #include <asm/types.h>
 #include <asm/mtrr.h>
 #include <asm/p2m.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/vmx/vvmx.h>
 #include <asm/hvm/nestedhvm.h>
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -XXX,XX +XXX,XX @@
  */
 
 #include <xen/init.h>
+#include <xen/ioreq.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
 #include <xen/mm.h>
@@ -XXX,XX +XXX,XX @@
 #include <asm/io_apic.h>
 #include <asm/pci.h>
 #include <asm/guest.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/pv/domain.h>
 #include <asm/pv/mm.h>
 
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -XXX,XX +XXX,XX @@
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <xen/ioreq.h>
 #include <xen/types.h>
 #include <xen/mm.h>
 #include <xen/trace.h>
@@ -XXX,XX +XXX,XX @@
 #include <asm/current.h>
 #include <asm/flushtlb.h>
 #include <asm/shadow.h>
-#include <asm/hvm/ioreq.h>
 #include <xen/numa.h>
 #include "private.h"
 
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index XXXXXXX..XXXXXXX 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -XXX,XX +XXX,XX @@ config HYPFS_CONFIG
 	  Disable this option in case you want to spare some memory or you
 	  want to hide the .config contents from dom0.
 
+config IOREQ_SERVER
+	bool
+
 config KEXEC
 	bool "kexec support"
 	default y
diff --git a/xen/common/Makefile b/xen/common/Makefile
index XXXXXXX..XXXXXXX 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -XXX,XX +XXX,XX @@ obj-$(CONFIG_GRANT_TABLE) += grant_table.o
 obj-y += guestcopy.o
 obj-bin-y += gunzip.init.o
 obj-$(CONFIG_HYPFS) += hypfs.o
+obj-$(CONFIG_IOREQ_SERVER) += ioreq.o
 obj-y += irq.o
 obj-y += kernel.o
 obj-y += keyhandler.o
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/xen/common/ioreq.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * ioreq.c: hardware virtual machine I/O emulation
+ *
+ * Copyright (c) 2016 Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/domain.h>
+#include <xen/domain_page.h>
+#include <xen/event.h>
+#include <xen/init.h>
+#include <xen/ioreq.h>
+#include <xen/irq.h>
+#include <xen/lib.h>
+#include <xen/paging.h>
+#include <xen/sched.h>
+#include <xen/softirq.h>
+#include <xen/trace.h>
+#include <xen/vpci.h>
+
+#include <asm/hvm/ioreq.h>
+
+#include <public/hvm/ioreq.h>
+#include <public/hvm/params.h>
+
+static void set_ioreq_server(struct domain *d, unsigned int id,
+                             struct hvm_ioreq_server *s)
+{
+    ASSERT(id < MAX_NR_IOREQ_SERVERS);
+    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
+
+    d->arch.hvm.ioreq_server.server[id] = s;
+}
+
+#define GET_IOREQ_SERVER(d, id) \
+    (d)->arch.hvm.ioreq_server.server[id]
+
+static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
+                                                 unsigned int id)
+{
+    if ( id >= MAX_NR_IOREQ_SERVERS )
+        return NULL;
+
+    return GET_IOREQ_SERVER(d, id);
+}
+
+/*
+ * Iterate over all possible ioreq servers.
+ *
+ * NOTE: The iteration is backwards such that more recently created
+ *       ioreq servers are favoured in hvm_select_ioreq_server().
+ *       This is a semantic that previously existed when ioreq servers
+ *       were held in a linked list.
+ */ +#define FOR_EACH_IOREQ_SERVER(d, id, s) \ + for ( (id) = MAX_NR_IOREQ_SERVERS; (id) != 0; ) \ + if ( !(s = GET_IOREQ_SERVER(d, --(id))) ) \ + continue; \ + else + +static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v) +{ + shared_iopage_t *p = s->ioreq.va; + + ASSERT((v == current) || !vcpu_runnable(v)); + ASSERT(p != NULL); + + return &p->vcpu_ioreq[v->vcpu_id]; +} + +static struct hvm_ioreq_vcpu *get_pending_vcpu(const struct vcpu *v, + struct hvm_ioreq_server **srvp) +{ + struct domain *d = v->domain; + struct hvm_ioreq_server *s; + unsigned int id; + + FOR_EACH_IOREQ_SERVER(d, id, s) + { + struct hvm_ioreq_vcpu *sv; + + list_for_each_entry ( sv, + &s->ioreq_vcpu_list, + list_entry ) + { + if ( sv->vcpu == v && sv->pending ) + { + if ( srvp ) + *srvp = s; + return sv; + } + } + } + + return NULL; +} + +bool hvm_io_pending(struct vcpu *v) +{ + return get_pending_vcpu(v, NULL); +} + +static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p) +{ + unsigned int prev_state = STATE_IOREQ_NONE; + unsigned int state = p->state; + uint64_t data = ~0; + + smp_rmb(); + + /* + * The only reason we should see this condition be false is when an + * emulator dying races with I/O being requested. 
+ */ + while ( likely(state != STATE_IOREQ_NONE) ) + { + if ( unlikely(state < prev_state) ) + { + gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n", + prev_state, state); + sv->pending = false; + domain_crash(sv->vcpu->domain); + return false; /* bail */ + } + + switch ( prev_state = state ) + { + case STATE_IORESP_READY: /* IORESP_READY -> NONE */ + p->state = STATE_IOREQ_NONE; + data = p->data; + break; + + case STATE_IOREQ_READY: /* IOREQ_{READY,INPROCESS} -> IORESP_READY */ + case STATE_IOREQ_INPROCESS: + wait_on_xen_event_channel(sv->ioreq_evtchn, + ({ state = p->state; + smp_rmb(); + state != prev_state; })); + continue; + + default: + gdprintk(XENLOG_ERR, "Weird HVM iorequest state %u\n", state); + sv->pending = false; + domain_crash(sv->vcpu->domain); + return false; /* bail */ + } + + break; + } + + p = &sv->vcpu->arch.hvm.hvm_io.io_req; + if ( hvm_ioreq_needs_completion(p) ) + p->data = data; + + sv->pending = false; + + return true; +} + +bool handle_hvm_io_completion(struct vcpu *v) +{ + struct domain *d = v->domain; + struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io; + struct hvm_ioreq_server *s; + struct hvm_ioreq_vcpu *sv; + enum hvm_io_completion io_completion; + + if ( has_vpci(d) && vpci_process_pending(v) ) + { + raise_softirq(SCHEDULE_SOFTIRQ); + return false; + } + + sv = get_pending_vcpu(v, &s); + if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) ) + return false; + + vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ? 
+ STATE_IORESP_READY : STATE_IOREQ_NONE; + + msix_write_completion(v); + vcpu_end_shutdown_deferral(v); + + io_completion = vio->io_completion; + vio->io_completion = HVMIO_no_completion; + + switch ( io_completion ) + { + case HVMIO_no_completion: + break; + + case HVMIO_mmio_completion: + return arch_ioreq_complete_mmio(); + + case HVMIO_pio_completion: + return handle_pio(vio->io_req.addr, vio->io_req.size, + vio->io_req.dir); + + default: + return arch_vcpu_ioreq_completion(io_completion); + } + + return true; +} + +static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf) +{ + struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; + struct page_info *page; + + if ( iorp->page ) + { + /* + * If a guest frame has already been mapped (which may happen + * on demand if hvm_get_ioreq_server_info() is called), then + * allocating a page is not permitted. + */ + if ( !gfn_eq(iorp->gfn, INVALID_GFN) ) + return -EPERM; + + return 0; + } + + page = alloc_domheap_page(s->target, MEMF_no_refcount); + + if ( !page ) + return -ENOMEM; + + if ( !get_page_and_type(page, s->target, PGT_writable_page) ) + { + /* + * The domain can't possibly know about this page yet, so failure + * here is a clear indication of something fishy going on. + */ + domain_crash(s->emulator); + return -ENODATA; + } + + iorp->va = __map_domain_page_global(page); + if ( !iorp->va ) + goto fail; + + iorp->page = page; + clear_page(iorp->va); + return 0; + + fail: + put_page_alloc_ref(page); + put_page_and_type(page); + + return -ENOMEM; +} + +static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf) +{ + struct hvm_ioreq_page *iorp = buf ? 
&s->bufioreq : &s->ioreq; + struct page_info *page = iorp->page; + + if ( !page ) + return; + + iorp->page = NULL; + + unmap_domain_page_global(iorp->va); + iorp->va = NULL; + + put_page_alloc_ref(page); + put_page_and_type(page); +} + +bool is_ioreq_server_page(struct domain *d, const struct page_info *page) +{ + const struct hvm_ioreq_server *s; + unsigned int id; + bool found = false; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + FOR_EACH_IOREQ_SERVER(d, id, s) + { + if ( (s->ioreq.page == page) || (s->bufioreq.page == page) ) + { + found = true; + break; + } + } + + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + + return found; +} + +static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s, + struct hvm_ioreq_vcpu *sv) +{ + ASSERT(spin_is_locked(&s->lock)); + + if ( s->ioreq.va != NULL ) + { + ioreq_t *p = get_ioreq(s, sv->vcpu); + + p->vp_eport = sv->ioreq_evtchn; + } +} + +static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s, + struct vcpu *v) +{ + struct hvm_ioreq_vcpu *sv; + int rc; + + sv = xzalloc(struct hvm_ioreq_vcpu); + + rc = -ENOMEM; + if ( !sv ) + goto fail1; + + spin_lock(&s->lock); + + rc = alloc_unbound_xen_event_channel(v->domain, v->vcpu_id, + s->emulator->domain_id, NULL); + if ( rc < 0 ) + goto fail2; + + sv->ioreq_evtchn = rc; + + if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) ) + { + rc = alloc_unbound_xen_event_channel(v->domain, 0, + s->emulator->domain_id, NULL); + if ( rc < 0 ) + goto fail3; + + s->bufioreq_evtchn = rc; + } + + sv->vcpu = v; + + list_add(&sv->list_entry, &s->ioreq_vcpu_list); + + if ( s->enabled ) + hvm_update_ioreq_evtchn(s, sv); + + spin_unlock(&s->lock); + return 0; + + fail3: + free_xen_event_channel(v->domain, sv->ioreq_evtchn); + + fail2: + spin_unlock(&s->lock); + xfree(sv); + + fail1: + return rc; +} + +static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, + struct vcpu *v) +{ + struct hvm_ioreq_vcpu *sv; + + spin_lock(&s->lock); + + list_for_each_entry ( sv, 
+ &s->ioreq_vcpu_list, + list_entry ) + { + if ( sv->vcpu != v ) + continue; + + list_del(&sv->list_entry); + + if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) ) + free_xen_event_channel(v->domain, s->bufioreq_evtchn); + + free_xen_event_channel(v->domain, sv->ioreq_evtchn); + + xfree(sv); + break; + } + + spin_unlock(&s->lock); +} + +static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s) +{ + struct hvm_ioreq_vcpu *sv, *next; + + spin_lock(&s->lock); + + list_for_each_entry_safe ( sv, + next, + &s->ioreq_vcpu_list, + list_entry ) + { + struct vcpu *v = sv->vcpu; + + list_del(&sv->list_entry); + + if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) ) + free_xen_event_channel(v->domain, s->bufioreq_evtchn); + + free_xen_event_channel(v->domain, sv->ioreq_evtchn); + + xfree(sv); + } + + spin_unlock(&s->lock); +} + +static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s) +{ + int rc; + + rc = hvm_alloc_ioreq_mfn(s, false); + + if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) ) + rc = hvm_alloc_ioreq_mfn(s, true); + + if ( rc ) + hvm_free_ioreq_mfn(s, false); + + return rc; +} + +static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s) +{ + hvm_free_ioreq_mfn(s, true); + hvm_free_ioreq_mfn(s, false); +} + +static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s) +{ + unsigned int i; + + for ( i = 0; i < NR_IO_RANGE_TYPES; i++ ) + rangeset_destroy(s->range[i]); +} + +static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, + ioservid_t id) +{ + unsigned int i; + int rc; + + for ( i = 0; i < NR_IO_RANGE_TYPES; i++ ) + { + char *name; + + rc = asprintf(&name, "ioreq_server %d %s", id, + (i == XEN_DMOP_IO_RANGE_PORT) ? "port" : + (i == XEN_DMOP_IO_RANGE_MEMORY) ? "memory" : + (i == XEN_DMOP_IO_RANGE_PCI) ? 
"pci" : + ""); + if ( rc ) + goto fail; + + s->range[i] = rangeset_new(s->target, name, + RANGESETF_prettyprint_hex); + + xfree(name); + + rc = -ENOMEM; + if ( !s->range[i] ) + goto fail; + + rangeset_limit(s->range[i], MAX_NR_IO_RANGES); + } + + return 0; + + fail: + hvm_ioreq_server_free_rangesets(s); + + return rc; +} + +static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s) +{ + struct hvm_ioreq_vcpu *sv; + + spin_lock(&s->lock); + + if ( s->enabled ) + goto done; + + arch_ioreq_server_enable(s); + + s->enabled = true; + + list_for_each_entry ( sv, + &s->ioreq_vcpu_list, + list_entry ) + hvm_update_ioreq_evtchn(s, sv); + + done: + spin_unlock(&s->lock); +} + +static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s) +{ + spin_lock(&s->lock); + + if ( !s->enabled ) + goto done; + + arch_ioreq_server_disable(s); + + s->enabled = false; + + done: + spin_unlock(&s->lock); +} + +static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, + struct domain *d, int bufioreq_handling, + ioservid_t id) +{ + struct domain *currd = current->domain; + struct vcpu *v; + int rc; + + s->target = d; + + get_knownalive_domain(currd); + s->emulator = currd; + + spin_lock_init(&s->lock); + INIT_LIST_HEAD(&s->ioreq_vcpu_list); + spin_lock_init(&s->bufioreq_lock); + + s->ioreq.gfn = INVALID_GFN; + s->bufioreq.gfn = INVALID_GFN; + + rc = hvm_ioreq_server_alloc_rangesets(s, id); + if ( rc ) + return rc; + + s->bufioreq_handling = bufioreq_handling; + + for_each_vcpu ( d, v ) + { + rc = hvm_ioreq_server_add_vcpu(s, v); + if ( rc ) + goto fail_add; + } + + return 0; + + fail_add: + hvm_ioreq_server_remove_all_vcpus(s); + arch_ioreq_server_unmap_pages(s); + + hvm_ioreq_server_free_rangesets(s); + + put_domain(s->emulator); + return rc; +} + +static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s) +{ + ASSERT(!s->enabled); + hvm_ioreq_server_remove_all_vcpus(s); + + /* + * NOTE: It is safe to call both arch_ioreq_server_unmap_pages() and + * 
hvm_ioreq_server_free_pages() in that order. + * This is because the former will do nothing if the pages + * are not mapped, leaving the page to be freed by the latter. + * However if the pages are mapped then the former will set + * the page_info pointer to NULL, meaning the latter will do + * nothing. + */ + arch_ioreq_server_unmap_pages(s); + hvm_ioreq_server_free_pages(s); + + hvm_ioreq_server_free_rangesets(s); + + put_domain(s->emulator); +} + +int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, + ioservid_t *id) +{ + struct hvm_ioreq_server *s; + unsigned int i; + int rc; + + if ( bufioreq_handling > HVM_IOREQSRV_BUFIOREQ_ATOMIC ) + return -EINVAL; + + s = xzalloc(struct hvm_ioreq_server); + if ( !s ) + return -ENOMEM; + + domain_pause(d); + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + for ( i = 0; i < MAX_NR_IOREQ_SERVERS; i++ ) + { + if ( !GET_IOREQ_SERVER(d, i) ) + break; + } + + rc = -ENOSPC; + if ( i >= MAX_NR_IOREQ_SERVERS ) + goto fail; + + /* + * It is safe to call set_ioreq_server() prior to + * hvm_ioreq_server_init() since the target domain is paused. 
+ */ + set_ioreq_server(d, i, s); + + rc = hvm_ioreq_server_init(s, d, bufioreq_handling, i); + if ( rc ) + { + set_ioreq_server(d, i, NULL); + goto fail; + } + + if ( id ) + *id = i; + + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + domain_unpause(d); + + return 0; + + fail: + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + domain_unpause(d); + + xfree(s); + return rc; +} + +int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id) +{ + struct hvm_ioreq_server *s; + int rc; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + s = get_ioreq_server(d, id); + + rc = -ENOENT; + if ( !s ) + goto out; + + rc = -EPERM; + if ( s->emulator != current->domain ) + goto out; + + domain_pause(d); + + arch_ioreq_server_destroy(s); + + hvm_ioreq_server_disable(s); + + /* + * It is safe to call hvm_ioreq_server_deinit() prior to + * set_ioreq_server() since the target domain is paused. + */ + hvm_ioreq_server_deinit(s); + set_ioreq_server(d, id, NULL); + + domain_unpause(d); + + xfree(s); + + rc = 0; + + out: + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + + return rc; +} + +int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, + unsigned long *ioreq_gfn, + unsigned long *bufioreq_gfn, + evtchn_port_t *bufioreq_port) +{ + struct hvm_ioreq_server *s; + int rc; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + s = get_ioreq_server(d, id); + + rc = -ENOENT; + if ( !s ) + goto out; + + rc = -EPERM; + if ( s->emulator != current->domain ) + goto out; + + if ( ioreq_gfn || bufioreq_gfn ) + { + rc = arch_ioreq_server_map_pages(s); + if ( rc ) + goto out; + } + + if ( ioreq_gfn ) + *ioreq_gfn = gfn_x(s->ioreq.gfn); + + if ( HANDLE_BUFIOREQ(s) ) + { + if ( bufioreq_gfn ) + *bufioreq_gfn = gfn_x(s->bufioreq.gfn); + + if ( bufioreq_port ) + *bufioreq_port = s->bufioreq_evtchn; + } + + rc = 0; + + out: + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + + return rc; +} + +int hvm_get_ioreq_server_frame(struct domain *d, 
ioservid_t id, + unsigned long idx, mfn_t *mfn) +{ + struct hvm_ioreq_server *s; + int rc; + + ASSERT(is_hvm_domain(d)); + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + s = get_ioreq_server(d, id); + + rc = -ENOENT; + if ( !s ) + goto out; + + rc = -EPERM; + if ( s->emulator != current->domain ) + goto out; + + rc = hvm_ioreq_server_alloc_pages(s); + if ( rc ) + goto out; + + switch ( idx ) + { + case XENMEM_resource_ioreq_server_frame_bufioreq: + rc = -ENOENT; + if ( !HANDLE_BUFIOREQ(s) ) + goto out; + + *mfn = page_to_mfn(s->bufioreq.page); + rc = 0; + break; + + case XENMEM_resource_ioreq_server_frame_ioreq(0): + *mfn = page_to_mfn(s->ioreq.page); + rc = 0; + break; + + default: + rc = -EINVAL; + break; + } + + out: + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + + return rc; +} + +int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, + uint32_t type, uint64_t start, + uint64_t end) +{ + struct hvm_ioreq_server *s; + struct rangeset *r; + int rc; + + if ( start > end ) + return -EINVAL; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + s = get_ioreq_server(d, id); + + rc = -ENOENT; + if ( !s ) + goto out; + + rc = -EPERM; + if ( s->emulator != current->domain ) + goto out; + + switch ( type ) + { + case XEN_DMOP_IO_RANGE_PORT: + case XEN_DMOP_IO_RANGE_MEMORY: + case XEN_DMOP_IO_RANGE_PCI: + r = s->range[type]; + break; + + default: + r = NULL; + break; + } + + rc = -EINVAL; + if ( !r ) + goto out; + + rc = -EEXIST; + if ( rangeset_overlaps_range(r, start, end) ) + goto out; + + rc = rangeset_add_range(r, start, end); + + out: + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + + return rc; +} + +int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, + uint32_t type, uint64_t start, + uint64_t end) +{ + struct hvm_ioreq_server *s; + struct rangeset *r; + int rc; + + if ( start > end ) + return -EINVAL; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + s = get_ioreq_server(d, 
id); + + rc = -ENOENT; + if ( !s ) + goto out; + + rc = -EPERM; + if ( s->emulator != current->domain ) + goto out; + + switch ( type ) + { + case XEN_DMOP_IO_RANGE_PORT: + case XEN_DMOP_IO_RANGE_MEMORY: + case XEN_DMOP_IO_RANGE_PCI: + r = s->range[type]; + break; + + default: + r = NULL; + break; + } + + rc = -EINVAL; + if ( !r ) + goto out; + + rc = -ENOENT; + if ( !rangeset_contains_range(r, start, end) ) + goto out; + + rc = rangeset_remove_range(r, start, end); + + out: + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + + return rc; +} + +/* + * Map or unmap an ioreq server to specific memory type. For now, only + * HVMMEM_ioreq_server is supported, and in the future new types can be + * introduced, e.g. HVMMEM_ioreq_serverX mapped to ioreq server X. And + * currently, only write operations are to be forwarded to an ioreq server. + * Support for the emulation of read operations can be added when an ioreq + * server has such requirement in the future. + */ +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, + uint32_t type, uint32_t flags) +{ + struct hvm_ioreq_server *s; + int rc; + + if ( type != HVMMEM_ioreq_server ) + return -EINVAL; + + if ( flags & ~XEN_DMOP_IOREQ_MEM_ACCESS_WRITE ) + return -EINVAL; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + s = get_ioreq_server(d, id); + + rc = -ENOENT; + if ( !s ) + goto out; + + rc = -EPERM; + if ( s->emulator != current->domain ) + goto out; + + rc = arch_ioreq_server_map_mem_type(d, s, flags); + + out: + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + + if ( rc == 0 ) + arch_ioreq_server_map_mem_type_completed(d, s, flags); + + return rc; +} + +int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, + bool enabled) +{ + struct hvm_ioreq_server *s; + int rc; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + s = get_ioreq_server(d, id); + + rc = -ENOENT; + if ( !s ) + goto out; + + rc = -EPERM; + if ( s->emulator != current->domain ) + goto out; 
+ + domain_pause(d); + + if ( enabled ) + hvm_ioreq_server_enable(s); + else + hvm_ioreq_server_disable(s); + + domain_unpause(d); + + rc = 0; + + out: + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + return rc; +} + +int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v) +{ + struct hvm_ioreq_server *s; + unsigned int id; + int rc; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + FOR_EACH_IOREQ_SERVER(d, id, s) + { + rc = hvm_ioreq_server_add_vcpu(s, v); + if ( rc ) + goto fail; + } + + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + + return 0; + + fail: + while ( ++id != MAX_NR_IOREQ_SERVERS ) + { + s = GET_IOREQ_SERVER(d, id); + + if ( !s ) + continue; + + hvm_ioreq_server_remove_vcpu(s, v); + } + + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + + return rc; +} + +void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v) +{ + struct hvm_ioreq_server *s; + unsigned int id; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + FOR_EACH_IOREQ_SERVER(d, id, s) + hvm_ioreq_server_remove_vcpu(s, v); + + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); +} + +void hvm_destroy_all_ioreq_servers(struct domain *d) +{ + struct hvm_ioreq_server *s; + unsigned int id; + + if ( !arch_ioreq_server_destroy_all(d) ) + return; + + spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + + /* No need to domain_pause() as the domain is being torn down */ + + FOR_EACH_IOREQ_SERVER(d, id, s) + { + hvm_ioreq_server_disable(s); + + /* + * It is safe to call hvm_ioreq_server_deinit() prior to + * set_ioreq_server() since the target domain is being destroyed. 
+ */ + hvm_ioreq_server_deinit(s); + set_ioreq_server(d, id, NULL); + + xfree(s); + } + + spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); +} + +struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d, + ioreq_t *p) +{ + struct hvm_ioreq_server *s; + uint8_t type; + uint64_t addr; + unsigned int id; + + if ( !arch_ioreq_server_get_type_addr(d, p, &type, &addr) ) + return NULL; + + FOR_EACH_IOREQ_SERVER(d, id, s) + { + struct rangeset *r; + + if ( !s->enabled ) + continue; + + r = s->range[type]; + + switch ( type ) + { + unsigned long start, end; + + case XEN_DMOP_IO_RANGE_PORT: + start = addr; + end = start + p->size - 1; + if ( rangeset_contains_range(r, start, end) ) + return s; + + break; + + case XEN_DMOP_IO_RANGE_MEMORY: + start = hvm_mmio_first_byte(p); + end = hvm_mmio_last_byte(p); + + if ( rangeset_contains_range(r, start, end) ) + return s; + + break; + + case XEN_DMOP_IO_RANGE_PCI: + if ( rangeset_contains_singleton(r, addr >> 32) ) + { + p->type = IOREQ_TYPE_PCI_CONFIG; + p->addr = addr; + return s; + } + + break; + } + } + + return NULL; +} + +static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p) +{ + struct domain *d = current->domain; + struct hvm_ioreq_page *iorp; + buffered_iopage_t *pg; + buf_ioreq_t bp = { .data = p->data, + .addr = p->addr, + .type = p->type, + .dir = p->dir }; + /* Timeoffset sends 64b data, but no address. Use two consecutive slots. 
*/ + int qw = 0; + + /* Ensure buffered_iopage fits in a page */ + BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE); + + iorp = &s->bufioreq; + pg = iorp->va; + + if ( !pg ) + return IOREQ_STATUS_UNHANDLED; + + /* + * Return 0 for the cases we can't deal with: + * - 'addr' is only a 20-bit field, so we cannot address beyond 1MB + * - we cannot buffer accesses to guest memory buffers, as the guest + * may expect the memory buffer to be synchronously accessed + * - the count field is usually used with data_is_ptr and since we don't + * support data_is_ptr we do not waste space for the count field either + */ + if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) ) + return 0; + + switch ( p->size ) + { + case 1: + bp.size = 0; + break; + case 2: + bp.size = 1; + break; + case 4: + bp.size = 2; + break; + case 8: + bp.size = 3; + qw = 1; + break; + default: + gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size); + return IOREQ_STATUS_UNHANDLED; + } + + spin_lock(&s->bufioreq_lock); + + if ( (pg->ptrs.write_pointer - pg->ptrs.read_pointer) >= + (IOREQ_BUFFER_SLOT_NUM - qw) ) + { + /* The queue is full: send the iopacket through the normal path. */ + spin_unlock(&s->bufioreq_lock); + return IOREQ_STATUS_UNHANDLED; + } + + pg->buf_ioreq[pg->ptrs.write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp; + + if ( qw ) + { + bp.data = p->data >> 32; + pg->buf_ioreq[(pg->ptrs.write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp; + } + + /* Make the ioreq_t visible /before/ write_pointer. */ + smp_wmb(); + pg->ptrs.write_pointer += qw ? 2 : 1; + + /* Canonicalize read/write pointers to prevent their overflow. 
*/ + while ( (s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_ATOMIC) && + qw++ < IOREQ_BUFFER_SLOT_NUM && + pg->ptrs.read_pointer >= IOREQ_BUFFER_SLOT_NUM ) + { + union bufioreq_pointers old = pg->ptrs, new; + unsigned int n = old.read_pointer / IOREQ_BUFFER_SLOT_NUM; + + new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM; + new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM; + cmpxchg(&pg->ptrs.full, old.full, new.full); + } + + notify_via_xen_event_channel(d, s->bufioreq_evtchn); + spin_unlock(&s->bufioreq_lock); + + return IOREQ_STATUS_HANDLED; +} + +int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p, + bool buffered) +{ + struct vcpu *curr = current; + struct domain *d = curr->domain; + struct hvm_ioreq_vcpu *sv; + + ASSERT(s); + + if ( buffered ) + return hvm_send_buffered_ioreq(s, proto_p); + + if ( unlikely(!vcpu_start_shutdown_deferral(curr)) ) + return IOREQ_STATUS_RETRY; + + list_for_each_entry ( sv, + &s->ioreq_vcpu_list, + list_entry ) + { + if ( sv->vcpu == curr ) + { + evtchn_port_t port = sv->ioreq_evtchn; + ioreq_t *p = get_ioreq(s, curr); + + if ( unlikely(p->state != STATE_IOREQ_NONE) ) + { + gprintk(XENLOG_ERR, "device model set bad IO state %d\n", + p->state); + break; + } + + if ( unlikely(p->vp_eport != port) ) + { + gprintk(XENLOG_ERR, "device model set bad event channel %d\n", + p->vp_eport); + break; + } + + proto_p->state = STATE_IOREQ_NONE; + proto_p->vp_eport = port; + *p = *proto_p; + + prepare_wait_on_xen_event_channel(port); + + /* + * Following happens /after/ blocking and setting up ioreq + * contents. prepare_wait_on_xen_event_channel() is an implicit + * barrier. 
+ */ + p->state = STATE_IOREQ_READY; + notify_via_xen_event_channel(d, port); + + sv->pending = true; + return IOREQ_STATUS_RETRY; + } + } + + return IOREQ_STATUS_UNHANDLED; +} + +unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered) +{ + struct domain *d = current->domain; + struct hvm_ioreq_server *s; + unsigned int id, failed = 0; + + FOR_EACH_IOREQ_SERVER(d, id, s) + { + if ( !s->enabled ) + continue; + + if ( hvm_send_ioreq(s, p, buffered) == IOREQ_STATUS_UNHANDLED ) + failed++; + } + + return failed; +} + +void hvm_ioreq_init(struct domain *d) +{ + spin_lock_init(&d->arch.hvm.ioreq_server.lock); + + arch_ioreq_domain_init(d); +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/hvm/ioreq.h +++ b/xen/include/asm-x86/hvm/ioreq.h @@ -XXX,XX +XXX,XX @@ #ifndef __ASM_X86_HVM_IOREQ_H__ #define __ASM_X86_HVM_IOREQ_H__ -#define HANDLE_BUFIOREQ(s) \ - ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) - -bool hvm_io_pending(struct vcpu *v); -bool handle_hvm_io_completion(struct vcpu *v); -bool is_ioreq_server_page(struct domain *d, const struct page_info *page); - -int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, - ioservid_t *id); -int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id); -int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, - unsigned long *ioreq_gfn, - unsigned long *bufioreq_gfn, - evtchn_port_t *bufioreq_port); -int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id, - unsigned long idx, mfn_t *mfn); -int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, - uint32_t type, uint64_t start, - uint64_t end); -int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, - uint32_t type, uint64_t start, - uint64_t end); -int 
hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, - uint32_t type, uint32_t flags); -int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, - bool enabled); - -int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v); -void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v); -void hvm_destroy_all_ioreq_servers(struct domain *d); - -struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d, - ioreq_t *p); -int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p, - bool buffered); -unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered); - -void hvm_ioreq_init(struct domain *d); - -bool arch_ioreq_complete_mmio(void); -bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion); -int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s); -void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s); -void arch_ioreq_server_enable(struct hvm_ioreq_server *s); -void arch_ioreq_server_disable(struct hvm_ioreq_server *s); -void arch_ioreq_server_destroy(struct hvm_ioreq_server *s); -int arch_ioreq_server_map_mem_type(struct domain *d, - struct hvm_ioreq_server *s, - uint32_t flags); -void arch_ioreq_server_map_mem_type_completed(struct domain *d, - struct hvm_ioreq_server *s, - uint32_t flags); -bool arch_ioreq_server_destroy_all(struct domain *d); -bool arch_ioreq_server_get_type_addr(const struct domain *d, - const ioreq_t *p, - uint8_t *type, - uint64_t *addr); -void arch_ioreq_domain_init(struct domain *d); - /* This correlation must not be altered */ #define IOREQ_STATUS_HANDLED X86EMUL_OKAY #define IOREQ_STATUS_UNHANDLED X86EMUL_UNHANDLEABLE diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/xen/include/xen/ioreq.h @@ -XXX,XX +XXX,XX @@ +/* + * ioreq.h: Hardware virtual machine assist interface definitions. + * + * Copyright (c) 2016 Citrix Systems Inc. 
+ * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; If not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef __XEN_IOREQ_H__ +#define __XEN_IOREQ_H__ + +#include <xen/sched.h> + +#define HANDLE_BUFIOREQ(s) \ + ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) + +bool hvm_io_pending(struct vcpu *v); +bool handle_hvm_io_completion(struct vcpu *v); +bool is_ioreq_server_page(struct domain *d, const struct page_info *page); + +int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, + ioservid_t *id); +int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id); +int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, + unsigned long *ioreq_gfn, + unsigned long *bufioreq_gfn, + evtchn_port_t *bufioreq_port); +int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id, + unsigned long idx, mfn_t *mfn); +int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, + uint32_t type, uint64_t start, + uint64_t end); +int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, + uint32_t type, uint64_t start, + uint64_t end); +int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, + uint32_t type, uint32_t flags); +int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, + bool enabled); + +int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v); +void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v); +void hvm_destroy_all_ioreq_servers(struct domain *d); + +struct 
hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d, + ioreq_t *p); +int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p, + bool buffered); +unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered); + +void hvm_ioreq_init(struct domain *d); + +bool arch_ioreq_complete_mmio(void); +bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion); +int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s); +void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s); +void arch_ioreq_server_enable(struct hvm_ioreq_server *s); +void arch_ioreq_server_disable(struct hvm_ioreq_server *s); +void arch_ioreq_server_destroy(struct hvm_ioreq_server *s); +int arch_ioreq_server_map_mem_type(struct domain *d, + struct hvm_ioreq_server *s, + uint32_t flags); +void arch_ioreq_server_map_mem_type_completed(struct domain *d, + struct hvm_ioreq_server *s, + uint32_t flags); +bool arch_ioreq_server_destroy_all(struct domain *d); +bool arch_ioreq_server_get_type_addr(const struct domain *d, + const ioreq_t *p, + uint8_t *type, + uint64_t *addr); +void arch_ioreq_domain_init(struct domain *d); + +#endif /* __XEN_IOREQ_H__ */ + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ -- 2.7.4
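For readers following the buffered path in hvm_send_buffered_ioreq() above, here is a rough standalone sketch of the ring accounting it relies on: free-running read/write pointers, slot index taken modulo the ring size, and a 64-bit request occupying two consecutive slots. It is not part of the patch. `SLOT_NUM`, `struct ring` and `ring_put()` are hypothetical names made up for illustration, and the sketch deliberately omits the spinlock, the smp_wmb() barrier and the event-channel notification that the real code uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed small ring size for illustration only; not Xen's actual value. */
#define SLOT_NUM 8u

/* Hypothetical, trimmed-down stand-in for the buffered ioreq page. */
struct ring {
    uint32_t read_pointer;   /* advanced by the consumer (the emulator) */
    uint32_t write_pointer;  /* advanced by the producer (the hypervisor) */
    uint32_t slot[SLOT_NUM];
};

/*
 * Queue one request. 'qw' marks a 64-bit request that needs two slots.
 * Returns false when there is not enough room, in which case the caller
 * would fall back to the normal (synchronous) path.
 */
static bool ring_put(struct ring *r, uint64_t data, bool qw)
{
    /* Unsigned subtraction stays correct even if the pointers wrap. */
    uint32_t used = r->write_pointer - r->read_pointer;

    if ( used >= SLOT_NUM - (qw ? 1u : 0u) )
        return false;

    /* Low half first, then the high half in the next slot. */
    r->slot[r->write_pointer % SLOT_NUM] = (uint32_t)data;
    if ( qw )
        r->slot[(r->write_pointer + 1) % SLOT_NUM] = (uint32_t)(data >> 32);

    r->write_pointer += qw ? 2 : 1;
    return true;
}
```

With seven of eight slots in use, a two-slot request is refused while a one-slot request still fits, which mirrors the `(IOREQ_BUFFER_SLOT_NUM - qw)` comparison in the code above.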
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The IOREQ is a common feature now and this helper will be used
on Arm as is. Move it to xen/ioreq.h and remove "hvm" prefix.

Although PIO handling on Arm is not introduced with the current series
(it will be implemented when we add support for vPCI), technically
the PIOs exist on Arm (however they are accessed the same way as MMIO)
and it would be better not to diverge now.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
CC: Julien Grall <julien.grall@arm.com>
[On Arm only]
Tested-by: Wei Chen <Wei.Chen@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch, was split from:
     "[RFC PATCH V1 01/12] hvm/ioreq: Make x86's IOREQ feature common"

Changes V1 -> V2:
   - remove "hvm" prefix

Changes V2 -> V3:
   - add Paul's R-b

Changes V3 -> V4:
   - add Jan's A-b
---
 xen/arch/x86/hvm/emulate.c     | 4 ++--
 xen/arch/x86/hvm/io.c          | 2 +-
 xen/common/ioreq.c             | 4 ++--
 xen/include/asm-x86/hvm/vcpu.h | 7 -------
 xen/include/xen/ioreq.h        | 7 +++++++
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -XXX,XX +XXX,XX @@ static int hvmemul_do_io(
                 rc = hvm_send_ioreq(s, &p, 0);
                 if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
                     vio->io_req.state = STATE_IOREQ_NONE;
-                else if ( !hvm_ioreq_needs_completion(&vio->io_req) )
+                else if ( !ioreq_needs_completion(&vio->io_req) )
                     rc = X86EMUL_OKAY;
             }
             break;
@@ -XXX,XX +XXX,XX @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt,
     if ( rc == X86EMUL_OKAY && vio->mmio_retry )
         rc = X86EMUL_RETRY;

-    if ( !hvm_ioreq_needs_completion(&vio->io_req) )
+    if ( !ioreq_needs_completion(&vio->io_req) )
         completion = HVMIO_no_completion;
     else if ( completion == HVMIO_no_completion )
         completion = (vio->io_req.type != IOREQ_TYPE_PIO ||
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -XXX,XX +XXX,XX @@ bool handle_pio(uint16_t port, unsigned int size, int dir)

     rc = hvmemul_do_pio_buffer(port, size, dir, &data);

-    if ( hvm_ioreq_needs_completion(&vio->io_req) )
+    if ( ioreq_needs_completion(&vio->io_req) )
         vio->io_completion = HVMIO_pio_completion;

     switch ( rc )
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -XXX,XX +XXX,XX @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
     }

     p = &sv->vcpu->arch.hvm.hvm_io.io_req;
-    if ( hvm_ioreq_needs_completion(p) )
+    if ( ioreq_needs_completion(p) )
         p->data = data;

     sv->pending = false;
@@ -XXX,XX +XXX,XX @@ bool handle_hvm_io_completion(struct vcpu *v)
     if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
         return false;

-    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
+    vio->io_req.state = ioreq_needs_completion(&vio->io_req) ?
         STATE_IORESP_READY : STATE_IOREQ_NONE;

     msix_write_completion(v);
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -XXX,XX +XXX,XX @@ struct hvm_vcpu_io {
     const struct g2m_ioport *g2m_ioport;
 };

-static inline bool hvm_ioreq_needs_completion(const ioreq_t *ioreq)
-{
-    return ioreq->state == STATE_IOREQ_READY &&
-           !ioreq->data_is_ptr &&
-           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
-}
-
 struct nestedvcpu {
     bool_t nv_guestmode; /* vcpu in guestmode? */
     void *nv_vvmcx; /* l1 guest virtual VMCB/VMCS */
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
index XXXXXXX..XXXXXXX 100644
--- a/xen/include/xen/ioreq.h
+++ b/xen/include/xen/ioreq.h
@@ -XXX,XX +XXX,XX @@

 #include <xen/sched.h>

+static inline bool ioreq_needs_completion(const ioreq_t *ioreq)
+{
+    return ioreq->state == STATE_IOREQ_READY &&
+           !ioreq->data_is_ptr &&
+           (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
+}
+
 #define HANDLE_BUFIOREQ(s) \
     ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)

--
2.7.4
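To make the predicate moved by the patch above easier to reason about, here is a small standalone sketch of the same condition over a mock request structure. It is an illustration only, not part of the patch. `struct mock_ioreq`, the `MOCK_*` constants and `mock_needs_completion()` are hypothetical stand-ins; the real `ioreq_t` and its constants come from Xen's public headers and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholder constants, chosen only for this sketch. */
enum { MOCK_STATE_NONE, MOCK_STATE_READY };
enum { MOCK_TYPE_PIO, MOCK_TYPE_COPY };
enum { MOCK_DIR_WRITE, MOCK_DIR_READ };

/* Trimmed stand-in holding only the fields the predicate looks at. */
struct mock_ioreq {
    uint8_t state;
    uint8_t data_is_ptr;
    uint8_t type;
    uint8_t dir;
};

/*
 * Same shape as ioreq_needs_completion(): a request needs completion
 * when it is in the READY state, carries its data inline, and is
 * anything other than a port I/O write.
 */
static bool mock_needs_completion(const struct mock_ioreq *r)
{
    return r->state == MOCK_STATE_READY &&
           !r->data_is_ptr &&
           (r->type != MOCK_TYPE_PIO || r->dir != MOCK_DIR_WRITE);
}
```

Under these assumptions, a port I/O write is the one inline, READY request that never needs completion, since nothing has to be handed back to the guest. Port I/O reads and all inline MMIO accesses do.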
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> The IOREQ is a common feature now and these helpers will be used on Arm as is. Move them to xen/ioreq.h and replace "hvm" prefixes with "ioreq". Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Paul Durrant <paul@xen.org> CC: Julien Grall <julien.grall@arm.com> [On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" Changes RFC -> V1: - new patch Changes V1 -> V2: - replace "hvm" prefix by "ioreq" Changes V2 -> V3: - add Paul's R-b Changes V3 -> V4: - add Jan's A-b --- xen/arch/x86/hvm/intercept.c | 5 +++-- xen/arch/x86/hvm/stdvga.c | 4 ++-- xen/common/ioreq.c | 4 ++-- xen/include/asm-x86/hvm/io.h | 16 ---------------- xen/include/xen/ioreq.h | 16 ++++++++++++++++ 5 files changed, 23 insertions(+), 22 deletions(-) diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/intercept.c +++ b/xen/arch/x86/hvm/intercept.c @@ -XXX,XX +XXX,XX @@ * this program; If not, see <http://www.gnu.org/licenses/>. */ +#include <xen/ioreq.h> #include <xen/types.h> #include <xen/sched.h> #include <asm/regs.h> @@ -XXX,XX +XXX,XX @@ static bool_t hvm_mmio_accept(const struct hvm_io_handler *handler, const ioreq_t *p) { - paddr_t first = hvm_mmio_first_byte(p), last; + paddr_t first = ioreq_mmio_first_byte(p), last; BUG_ON(handler->type != IOREQ_TYPE_COPY); @@ -XXX,XX +XXX,XX @@ static bool_t hvm_mmio_accept(const struct hvm_io_handler *handler, return 0; /* Make sure the handler will accept the whole access.
*/ - last = hvm_mmio_last_byte(p); + last = ioreq_mmio_last_byte(p); if ( last != first && !handler->mmio.ops->check(current, last) ) domain_crash(current->domain); diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/stdvga.c +++ b/xen/arch/x86/hvm/stdvga.c @@ -XXX,XX +XXX,XX @@ static bool_t stdvga_mem_accept(const struct hvm_io_handler *handler, * deadlock when hvm_mmio_internal() is called from * hvm_copy_to/from_guest_phys() in hvm_process_io_intercept(). */ - if ( (hvm_mmio_first_byte(p) < VGA_MEM_BASE) || - (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) ) + if ( (ioreq_mmio_first_byte(p) < VGA_MEM_BASE) || + (ioreq_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) ) return 0; spin_lock(&s->lock); diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/common/ioreq.c +++ b/xen/common/ioreq.c @@ -XXX,XX +XXX,XX @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d, break; case XEN_DMOP_IO_RANGE_MEMORY: - start = hvm_mmio_first_byte(p); - end = hvm_mmio_last_byte(p); + start = ioreq_mmio_first_byte(p); + end = ioreq_mmio_last_byte(p); if ( rangeset_contains_range(r, start, end) ) return s; diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/hvm/io.h +++ b/xen/include/asm-x86/hvm/io.h @@ -XXX,XX +XXX,XX @@ struct hvm_mmio_ops { hvm_mmio_write_t write; }; -static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p) -{ - return unlikely(p->df) ? - p->addr - (p->count - 1ul) * p->size : - p->addr; -} - -static inline paddr_t hvm_mmio_last_byte(const ioreq_t *p) -{ - unsigned long size = p->size; - - return unlikely(p->df) ? 
- p->addr + size - 1: - p->addr + (p->count * size) - 1; -} - typedef int (*portio_action_t)( int dir, unsigned int port, unsigned int bytes, uint32_t *val); diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/ioreq.h +++ b/xen/include/xen/ioreq.h @@ -XXX,XX +XXX,XX @@ #include <xen/sched.h> +static inline paddr_t ioreq_mmio_first_byte(const ioreq_t *p) +{ + return unlikely(p->df) ? + p->addr - (p->count - 1ul) * p->size : + p->addr; +} + +static inline paddr_t ioreq_mmio_last_byte(const ioreq_t *p) +{ + unsigned long size = p->size; + + return unlikely(p->df) ? + p->addr + size - 1: + p->addr + (p->count * size) - 1; +} + static inline bool ioreq_needs_completion(const ioreq_t *ioreq) { return ioreq->state == STATE_IOREQ_READY && -- 2.7.4
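The address arithmetic in the two helpers moved by this patch is easy to misread for repeated accesses with the direction flag set, so a worked sketch may help. This is a standalone illustration, not Xen code: `struct mock_ioreq` and the `mock_mmio_*` functions are hypothetical names that keep only the `addr`, `count`, `size` and `df` fields the helpers read, and the `unlikely()` hint is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;

/* Hypothetical subset of ioreq_t: only the fields the helpers read. */
struct mock_ioreq {
    uint64_t addr;   /* address of the first element accessed */
    uint32_t count;  /* number of repetitions */
    uint32_t size;   /* bytes per element */
    uint8_t df;      /* direction flag: 1 means addresses descend */
};

/* Lowest byte touched; mirrors ioreq_mmio_first_byte() from the patch. */
static paddr_t mock_mmio_first_byte(const struct mock_ioreq *p)
{
    assert(p != NULL && p->count > 0);

    return p->df ? p->addr - (p->count - 1ul) * p->size
                 : p->addr;
}

/* Highest byte touched; mirrors ioreq_mmio_last_byte() from the patch. */
static paddr_t mock_mmio_last_byte(const struct mock_ioreq *p)
{
    unsigned long size = p->size;

    assert(p != NULL && p->count > 0);

    return p->df ? p->addr + size - 1
                 : p->addr + (p->count * size) - 1;
}
```

With `addr = 0x1000`, `count = 4` and `size = 4`, an ascending access spans 0x1000 to 0x100f, while a descending one (`df = 1`) spans 0xff4 to 0x1003: the first element still starts at `addr`, and the remaining three sit below it.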
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> The IOREQ is a common feature now and these structs will be used on Arm as is. Move them to xen/ioreq.h and remove "hvm" prefixes. Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Acked-by: Jan Beulich <jbeulich@suse.com> CC: Julien Grall <julien.grall@arm.com> [On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" Changes RFC -> V1: - new patch Changes V1 -> V2: - remove "hvm" prefix Changes V2 -> V3: - update patch to reflect that the "legacy interface" is x86 specific Changes V3 -> V4: - add Jan's A-b --- xen/arch/x86/hvm/emulate.c | 2 +- xen/arch/x86/hvm/ioreq.c | 38 +++++++------- xen/arch/x86/hvm/stdvga.c | 2 +- xen/arch/x86/mm/p2m.c | 8 +-- xen/common/ioreq.c | 108 +++++++++++++++++++-------------------- xen/include/asm-x86/hvm/domain.h | 36 +------------ xen/include/asm-x86/p2m.h | 8 +-- xen/include/xen/ioreq.h | 54 ++++++++++++++++---- 8 files changed, 128 insertions(+), 128 deletions(-) diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -XXX,XX +XXX,XX @@ static int hvmemul_do_io( * However, there's no cheap approach to avoid above situations in xen, * so the device model side needs to check the incoming ioreq event.
*/ - struct hvm_ioreq_server *s = NULL; + struct ioreq_server *s = NULL; p2m_type_t p2mt = p2m_invalid; if ( is_mmio ) diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/ioreq.c +++ b/xen/arch/x86/hvm/ioreq.c @@ -XXX,XX +XXX,XX @@ bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion) return true; } -static gfn_t hvm_alloc_legacy_ioreq_gfn(struct hvm_ioreq_server *s) +static gfn_t hvm_alloc_legacy_ioreq_gfn(struct ioreq_server *s) { struct domain *d = s->target; unsigned int i; @@ -XXX,XX +XXX,XX @@ static gfn_t hvm_alloc_legacy_ioreq_gfn(struct hvm_ioreq_server *s) return INVALID_GFN; } -static gfn_t hvm_alloc_ioreq_gfn(struct hvm_ioreq_server *s) +static gfn_t hvm_alloc_ioreq_gfn(struct ioreq_server *s) { struct domain *d = s->target; unsigned int i; @@ -XXX,XX +XXX,XX @@ static gfn_t hvm_alloc_ioreq_gfn(struct hvm_ioreq_server *s) return hvm_alloc_legacy_ioreq_gfn(s); } -static bool hvm_free_legacy_ioreq_gfn(struct hvm_ioreq_server *s, +static bool hvm_free_legacy_ioreq_gfn(struct ioreq_server *s, gfn_t gfn) { struct domain *d = s->target; @@ -XXX,XX +XXX,XX @@ static bool hvm_free_legacy_ioreq_gfn(struct hvm_ioreq_server *s, return true; } -static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn) +static void hvm_free_ioreq_gfn(struct ioreq_server *s, gfn_t gfn) { struct domain *d = s->target; unsigned int i = gfn_x(gfn) - d->arch.hvm.ioreq_gfn.base; @@ -XXX,XX +XXX,XX @@ static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn) } } -static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) +static void hvm_unmap_ioreq_gfn(struct ioreq_server *s, bool buf) { - struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; + struct ioreq_page *iorp = buf ? 
&s->bufioreq : &s->ioreq; if ( gfn_eq(iorp->gfn, INVALID_GFN) ) return; @@ -XXX,XX +XXX,XX @@ static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) iorp->gfn = INVALID_GFN; } -static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) +static int hvm_map_ioreq_gfn(struct ioreq_server *s, bool buf) { struct domain *d = s->target; - struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; + struct ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; int rc; if ( iorp->page ) @@ -XXX,XX +XXX,XX @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) return rc; } -static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) +static void hvm_remove_ioreq_gfn(struct ioreq_server *s, bool buf) { struct domain *d = s->target; - struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; + struct ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; if ( gfn_eq(iorp->gfn, INVALID_GFN) ) return; @@ -XXX,XX +XXX,XX @@ static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) clear_page(iorp->va); } -static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) +static int hvm_add_ioreq_gfn(struct ioreq_server *s, bool buf) { struct domain *d = s->target; - struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; + struct ioreq_page *iorp = buf ? 
&s->bufioreq : &s->ioreq; int rc; if ( gfn_eq(iorp->gfn, INVALID_GFN) ) @@ -XXX,XX +XXX,XX @@ static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf) return rc; } -int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s) +int arch_ioreq_server_map_pages(struct ioreq_server *s) { int rc; @@ -XXX,XX +XXX,XX @@ int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s) return rc; } -void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s) +void arch_ioreq_server_unmap_pages(struct ioreq_server *s) { hvm_unmap_ioreq_gfn(s, true); hvm_unmap_ioreq_gfn(s, false); } -void arch_ioreq_server_enable(struct hvm_ioreq_server *s) +void arch_ioreq_server_enable(struct ioreq_server *s) { hvm_remove_ioreq_gfn(s, false); hvm_remove_ioreq_gfn(s, true); } -void arch_ioreq_server_disable(struct hvm_ioreq_server *s) +void arch_ioreq_server_disable(struct ioreq_server *s) { hvm_add_ioreq_gfn(s, true); hvm_add_ioreq_gfn(s, false); } /* Called when target domain is paused */ -void arch_ioreq_server_destroy(struct hvm_ioreq_server *s) +void arch_ioreq_server_destroy(struct ioreq_server *s) { p2m_set_ioreq_server(s->target, 0, s); } /* Called with ioreq_server lock held */ int arch_ioreq_server_map_mem_type(struct domain *d, - struct hvm_ioreq_server *s, + struct ioreq_server *s, uint32_t flags) { return p2m_set_ioreq_server(d, flags, s); } void arch_ioreq_server_map_mem_type_completed(struct domain *d, - struct hvm_ioreq_server *s, + struct ioreq_server *s, uint32_t flags) { if ( flags == 0 ) diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/stdvga.c +++ b/xen/arch/x86/hvm/stdvga.c @@ -XXX,XX +XXX,XX @@ static int stdvga_mem_write(const struct hvm_io_handler *handler, .dir = IOREQ_WRITE, .data = data, }; - struct hvm_ioreq_server *srv; + struct ioreq_server *srv; if ( !stdvga_cache_is_enabled(s) || !s->stdvga ) goto done; diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index 
XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -XXX,XX +XXX,XX @@ void p2m_memory_type_changed(struct domain *d) int p2m_set_ioreq_server(struct domain *d, unsigned int flags, - struct hvm_ioreq_server *s) + struct ioreq_server *s) { struct p2m_domain *p2m = p2m_get_hostp2m(d); int rc; @@ -XXX,XX +XXX,XX @@ int p2m_set_ioreq_server(struct domain *d, return rc; } -struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d, - unsigned int *flags) +struct ioreq_server *p2m_get_ioreq_server(struct domain *d, + unsigned int *flags) { struct p2m_domain *p2m = p2m_get_hostp2m(d); - struct hvm_ioreq_server *s; + struct ioreq_server *s; spin_lock(&p2m->ioreq.lock); diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/common/ioreq.c +++ b/xen/common/ioreq.c @@ -XXX,XX +XXX,XX @@ #include <public/hvm/params.h> static void set_ioreq_server(struct domain *d, unsigned int id, - struct hvm_ioreq_server *s) + struct ioreq_server *s) { ASSERT(id < MAX_NR_IOREQ_SERVERS); ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]); @@ -XXX,XX +XXX,XX @@ static void set_ioreq_server(struct domain *d, unsigned int id, #define GET_IOREQ_SERVER(d, id) \ (d)->arch.hvm.ioreq_server.server[id] -static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d, - unsigned int id) +static struct ioreq_server *get_ioreq_server(const struct domain *d, + unsigned int id) { if ( id >= MAX_NR_IOREQ_SERVERS ) return NULL; @@ -XXX,XX +XXX,XX @@ static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d, continue; \ else -static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v) +static ioreq_t *get_ioreq(struct ioreq_server *s, struct vcpu *v) { shared_iopage_t *p = s->ioreq.va; @@ -XXX,XX +XXX,XX @@ static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v) return &p->vcpu_ioreq[v->vcpu_id]; } -static struct hvm_ioreq_vcpu *get_pending_vcpu(const struct vcpu *v, - struct hvm_ioreq_server 
**srvp) +static struct ioreq_vcpu *get_pending_vcpu(const struct vcpu *v, + struct ioreq_server **srvp) { struct domain *d = v->domain; - struct hvm_ioreq_server *s; + struct ioreq_server *s; unsigned int id; FOR_EACH_IOREQ_SERVER(d, id, s) { - struct hvm_ioreq_vcpu *sv; + struct ioreq_vcpu *sv; list_for_each_entry ( sv, &s->ioreq_vcpu_list, @@ -XXX,XX +XXX,XX @@ bool hvm_io_pending(struct vcpu *v) return get_pending_vcpu(v, NULL); } -static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p) +static bool hvm_wait_for_io(struct ioreq_vcpu *sv, ioreq_t *p) { unsigned int prev_state = STATE_IOREQ_NONE; unsigned int state = p->state; @@ -XXX,XX +XXX,XX @@ bool handle_hvm_io_completion(struct vcpu *v) { struct domain *d = v->domain; struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io; - struct hvm_ioreq_server *s; - struct hvm_ioreq_vcpu *sv; + struct ioreq_server *s; + struct ioreq_vcpu *sv; enum hvm_io_completion io_completion; if ( has_vpci(d) && vpci_process_pending(v) ) @@ -XXX,XX +XXX,XX @@ bool handle_hvm_io_completion(struct vcpu *v) return true; } -static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf) +static int hvm_alloc_ioreq_mfn(struct ioreq_server *s, bool buf) { - struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; + struct ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; struct page_info *page; if ( iorp->page ) @@ -XXX,XX +XXX,XX @@ static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf) return -ENOMEM; } -static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf) +static void hvm_free_ioreq_mfn(struct ioreq_server *s, bool buf) { - struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; + struct ioreq_page *iorp = buf ? 
&s->bufioreq : &s->ioreq; struct page_info *page = iorp->page; if ( !page ) @@ -XXX,XX +XXX,XX @@ static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf) bool is_ioreq_server_page(struct domain *d, const struct page_info *page) { - const struct hvm_ioreq_server *s; + const struct ioreq_server *s; unsigned int id; bool found = false; @@ -XXX,XX +XXX,XX @@ bool is_ioreq_server_page(struct domain *d, const struct page_info *page) return found; } -static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s, - struct hvm_ioreq_vcpu *sv) +static void hvm_update_ioreq_evtchn(struct ioreq_server *s, + struct ioreq_vcpu *sv) { ASSERT(spin_is_locked(&s->lock)); @@ -XXX,XX +XXX,XX @@ static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s, } } -static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s, +static int hvm_ioreq_server_add_vcpu(struct ioreq_server *s, struct vcpu *v) { - struct hvm_ioreq_vcpu *sv; + struct ioreq_vcpu *sv; int rc; - sv = xzalloc(struct hvm_ioreq_vcpu); + sv = xzalloc(struct ioreq_vcpu); rc = -ENOMEM; if ( !sv ) @@ -XXX,XX +XXX,XX @@ static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s, return rc; } -static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, +static void hvm_ioreq_server_remove_vcpu(struct ioreq_server *s, struct vcpu *v) { - struct hvm_ioreq_vcpu *sv; + struct ioreq_vcpu *sv; spin_lock(&s->lock); @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s, spin_unlock(&s->lock); } -static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s) +static void hvm_ioreq_server_remove_all_vcpus(struct ioreq_server *s) { - struct hvm_ioreq_vcpu *sv, *next; + struct ioreq_vcpu *sv, *next; spin_lock(&s->lock); @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s) spin_unlock(&s->lock); } -static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s) +static int hvm_ioreq_server_alloc_pages(struct 
ioreq_server *s) { int rc; @@ -XXX,XX +XXX,XX @@ static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s) return rc; } -static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s) +static void hvm_ioreq_server_free_pages(struct ioreq_server *s) { hvm_free_ioreq_mfn(s, true); hvm_free_ioreq_mfn(s, false); } -static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s) +static void hvm_ioreq_server_free_rangesets(struct ioreq_server *s) { unsigned int i; @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s) rangeset_destroy(s->range[i]); } -static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, +static int hvm_ioreq_server_alloc_rangesets(struct ioreq_server *s, ioservid_t id) { unsigned int i; @@ -XXX,XX +XXX,XX @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, return rc; } -static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s) +static void hvm_ioreq_server_enable(struct ioreq_server *s) { - struct hvm_ioreq_vcpu *sv; + struct ioreq_vcpu *sv; spin_lock(&s->lock); @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s) spin_unlock(&s->lock); } -static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s) +static void hvm_ioreq_server_disable(struct ioreq_server *s) { spin_lock(&s->lock); @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s) spin_unlock(&s->lock); } -static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, +static int hvm_ioreq_server_init(struct ioreq_server *s, struct domain *d, int bufioreq_handling, ioservid_t id) { @@ -XXX,XX +XXX,XX @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s, return rc; } -static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s) +static void hvm_ioreq_server_deinit(struct ioreq_server *s) { ASSERT(!s->enabled); hvm_ioreq_server_remove_all_vcpus(s); @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_deinit(struct 
hvm_ioreq_server *s) int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, ioservid_t *id) { - struct hvm_ioreq_server *s; + struct ioreq_server *s; unsigned int i; int rc; if ( bufioreq_handling > HVM_IOREQSRV_BUFIOREQ_ATOMIC ) return -EINVAL; - s = xzalloc(struct hvm_ioreq_server); + s = xzalloc(struct ioreq_server); if ( !s ) return -ENOMEM; @@ -XXX,XX +XXX,XX @@ int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id) { - struct hvm_ioreq_server *s; + struct ioreq_server *s; int rc; spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); @@ -XXX,XX +XXX,XX @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, unsigned long *bufioreq_gfn, evtchn_port_t *bufioreq_port) { - struct hvm_ioreq_server *s; + struct ioreq_server *s; int rc; spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); @@ -XXX,XX +XXX,XX @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id, unsigned long idx, mfn_t *mfn) { - struct hvm_ioreq_server *s; + struct ioreq_server *s; int rc; ASSERT(is_hvm_domain(d)); @@ -XXX,XX +XXX,XX @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, uint32_t type, uint64_t start, uint64_t end) { - struct hvm_ioreq_server *s; + struct ioreq_server *s; struct rangeset *r; int rc; @@ -XXX,XX +XXX,XX @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, uint32_t type, uint64_t start, uint64_t end) { - struct hvm_ioreq_server *s; + struct ioreq_server *s; struct rangeset *r; int rc; @@ -XXX,XX +XXX,XX @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, uint32_t type, uint32_t flags) { - struct hvm_ioreq_server *s; + struct ioreq_server *s; int rc; if ( type != HVMMEM_ioreq_server ) @@ -XXX,XX +XXX,XX @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, 
ioservid_t id, int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, bool enabled) { - struct hvm_ioreq_server *s; + struct ioreq_server *s; int rc; spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); @@ -XXX,XX +XXX,XX @@ int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v) { - struct hvm_ioreq_server *s; + struct ioreq_server *s; unsigned int id; int rc; @@ -XXX,XX +XXX,XX @@ int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v) void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v) { - struct hvm_ioreq_server *s; + struct ioreq_server *s; unsigned int id; spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); @@ -XXX,XX +XXX,XX @@ void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v) void hvm_destroy_all_ioreq_servers(struct domain *d) { - struct hvm_ioreq_server *s; + struct ioreq_server *s; unsigned int id; if ( !arch_ioreq_server_destroy_all(d) ) @@ -XXX,XX +XXX,XX @@ void hvm_destroy_all_ioreq_servers(struct domain *d) spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); } -struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d, - ioreq_t *p) +struct ioreq_server *hvm_select_ioreq_server(struct domain *d, + ioreq_t *p) { - struct hvm_ioreq_server *s; + struct ioreq_server *s; uint8_t type; uint64_t addr; unsigned int id; @@ -XXX,XX +XXX,XX @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d, return NULL; } -static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p) +static int hvm_send_buffered_ioreq(struct ioreq_server *s, ioreq_t *p) { struct domain *d = current->domain; - struct hvm_ioreq_page *iorp; + struct ioreq_page *iorp; buffered_iopage_t *pg; buf_ioreq_t bp = { .data = p->data, .addr = p->addr, @@ -XXX,XX +XXX,XX @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p) return IOREQ_STATUS_HANDLED; } -int hvm_send_ioreq(struct hvm_ioreq_server 
*s, ioreq_t *proto_p, +int hvm_send_ioreq(struct ioreq_server *s, ioreq_t *proto_p, bool buffered) { struct vcpu *curr = current; struct domain *d = curr->domain; - struct hvm_ioreq_vcpu *sv; + struct ioreq_vcpu *sv; ASSERT(s); @@ -XXX,XX +XXX,XX @@ int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p, unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered) { struct domain *d = current->domain; - struct hvm_ioreq_server *s; + struct ioreq_server *s; unsigned int id, failed = 0; FOR_EACH_IOREQ_SERVER(d, id, s) diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/hvm/domain.h +++ b/xen/include/asm-x86/hvm/domain.h @@ -XXX,XX +XXX,XX @@ #include <public/hvm/dm_op.h> -struct hvm_ioreq_page { - gfn_t gfn; - struct page_info *page; - void *va; -}; - -struct hvm_ioreq_vcpu { - struct list_head list_entry; - struct vcpu *vcpu; - evtchn_port_t ioreq_evtchn; - bool pending; -}; - -#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1) -#define MAX_NR_IO_RANGES 256 - -struct hvm_ioreq_server { - struct domain *target, *emulator; - - /* Lock to serialize toolstack modifications */ - spinlock_t lock; - - struct hvm_ioreq_page ioreq; - struct list_head ioreq_vcpu_list; - struct hvm_ioreq_page bufioreq; - - /* Lock to serialize access to buffered ioreq ring */ - spinlock_t bufioreq_lock; - evtchn_port_t bufioreq_evtchn; - struct rangeset *range[NR_IO_RANGE_TYPES]; - bool enabled; - uint8_t bufioreq_handling; -}; - #ifdef CONFIG_MEM_SHARING struct mem_sharing_domain { @@ -XXX,XX +XXX,XX @@ struct hvm_domain { /* Lock protects all other values in the sub-struct and the default */ struct { spinlock_t lock; - struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS]; + struct ioreq_server *server[MAX_NR_IOREQ_SERVERS]; } ioreq_server; /* Cached CF8 for guest PCI config cycles */ diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h index XXXXXXX..XXXXXXX 100644 --- 
a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -XXX,XX +XXX,XX @@ struct p2m_domain { * ioreq server who's responsible for the emulation of * gfns with specific p2m type(for now, p2m_ioreq_server). */ - struct hvm_ioreq_server *server; + struct ioreq_server *server; /* * flags specifies whether read, write or both operations * are to be emulated by an ioreq server. @@ -XXX,XX +XXX,XX @@ static inline unsigned int p2m_get_iommu_flags(p2m_type_t p2mt, mfn_t mfn) } int p2m_set_ioreq_server(struct domain *d, unsigned int flags, - struct hvm_ioreq_server *s); -struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d, - unsigned int *flags); + struct ioreq_server *s); +struct ioreq_server *p2m_get_ioreq_server(struct domain *d, + unsigned int *flags); static inline int p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt, p2m_type_t ot, mfn_t nfn, mfn_t ofn, diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/ioreq.h +++ b/xen/include/xen/ioreq.h @@ -XXX,XX +XXX,XX @@ #include <xen/sched.h> +struct ioreq_page { + gfn_t gfn; + struct page_info *page; + void *va; +}; + +struct ioreq_vcpu { + struct list_head list_entry; + struct vcpu *vcpu; + evtchn_port_t ioreq_evtchn; + bool pending; +}; + +#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1) +#define MAX_NR_IO_RANGES 256 + +struct ioreq_server { + struct domain *target, *emulator; + + /* Lock to serialize toolstack modifications */ + spinlock_t lock; + + struct ioreq_page ioreq; + struct list_head ioreq_vcpu_list; + struct ioreq_page bufioreq; + + /* Lock to serialize access to buffered ioreq ring */ + spinlock_t bufioreq_lock; + evtchn_port_t bufioreq_evtchn; + struct rangeset *range[NR_IO_RANGE_TYPES]; + bool enabled; + uint8_t bufioreq_handling; +}; + static inline paddr_t ioreq_mmio_first_byte(const ioreq_t *p) { return unlikely(p->df) ? 
@@ -XXX,XX +XXX,XX @@ int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v); void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v); void hvm_destroy_all_ioreq_servers(struct domain *d); -struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d, - ioreq_t *p); -int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p, +struct ioreq_server *hvm_select_ioreq_server(struct domain *d, + ioreq_t *p); +int hvm_send_ioreq(struct ioreq_server *s, ioreq_t *proto_p, bool buffered); unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered); @@ -XXX,XX +XXX,XX @@ void hvm_ioreq_init(struct domain *d); bool arch_ioreq_complete_mmio(void); bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion); -int arch_ioreq_server_map_pages(struct hvm_ioreq_server *s); -void arch_ioreq_server_unmap_pages(struct hvm_ioreq_server *s); -void arch_ioreq_server_enable(struct hvm_ioreq_server *s); -void arch_ioreq_server_disable(struct hvm_ioreq_server *s); -void arch_ioreq_server_destroy(struct hvm_ioreq_server *s); +int arch_ioreq_server_map_pages(struct ioreq_server *s); +void arch_ioreq_server_unmap_pages(struct ioreq_server *s); +void arch_ioreq_server_enable(struct ioreq_server *s); +void arch_ioreq_server_disable(struct ioreq_server *s); +void arch_ioreq_server_destroy(struct ioreq_server *s); int arch_ioreq_server_map_mem_type(struct domain *d, - struct hvm_ioreq_server *s, + struct ioreq_server *s, uint32_t flags); void arch_ioreq_server_map_mem_type_completed(struct domain *d, - struct hvm_ioreq_server *s, + struct ioreq_server *s, uint32_t flags); bool arch_ioreq_server_destroy_all(struct domain *d); bool arch_ioreq_server_get_type_addr(const struct domain *d, -- 2.7.4
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> The IOREQ is a common feature now and this struct will be used on Arm as is. Move it to common struct domain. This also significantly reduces the layering violation in the common code (*arch.hvm* usage). We don't move ioreq_gfn since it is not used in the common code (the "legacy" mechanism is x86 specific). Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Acked-by: Jan Beulich <jbeulich@suse.com> CC: Julien Grall <julien.grall@arm.com> [On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" Changes V1 -> V2: - new patch Changes V2 -> V3: - remove the mention of "ioreq_gfn" from patch subject/description - update patch to reflect that the "legacy interface" is x86 specific - drop hvm_params related changes in arch/x86/hvm/hvm.c - leave ioreq_gfn in hvm_domain Changes V3 -> V4: - rebase - drop the stale part of the comment above struct ioreq_server - add Jan's A-b --- xen/common/ioreq.c | 60 ++++++++++++++++++++-------------------- xen/include/asm-x86/hvm/domain.h | 8 ------ xen/include/xen/sched.h | 10 +++++++ 3 files changed, 40 insertions(+), 38 deletions(-) diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/common/ioreq.c +++ b/xen/common/ioreq.c @@ -XXX,XX +XXX,XX @@ static void set_ioreq_server(struct domain *d, unsigned int id, struct ioreq_server *s) { ASSERT(id < MAX_NR_IOREQ_SERVERS); - ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]); + ASSERT(!s || !d->ioreq_server.server[id]); - d->arch.hvm.ioreq_server.server[id] = s; + d->ioreq_server.server[id] = s; } #define GET_IOREQ_SERVER(d, id) \ - (d)->arch.hvm.ioreq_server.server[id] + (d)->ioreq_server.server[id] static struct ioreq_server *get_ioreq_server(const struct domain *d, unsigned int id) @@ -XXX,XX +XXX,XX @@ bool is_ioreq_server_page(struct domain *d, const
struct page_info *page) unsigned int id; bool found = false; - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_lock_recursive(&d->ioreq_server.lock); FOR_EACH_IOREQ_SERVER(d, id, s) { @@ -XXX,XX +XXX,XX @@ bool is_ioreq_server_page(struct domain *d, const struct page_info *page) } } - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_unlock_recursive(&d->ioreq_server.lock); return found; } @@ -XXX,XX +XXX,XX @@ int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, return -ENOMEM; domain_pause(d); - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_lock_recursive(&d->ioreq_server.lock); for ( i = 0; i < MAX_NR_IOREQ_SERVERS; i++ ) { @@ -XXX,XX +XXX,XX @@ int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, if ( id ) *id = i; - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_unlock_recursive(&d->ioreq_server.lock); domain_unpause(d); return 0; fail: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_unlock_recursive(&d->ioreq_server.lock); domain_unpause(d); xfree(s); @@ -XXX,XX +XXX,XX @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id) struct ioreq_server *s; int rc; - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_lock_recursive(&d->ioreq_server.lock); s = get_ioreq_server(d, id); @@ -XXX,XX +XXX,XX @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id) rc = 0; out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_unlock_recursive(&d->ioreq_server.lock); return rc; } @@ -XXX,XX +XXX,XX @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, struct ioreq_server *s; int rc; - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_lock_recursive(&d->ioreq_server.lock); s = get_ioreq_server(d, id); @@ -XXX,XX +XXX,XX @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, rc = 0; out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_unlock_recursive(&d->ioreq_server.lock); return rc; } @@ 
-XXX,XX +XXX,XX @@ int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id, ASSERT(is_hvm_domain(d)); - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_lock_recursive(&d->ioreq_server.lock); s = get_ioreq_server(d, id); @@ -XXX,XX +XXX,XX @@ int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id, } out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_unlock_recursive(&d->ioreq_server.lock); return rc; } @@ -XXX,XX +XXX,XX @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, if ( start > end ) return -EINVAL; - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_lock_recursive(&d->ioreq_server.lock); s = get_ioreq_server(d, id); @@ -XXX,XX +XXX,XX @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, rc = rangeset_add_range(r, start, end); out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_unlock_recursive(&d->ioreq_server.lock); return rc; } @@ -XXX,XX +XXX,XX @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, if ( start > end ) return -EINVAL; - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_lock_recursive(&d->ioreq_server.lock); s = get_ioreq_server(d, id); @@ -XXX,XX +XXX,XX @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, rc = rangeset_remove_range(r, start, end); out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_unlock_recursive(&d->ioreq_server.lock); return rc; } @@ -XXX,XX +XXX,XX @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, if ( flags & ~XEN_DMOP_IOREQ_MEM_ACCESS_WRITE ) return -EINVAL; - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_lock_recursive(&d->ioreq_server.lock); s = get_ioreq_server(d, id); @@ -XXX,XX +XXX,XX @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, rc = arch_ioreq_server_map_mem_type(d, s, flags); out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + 
spin_unlock_recursive(&d->ioreq_server.lock); if ( rc == 0 ) arch_ioreq_server_map_mem_type_completed(d, s, flags); @@ -XXX,XX +XXX,XX @@ int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, struct ioreq_server *s; int rc; - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_lock_recursive(&d->ioreq_server.lock); s = get_ioreq_server(d, id); @@ -XXX,XX +XXX,XX @@ int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, rc = 0; out: - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_unlock_recursive(&d->ioreq_server.lock); return rc; } @@ -XXX,XX +XXX,XX @@ int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v) unsigned int id; int rc; - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_lock_recursive(&d->ioreq_server.lock); FOR_EACH_IOREQ_SERVER(d, id, s) { @@ -XXX,XX +XXX,XX @@ int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v) goto fail; } - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_unlock_recursive(&d->ioreq_server.lock); return 0; @@ -XXX,XX +XXX,XX @@ int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v) hvm_ioreq_server_remove_vcpu(s, v); } - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_unlock_recursive(&d->ioreq_server.lock); return rc; } @@ -XXX,XX +XXX,XX @@ void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v) struct ioreq_server *s; unsigned int id; - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_lock_recursive(&d->ioreq_server.lock); FOR_EACH_IOREQ_SERVER(d, id, s) hvm_ioreq_server_remove_vcpu(s, v); - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_unlock_recursive(&d->ioreq_server.lock); } void hvm_destroy_all_ioreq_servers(struct domain *d) @@ -XXX,XX +XXX,XX @@ void hvm_destroy_all_ioreq_servers(struct domain *d) if ( !arch_ioreq_server_destroy_all(d) ) return; - spin_lock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_lock_recursive(&d->ioreq_server.lock); /* No need 
to domain_pause() as the domain is being torn down */ @@ -XXX,XX +XXX,XX @@ void hvm_destroy_all_ioreq_servers(struct domain *d) xfree(s); } - spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock); + spin_unlock_recursive(&d->ioreq_server.lock); } struct ioreq_server *hvm_select_ioreq_server(struct domain *d, @@ -XXX,XX +XXX,XX @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered) void hvm_ioreq_init(struct domain *d) { - spin_lock_init(&d->arch.hvm.ioreq_server.lock); + spin_lock_init(&d->ioreq_server.lock); arch_ioreq_domain_init(d); } diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/hvm/domain.h +++ b/xen/include/asm-x86/hvm/domain.h @@ -XXX,XX +XXX,XX @@ struct hvm_pi_ops { void (*vcpu_block)(struct vcpu *); }; -#define MAX_NR_IOREQ_SERVERS 8 - struct hvm_domain { /* Guest page range used for non-default ioreq servers */ struct { @@ -XXX,XX +XXX,XX @@ struct hvm_domain { unsigned long legacy_mask; /* indexed by HVM param number */ } ioreq_gfn; - /* Lock protects all other values in the sub-struct and the default */ - struct { - spinlock_t lock; - struct ioreq_server *server[MAX_NR_IOREQ_SERVERS]; - } ioreq_server; - /* Cached CF8 for guest PCI config cycles */ uint32_t pci_cf8; diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -XXX,XX +XXX,XX @@ struct sched_unit { struct evtchn_port_ops; +#define MAX_NR_IOREQ_SERVERS 8 + struct domain { domid_t domain_id; @@ -XXX,XX +XXX,XX @@ struct domain struct { unsigned int val; } teardown; + +#ifdef CONFIG_IOREQ_SERVER + /* Lock protects all other values in the sub-struct */ + struct { + spinlock_t lock; + struct ioreq_server *server[MAX_NR_IOREQ_SERVERS]; + } ioreq_server; +#endif }; static inline struct page_list_head *page_to_list( -- 2.7.4
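For illustration only, the layout this patch ends up with can be sketched outside of Xen. The snippet below is a minimal, single-threaded sketch: `toy_lock`, `toy_domain`, `toy_ioreq_server` and `toy_count_ioreq_servers()` are hypothetical stand-ins (not Xen names), and the toy lock only counts nesting depth instead of providing the real recursive spinlock semantics. It shows the server table and its lock living directly in the common domain structure, guarded by CONFIG_IOREQ_SERVER, and accessed as `d->ioreq_server` rather than `d->arch.hvm.ioreq_server`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NR_IOREQ_SERVERS 8
#define CONFIG_IOREQ_SERVER 1 /* assumption: the feature is enabled in this sketch */

/* Hypothetical stand-in for a recursive spinlock: tracks nesting depth only. */
struct toy_lock {
    unsigned int depth;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    l->depth++;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->depth > 0);
    l->depth--;
}

/* Hypothetical, heavily reduced ioreq server. */
struct toy_ioreq_server {
    bool enabled;
};

/* Hypothetical common domain structure: no arch.hvm indirection needed. */
struct toy_domain {
    unsigned int domain_id;
#ifdef CONFIG_IOREQ_SERVER
    /* Lock protects all other values in the sub-struct */
    struct {
        struct toy_lock lock;
        struct toy_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
    } ioreq_server;
#endif
};

/* Walk the per-domain table under the lock and count registered servers. */
static unsigned int toy_count_ioreq_servers(struct toy_domain *d)
{
    unsigned int id, nr = 0;

    toy_lock_acquire(&d->ioreq_server.lock);

    for ( id = 0; id < MAX_NR_IOREQ_SERVERS; id++ )
        if ( d->ioreq_server.server[id] != NULL )
            nr++;

    toy_lock_release(&d->ioreq_server.lock);

    return nr;
}
```

A caller would zero-initialise a `struct toy_domain`, store a server pointer in one of the slots and call `toy_count_ioreq_servers()`; the lock depth is back to zero on return.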
From: Julien Grall <julien.grall@arm.com>

As a lot of x86 code can be re-used on Arm later on, this patch
moves the IOREQ related dm-op handling to the common code.

The idea is to have the top level dm-op handling arch-specific
and call into ioreq_server_dm_op() for otherwise unhandled ops.

Pros:
- More natural than doing it the other way around (top level dm-op
  handling common).
- Leave compat_dm_op() in x86 code.

Cons:
- Code duplication. Both arches have to duplicate do_dm_op(), etc.

Also update XSM code a bit to let dm-op be used on Arm.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
[On Arm only]
Tested-by: Wei Chen <Wei.Chen@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

***
I decided to keep the common dm.h for the struct dmop_args declaration
(to be included by Arm's dm.c). Alternatively, we could avoid introducing
a new header by moving the declaration into an existing header, but
I failed to find a suitable one whose context would fit.
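The dispatch idea described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from this patch: the `toy_` op numbers, `struct toy_dm_op_args`, `toy_ioreq_server_dm_op()` and `toy_arch_dm_op()` are hypothetical names, and the handlers simply return 0 where the real ones would do work. The arch-specific switch handles its own ops and hands everything else to the common handler, which reports -EOPNOTSUPP for ops it does not know either.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical op numbers, not the real XEN_DMOP_* values. */
enum toy_dm_op {
    TOY_DMOP_create_ioreq_server = 1, /* handled by the common code */
    TOY_DMOP_track_dirty_vram = 2,    /* handled by the arch code */
    TOY_DMOP_unknown = 99             /* handled by nobody */
};

struct toy_dm_op_args {
    enum toy_dm_op op;
};

/* Common handler: only knows about the IOREQ server related ops. */
static int toy_ioreq_server_dm_op(const struct toy_dm_op_args *op,
                                  bool *const_op)
{
    int rc;

    assert(op != NULL && const_op != NULL);

    switch ( op->op )
    {
    case TOY_DMOP_create_ioreq_server:
        *const_op = false; /* this op returns data to the caller */
        rc = 0;
        break;

    default:
        rc = -EOPNOTSUPP;
        break;
    }

    return rc;
}

/* Arch-specific top level: falls back to the common handler by default. */
static int toy_arch_dm_op(const struct toy_dm_op_args *op, bool *const_op)
{
    int rc;

    assert(op != NULL && const_op != NULL);

    *const_op = true;

    switch ( op->op )
    {
    case TOY_DMOP_track_dirty_vram:
        rc = 0; /* arch-only operation */
        break;

    default:
        rc = toy_ioreq_server_dm_op(op, const_op);
        break;
    }

    return rc;
}
```

With this split each architecture keeps its own top level switch, which is the duplication listed under "Cons", while the IOREQ server ops live in one place.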
*** Changes RFC -> V1: - update XSM, related changes were pulled from: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features Changes V1 -> V2: - update the author of a patch - update patch description - introduce xen/dm.h and move definitions here Changes V2 -> V3: - no changes Changes V3 -> V4: - rework to have the top level dm-op handling arch-specific - update patch subject/description, was "xen/dm: Make x86's DM feature common" - make a few functions static in common ioreq.c --- xen/arch/x86/hvm/dm.c | 101 +----------------------------------- xen/common/ioreq.c | 135 ++++++++++++++++++++++++++++++++++++++++++------ xen/include/xen/dm.h | 39 ++++++++++++++ xen/include/xen/ioreq.h | 17 +----- xen/include/xsm/dummy.h | 4 +- xen/include/xsm/xsm.h | 6 +-- xen/xsm/dummy.c | 2 +- xen/xsm/flask/hooks.c | 5 +- 8 files changed, 171 insertions(+), 138 deletions(-) create mode 100644 xen/include/xen/dm.h diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -XXX,XX +XXX,XX @@ #include <xen/event.h> #include <xen/guest_access.h> +#include <xen/dm.h> #include <xen/hypercall.h> #include <xen/ioreq.h> #include <xen/nospec.h> @@ -XXX,XX +XXX,XX @@ #include <public/hvm/hvm_op.h> -struct dmop_args { - domid_t domid; - unsigned int nr_bufs; - /* Reserve enough buf elements for all current hypercalls. 
*/ - struct xen_dm_op_buf buf[2]; -}; - static bool _raw_copy_from_guest_buf_offset(void *dst, const struct dmop_args *args, unsigned int buf_idx, @@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args) switch ( op.op ) { - case XEN_DMOP_create_ioreq_server: - { - struct xen_dm_op_create_ioreq_server *data = - &op.u.create_ioreq_server; - - const_op = false; - - rc = -EINVAL; - if ( data->pad[0] || data->pad[1] || data->pad[2] ) - break; - - rc = hvm_create_ioreq_server(d, data->handle_bufioreq, - &data->id); - break; - } - - case XEN_DMOP_get_ioreq_server_info: - { - struct xen_dm_op_get_ioreq_server_info *data = - &op.u.get_ioreq_server_info; - const uint16_t valid_flags = XEN_DMOP_no_gfns; - - const_op = false; - - rc = -EINVAL; - if ( data->flags & ~valid_flags ) - break; - - rc = hvm_get_ioreq_server_info(d, data->id, - (data->flags & XEN_DMOP_no_gfns) ? - NULL : &data->ioreq_gfn, - (data->flags & XEN_DMOP_no_gfns) ? - NULL : &data->bufioreq_gfn, - &data->bufioreq_port); - break; - } - - case XEN_DMOP_map_io_range_to_ioreq_server: - { - const struct xen_dm_op_ioreq_server_range *data = - &op.u.map_io_range_to_ioreq_server; - - rc = -EINVAL; - if ( data->pad ) - break; - - rc = hvm_map_io_range_to_ioreq_server(d, data->id, data->type, - data->start, data->end); - break; - } - - case XEN_DMOP_unmap_io_range_from_ioreq_server: - { - const struct xen_dm_op_ioreq_server_range *data = - &op.u.unmap_io_range_from_ioreq_server; - - rc = -EINVAL; - if ( data->pad ) - break; - - rc = hvm_unmap_io_range_from_ioreq_server(d, data->id, data->type, - data->start, data->end); - break; - } - case XEN_DMOP_map_mem_type_to_ioreq_server: { struct xen_dm_op_map_mem_type_to_ioreq_server *data = @@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args) break; } - case XEN_DMOP_set_ioreq_server_state: - { - const struct xen_dm_op_set_ioreq_server_state *data = - &op.u.set_ioreq_server_state; - - rc = -EINVAL; - if ( data->pad ) - break; - - rc = 
hvm_set_ioreq_server_state(d, data->id, !!data->enabled); - break; - } - - case XEN_DMOP_destroy_ioreq_server: - { - const struct xen_dm_op_destroy_ioreq_server *data = - &op.u.destroy_ioreq_server; - - rc = -EINVAL; - if ( data->pad ) - break; - - rc = hvm_destroy_ioreq_server(d, data->id); - break; - } - case XEN_DMOP_track_dirty_vram: { const struct xen_dm_op_track_dirty_vram *data = @@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args) } default: - rc = -EOPNOTSUPP; + rc = ioreq_server_dm_op(&op, d, &const_op); break; } diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/common/ioreq.c +++ b/xen/common/ioreq.c @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_deinit(struct ioreq_server *s) put_domain(s->emulator); } -int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, - ioservid_t *id) +static int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, + ioservid_t *id) { struct ioreq_server *s; unsigned int i; @@ -XXX,XX +XXX,XX @@ int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, return rc; } -int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id) +static int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id) { struct ioreq_server *s; int rc; @@ -XXX,XX +XXX,XX @@ int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id) return rc; } -int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, - unsigned long *ioreq_gfn, - unsigned long *bufioreq_gfn, - evtchn_port_t *bufioreq_port) +static int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, + unsigned long *ioreq_gfn, + unsigned long *bufioreq_gfn, + evtchn_port_t *bufioreq_port) { struct ioreq_server *s; int rc; @@ -XXX,XX +XXX,XX @@ int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id, return rc; } -int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, - uint32_t type, uint64_t start, - uint64_t end) +static int 
hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, + uint32_t type, uint64_t start, + uint64_t end) { struct ioreq_server *s; struct rangeset *r; @@ -XXX,XX +XXX,XX @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, return rc; } -int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, - uint32_t type, uint64_t start, - uint64_t end) +static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, + uint32_t type, uint64_t start, + uint64_t end) { struct ioreq_server *s; struct rangeset *r; @@ -XXX,XX +XXX,XX @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, return rc; } -int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, - bool enabled) +static int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, + bool enabled) { struct ioreq_server *s; int rc; @@ -XXX,XX +XXX,XX @@ void hvm_ioreq_init(struct domain *d) arch_ioreq_domain_init(d); } +int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op) +{ + long rc; + + switch ( op->op ) + { + case XEN_DMOP_create_ioreq_server: + { + struct xen_dm_op_create_ioreq_server *data = + &op->u.create_ioreq_server; + + *const_op = false; + + rc = -EINVAL; + if ( data->pad[0] || data->pad[1] || data->pad[2] ) + break; + + rc = hvm_create_ioreq_server(d, data->handle_bufioreq, + &data->id); + break; + } + + case XEN_DMOP_get_ioreq_server_info: + { + struct xen_dm_op_get_ioreq_server_info *data = + &op->u.get_ioreq_server_info; + const uint16_t valid_flags = XEN_DMOP_no_gfns; + + *const_op = false; + + rc = -EINVAL; + if ( data->flags & ~valid_flags ) + break; + + rc = hvm_get_ioreq_server_info(d, data->id, + (data->flags & XEN_DMOP_no_gfns) ? + NULL : (unsigned long *)&data->ioreq_gfn, + (data->flags & XEN_DMOP_no_gfns) ? 
+ NULL : (unsigned long *)&data->bufioreq_gfn, + &data->bufioreq_port); + break; + } + + case XEN_DMOP_map_io_range_to_ioreq_server: + { + const struct xen_dm_op_ioreq_server_range *data = + &op->u.map_io_range_to_ioreq_server; + + rc = -EINVAL; + if ( data->pad ) + break; + + rc = hvm_map_io_range_to_ioreq_server(d, data->id, data->type, + data->start, data->end); + break; + } + + case XEN_DMOP_unmap_io_range_from_ioreq_server: + { + const struct xen_dm_op_ioreq_server_range *data = + &op->u.unmap_io_range_from_ioreq_server; + + rc = -EINVAL; + if ( data->pad ) + break; + + rc = hvm_unmap_io_range_from_ioreq_server(d, data->id, data->type, + data->start, data->end); + break; + } + + case XEN_DMOP_set_ioreq_server_state: + { + const struct xen_dm_op_set_ioreq_server_state *data = + &op->u.set_ioreq_server_state; + + rc = -EINVAL; + if ( data->pad ) + break; + + rc = hvm_set_ioreq_server_state(d, data->id, !!data->enabled); + break; + } + + case XEN_DMOP_destroy_ioreq_server: + { + const struct xen_dm_op_destroy_ioreq_server *data = + &op->u.destroy_ioreq_server; + + rc = -EINVAL; + if ( data->pad ) + break; + + rc = hvm_destroy_ioreq_server(d, data->id); + break; + } + + default: + rc = -EOPNOTSUPP; + break; + } + + return rc; +} + /* * Local variables: * mode: C diff --git a/xen/include/xen/dm.h b/xen/include/xen/dm.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/xen/include/xen/dm.h @@ -XXX,XX +XXX,XX @@ +/* + * Copyright (c) 2016 Citrix Systems Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. 
+ * + * You should have received a copy of the GNU General Public License along with + * this program; If not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef __XEN_DM_H__ +#define __XEN_DM_H__ + +#include <xen/sched.h> + +struct dmop_args { + domid_t domid; + unsigned int nr_bufs; + /* Reserve enough buf elements for all current hypercalls. */ + struct xen_dm_op_buf buf[2]; +}; + +#endif /* __XEN_DM_H__ */ + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/ioreq.h +++ b/xen/include/xen/ioreq.h @@ -XXX,XX +XXX,XX @@ bool hvm_io_pending(struct vcpu *v); bool handle_hvm_io_completion(struct vcpu *v); bool is_ioreq_server_page(struct domain *d, const struct page_info *page); -int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, - ioservid_t *id); -int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id); -int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, - unsigned long *ioreq_gfn, - unsigned long *bufioreq_gfn, - evtchn_port_t *bufioreq_port); int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id, unsigned long idx, mfn_t *mfn); -int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, - uint32_t type, uint64_t start, - uint64_t end); -int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, - uint32_t type, uint64_t start, - uint64_t end); int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, uint32_t type, uint32_t flags); -int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, - bool enabled); int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v); void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v); @@ -XXX,XX +XXX,XX @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered); void hvm_ioreq_init(struct domain *d); +int 
ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op); + bool arch_ioreq_complete_mmio(void); bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion); int arch_ioreq_server_map_pages(struct ioreq_server *s); diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xsm/dummy.h +++ b/xen/include/xsm/dummy.h @@ -XXX,XX +XXX,XX @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int } } +#endif /* CONFIG_X86 */ + static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG struct domain *d) { XSM_ASSERT_ACTION(XSM_DM_PRIV); return xsm_default_action(action, current->domain, d); } -#endif /* CONFIG_X86 */ - #ifdef CONFIG_ARGO static XSM_INLINE int xsm_argo_enable(const struct domain *d) { diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xsm/xsm.h +++ b/xen/include/xsm/xsm.h @@ -XXX,XX +XXX,XX @@ struct xsm_operations { int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow); int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow); int (*pmu_op) (struct domain *d, unsigned int op); - int (*dm_op) (struct domain *d); #endif + int (*dm_op) (struct domain *d); int (*xen_version) (uint32_t cmd); int (*domain_resource_map) (struct domain *d); #ifdef CONFIG_ARGO @@ -XXX,XX +XXX,XX @@ static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, unsigned int return xsm_ops->pmu_op(d, op); } +#endif /* CONFIG_X86 */ + static inline int xsm_dm_op(xsm_default_t def, struct domain *d) { return xsm_ops->dm_op(d); } -#endif /* CONFIG_X86 */ - static inline int xsm_xen_version (xsm_default_t def, uint32_t op) { return xsm_ops->xen_version(op); diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c index XXXXXXX..XXXXXXX 100644 --- a/xen/xsm/dummy.c +++ b/xen/xsm/dummy.c @@ -XXX,XX +XXX,XX @@ void __init xsm_fixup_ops (struct xsm_operations *ops) set_to_dummy_if_null(ops, 
ioport_permission); set_to_dummy_if_null(ops, ioport_mapping); set_to_dummy_if_null(ops, pmu_op); - set_to_dummy_if_null(ops, dm_op); #endif + set_to_dummy_if_null(ops, dm_op); set_to_dummy_if_null(ops, xen_version); set_to_dummy_if_null(ops, domain_resource_map); #ifdef CONFIG_ARGO diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c index XXXXXXX..XXXXXXX 100644 --- a/xen/xsm/flask/hooks.c +++ b/xen/xsm/flask/hooks.c @@ -XXX,XX +XXX,XX @@ static int flask_pmu_op (struct domain *d, unsigned int op) return -EPERM; } } +#endif /* CONFIG_X86 */ static int flask_dm_op(struct domain *d) { return current_has_perm(d, SECCLASS_HVM, HVM__DM); } -#endif /* CONFIG_X86 */ - static int flask_xen_version (uint32_t op) { u32 dsid = domain_sid(current->domain); @@ -XXX,XX +XXX,XX @@ static struct xsm_operations flask_ops = { .ioport_permission = flask_ioport_permission, .ioport_mapping = flask_ioport_mapping, .pmu_op = flask_pmu_op, - .dm_op = flask_dm_op, #endif + .dm_op = flask_dm_op, .xen_version = flask_xen_version, .domain_resource_map = flask_domain_resource_map, #ifdef CONFIG_ARGO -- 2.7.4
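The XSM part of the patch above only moves the dm_op hook out of the CONFIG_X86 guards. As a rough illustration of why that is enough for another arch to use it, here is a minimal sketch of an ops table with a dummy fallback. All `toy_` names are hypothetical, and the privilege check is a simplification assumed for the example, not the real XSM_DM_PRIV policy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced domain. */
struct toy_domain {
    int is_privileged;
};

/* Hypothetical ops table: dm_op sits outside of the arch-only section. */
struct toy_xsm_operations {
#ifdef TOY_CONFIG_X86
    int (*pmu_op)(struct toy_domain *d, unsigned int op);
#endif
    int (*dm_op)(struct toy_domain *d);
};

/* Dummy policy, assumed for this sketch: only privileged domains may proceed. */
static int toy_dummy_dm_op(struct toy_domain *d)
{
    return d->is_privileged ? 0 : -EPERM;
}

/* Install the dummy hook if the security module did not provide one. */
static void toy_xsm_fixup_ops(struct toy_xsm_operations *ops)
{
    if ( ops->dm_op == NULL )
        ops->dm_op = toy_dummy_dm_op;
}

/* Wrapper used by callers; available on every arch since the hook is common. */
static int toy_xsm_dm_op(const struct toy_xsm_operations *ops,
                         struct toy_domain *d)
{
    assert(ops->dm_op != NULL);
    return ops->dm_op(d);
}
```

Since the hook, its wrapper and its dummy default are all built regardless of the arch-only section, a dm-op caller on any arch can rely on `toy_xsm_dm_op()` once `toy_xsm_fixup_ops()` has run.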
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The IOREQ is a common feature now and these fields will be used
on Arm as is. Move them to common struct vcpu as a part of new
struct vcpu_io and drop the duplicated "io" prefixes. Also move
enum hvm_io_completion to xen/sched.h and remove "hvm" prefixes.

This patch completely removes the layering violation in the common code.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
CC: Julien Grall <julien.grall@arm.com>
[On Arm only]
Tested-by: Wei Chen <Wei.Chen@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V1 -> V2:
   - new patch

Changes V2 -> V3:
   - update patch according to the "legacy interface" being x86 specific
   - update patch description
   - drop the "io" prefixes from the field names
   - wrap IO_realmode_completion

Changes V3 -> V4:
   - rename all hvm_vcpu_io locals to "hvio"
   - rename according to the new renaming scheme IO_ -> VIO_ (io_ -> vio_)
   - drop "io" prefix from io_completion locals
---
 xen/arch/x86/hvm/emulate.c        | 210 +++++++++++++++++++-------------------
 xen/arch/x86/hvm/hvm.c            |   2 +-
 xen/arch/x86/hvm/io.c             |  32 +++---
 xen/arch/x86/hvm/ioreq.c          |   6 +-
 xen/arch/x86/hvm/svm/nestedsvm.c  |   2 +-
 xen/arch/x86/hvm/vmx/realmode.c   |   8 +-
 xen/common/ioreq.c                |  26 ++---
 xen/include/asm-x86/hvm/emulate.h |   2 +-
 xen/include/asm-x86/hvm/vcpu.h    |  11 --
 xen/include/xen/ioreq.h           |   2 +-
 xen/include/xen/sched.h           |  19 ++++
 11 files changed, 164 insertions(+), 156 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -XXX,XX +XXX,XX @@ static const struct hvm_io_handler ioreq_server_handler = {
  */
 void hvmemul_cancel(struct vcpu *v)
 {
-    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
+    struct hvm_vcpu_io *hvio = &v->arch.hvm.hvm_io;
-    vio->io_req.state = STATE_IOREQ_NONE;
-    vio->io_completion =
HVMIO_no_completion; - vio->mmio_cache_count = 0; - vio->mmio_insn_bytes = 0; - vio->mmio_access = (struct npfec){}; - vio->mmio_retry = false; - vio->g2m_ioport = NULL; + v->io.req.state = STATE_IOREQ_NONE; + v->io.completion = VIO_no_completion; + hvio->mmio_cache_count = 0; + hvio->mmio_insn_bytes = 0; + hvio->mmio_access = (struct npfec){}; + hvio->mmio_retry = false; + hvio->g2m_ioport = NULL; hvmemul_cache_disable(v); } @@ -XXX,XX +XXX,XX @@ static int hvmemul_do_io( { struct vcpu *curr = current; struct domain *currd = curr->domain; - struct hvm_vcpu_io *vio = &curr->arch.hvm.hvm_io; + struct vcpu_io *vio = &curr->io; ioreq_t p = { .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO, .addr = addr, @@ -XXX,XX +XXX,XX @@ static int hvmemul_do_io( return X86EMUL_UNHANDLEABLE; } - switch ( vio->io_req.state ) + switch ( vio->req.state ) { case STATE_IOREQ_NONE: break; case STATE_IORESP_READY: - vio->io_req.state = STATE_IOREQ_NONE; - p = vio->io_req; + vio->req.state = STATE_IOREQ_NONE; + p = vio->req; /* Verify the emulation request has been correctly re-issued */ if ( (p.type != (is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO)) || @@ -XXX,XX +XXX,XX @@ static int hvmemul_do_io( } ASSERT(p.count); - vio->io_req = p; + vio->req = p; rc = hvm_io_intercept(&p); @@ -XXX,XX +XXX,XX @@ static int hvmemul_do_io( * our callers and mirror this into latched state. 
*/ ASSERT(p.count <= *reps); - *reps = vio->io_req.count = p.count; + *reps = vio->req.count = p.count; switch ( rc ) { case X86EMUL_OKAY: - vio->io_req.state = STATE_IOREQ_NONE; + vio->req.state = STATE_IOREQ_NONE; break; case X86EMUL_UNHANDLEABLE: { @@ -XXX,XX +XXX,XX @@ static int hvmemul_do_io( if ( s == NULL ) { rc = X86EMUL_RETRY; - vio->io_req.state = STATE_IOREQ_NONE; + vio->req.state = STATE_IOREQ_NONE; break; } @@ -XXX,XX +XXX,XX @@ static int hvmemul_do_io( if ( dir == IOREQ_READ ) { rc = hvm_process_io_intercept(&ioreq_server_handler, &p); - vio->io_req.state = STATE_IOREQ_NONE; + vio->req.state = STATE_IOREQ_NONE; break; } } @@ -XXX,XX +XXX,XX @@ static int hvmemul_do_io( if ( !s ) { rc = hvm_process_io_intercept(&null_handler, &p); - vio->io_req.state = STATE_IOREQ_NONE; + vio->req.state = STATE_IOREQ_NONE; } else { rc = hvm_send_ioreq(s, &p, 0); if ( rc != X86EMUL_RETRY || currd->is_shutting_down ) - vio->io_req.state = STATE_IOREQ_NONE; - else if ( !ioreq_needs_completion(&vio->io_req) ) + vio->req.state = STATE_IOREQ_NONE; + else if ( !ioreq_needs_completion(&vio->req) ) rc = X86EMUL_OKAY; } break; @@ -XXX,XX +XXX,XX @@ static int hvmemul_phys_mmio_access( * cache indexed by linear MMIO address. 
*/ static struct hvm_mmio_cache *hvmemul_find_mmio_cache( - struct hvm_vcpu_io *vio, unsigned long gla, uint8_t dir, bool create) + struct hvm_vcpu_io *hvio, unsigned long gla, uint8_t dir, bool create) { unsigned int i; struct hvm_mmio_cache *cache; - for ( i = 0; i < vio->mmio_cache_count; i ++ ) + for ( i = 0; i < hvio->mmio_cache_count; i ++ ) { - cache = &vio->mmio_cache[i]; + cache = &hvio->mmio_cache[i]; if ( gla == cache->gla && dir == cache->dir ) @@ -XXX,XX +XXX,XX @@ static struct hvm_mmio_cache *hvmemul_find_mmio_cache( if ( !create ) return NULL; - i = vio->mmio_cache_count; - if( i == ARRAY_SIZE(vio->mmio_cache) ) + i = hvio->mmio_cache_count; + if( i == ARRAY_SIZE(hvio->mmio_cache) ) return NULL; - ++vio->mmio_cache_count; + ++hvio->mmio_cache_count; - cache = &vio->mmio_cache[i]; + cache = &hvio->mmio_cache[i]; memset(cache, 0, sizeof (*cache)); cache->gla = gla; @@ -XXX,XX +XXX,XX @@ static struct hvm_mmio_cache *hvmemul_find_mmio_cache( return cache; } -static void latch_linear_to_phys(struct hvm_vcpu_io *vio, unsigned long gla, +static void latch_linear_to_phys(struct hvm_vcpu_io *hvio, unsigned long gla, unsigned long gpa, bool_t write) { - if ( vio->mmio_access.gla_valid ) + if ( hvio->mmio_access.gla_valid ) return; - vio->mmio_gla = gla & PAGE_MASK; - vio->mmio_gpfn = PFN_DOWN(gpa); - vio->mmio_access = (struct npfec){ .gla_valid = 1, - .read_access = 1, - .write_access = write }; + hvio->mmio_gla = gla & PAGE_MASK; + hvio->mmio_gpfn = PFN_DOWN(gpa); + hvio->mmio_access = (struct npfec){ .gla_valid = 1, + .read_access = 1, + .write_access = write }; } static int hvmemul_linear_mmio_access( unsigned long gla, unsigned int size, uint8_t dir, void *buffer, uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt, bool_t known_gpfn) { - struct hvm_vcpu_io *vio = &current->arch.hvm.hvm_io; + struct hvm_vcpu_io *hvio = &current->arch.hvm.hvm_io; unsigned long offset = gla & ~PAGE_MASK; - struct hvm_mmio_cache *cache = hvmemul_find_mmio_cache(vio, gla, dir, true);
+ struct hvm_mmio_cache *cache = hvmemul_find_mmio_cache(hvio, gla, dir, true); unsigned int chunk, buffer_offset = 0; paddr_t gpa; unsigned long one_rep = 1; @@ -XXX,XX +XXX,XX @@ static int hvmemul_linear_mmio_access( chunk = min_t(unsigned int, size, PAGE_SIZE - offset); if ( known_gpfn ) - gpa = pfn_to_paddr(vio->mmio_gpfn) | offset; + gpa = pfn_to_paddr(hvio->mmio_gpfn) | offset; else { rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec, @@ -XXX,XX +XXX,XX @@ static int hvmemul_linear_mmio_access( if ( rc != X86EMUL_OKAY ) return rc; - latch_linear_to_phys(vio, gla, gpa, dir == IOREQ_WRITE); + latch_linear_to_phys(hvio, gla, gpa, dir == IOREQ_WRITE); } for ( ;; ) @@ -XXX,XX +XXX,XX @@ static inline int hvmemul_linear_mmio_write( static bool known_gla(unsigned long addr, unsigned int bytes, uint32_t pfec) { - const struct hvm_vcpu_io *vio = &current->arch.hvm.hvm_io; + const struct hvm_vcpu_io *hvio = &current->arch.hvm.hvm_io; if ( pfec & PFEC_write_access ) { - if ( !vio->mmio_access.write_access ) + if ( !hvio->mmio_access.write_access ) return false; } else if ( pfec & PFEC_insn_fetch ) { - if ( !vio->mmio_access.insn_fetch ) + if ( !hvio->mmio_access.insn_fetch ) return false; } - else if ( !vio->mmio_access.read_access ) + else if ( !hvio->mmio_access.read_access ) return false; - return (vio->mmio_gla == (addr & PAGE_MASK) && + return (hvio->mmio_gla == (addr & PAGE_MASK) && (addr & ~PAGE_MASK) + bytes <= PAGE_SIZE); } @@ -XXX,XX +XXX,XX @@ static int linear_read(unsigned long addr, unsigned int bytes, void *p_data, uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt) { pagefault_info_t pfinfo; - struct hvm_vcpu_io *vio = &current->arch.hvm.hvm_io; + struct hvm_vcpu_io *hvio = &current->arch.hvm.hvm_io; unsigned int offset = addr & ~PAGE_MASK; int rc = HVMTRANS_bad_gfn_to_mfn; @@ -XXX,XX +XXX,XX @@ static int linear_read(unsigned long addr, unsigned int bytes, void *p_data, * we handle this access in the same way to guarantee completion and hence * clean up any
interim state. */ - if ( !hvmemul_find_mmio_cache(vio, addr, IOREQ_READ, false) ) + if ( !hvmemul_find_mmio_cache(hvio, addr, IOREQ_READ, false) ) rc = hvm_copy_from_guest_linear(p_data, addr, bytes, pfec, &pfinfo); switch ( rc ) @@ -XXX,XX +XXX,XX @@ static int linear_write(unsigned long addr, unsigned int bytes, void *p_data, uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt) { pagefault_info_t pfinfo; - struct hvm_vcpu_io *vio = &current->arch.hvm.hvm_io; + struct hvm_vcpu_io *hvio = &current->arch.hvm.hvm_io; unsigned int offset = addr & ~PAGE_MASK; int rc = HVMTRANS_bad_gfn_to_mfn; @@ -XXX,XX +XXX,XX @@ static int linear_write(unsigned long addr, unsigned int bytes, void *p_data, * we handle this access in the same way to guarantee completion and hence * clean up any interim state. */ - if ( !hvmemul_find_mmio_cache(vio, addr, IOREQ_WRITE, false) ) + if ( !hvmemul_find_mmio_cache(hvio, addr, IOREQ_WRITE, false) ) rc = hvm_copy_to_guest_linear(addr, p_data, bytes, pfec, &pfinfo); switch ( rc ) @@ -XXX,XX +XXX,XX @@ static int hvmemul_cmpxchg( struct vcpu *curr = current; unsigned long addr; uint32_t pfec = PFEC_page_present | PFEC_write_access; - struct hvm_vcpu_io *vio = &curr->arch.hvm.hvm_io; + struct hvm_vcpu_io *hvio = &curr->arch.hvm.hvm_io; int rc; void *mapping = NULL; @@ -XXX,XX +XXX,XX @@ static int hvmemul_cmpxchg( /* Fix this in case the guest is really relying on r-m-w atomicity.
*/ return hvmemul_linear_mmio_write(addr, bytes, p_new, pfec, hvmemul_ctxt, - vio->mmio_access.write_access && - vio->mmio_gla == (addr & PAGE_MASK)); + hvio->mmio_access.write_access && + hvio->mmio_gla == (addr & PAGE_MASK)); } switch ( bytes ) @@ -XXX,XX +XXX,XX @@ static int hvmemul_rep_movs( struct hvm_emulate_ctxt *hvmemul_ctxt = container_of(ctxt, struct hvm_emulate_ctxt, ctxt); struct vcpu *curr = current; - struct hvm_vcpu_io *vio = &curr->arch.hvm.hvm_io; + struct hvm_vcpu_io *hvio = &curr->arch.hvm.hvm_io; unsigned long saddr, daddr, bytes; paddr_t sgpa, dgpa; uint32_t pfec = PFEC_page_present; @@ -XXX,XX +XXX,XX @@ static int hvmemul_rep_movs( if ( hvmemul_ctxt->seg_reg[x86_seg_ss].dpl == 3 ) pfec |= PFEC_user_mode; - if ( vio->mmio_access.read_access && - (vio->mmio_gla == (saddr & PAGE_MASK)) && + if ( hvio->mmio_access.read_access && + (hvio->mmio_gla == (saddr & PAGE_MASK)) && /* * Upon initial invocation don't truncate large batches just because * of a hit for the translation: Doing the guest page table walk is * cheaper than multiple round trips through the device model. Yet * when processing a response we can always re-use the translation. */ - (vio->io_req.state == STATE_IORESP_READY || + (curr->io.req.state == STATE_IORESP_READY || ((!df || *reps == 1) && PAGE_SIZE - (saddr & ~PAGE_MASK) >= *reps * bytes_per_rep)) ) - sgpa = pfn_to_paddr(vio->mmio_gpfn) | (saddr & ~PAGE_MASK); + sgpa = pfn_to_paddr(hvio->mmio_gpfn) | (saddr & ~PAGE_MASK); else { rc = hvmemul_linear_to_phys(saddr, &sgpa, bytes_per_rep, reps, pfec, @@ -XXX,XX +XXX,XX @@ static int hvmemul_rep_movs( } bytes = PAGE_SIZE - (daddr & ~PAGE_MASK); - if ( vio->mmio_access.write_access && - (vio->mmio_gla == (daddr & PAGE_MASK)) && + if ( hvio->mmio_access.write_access && + (hvio->mmio_gla == (daddr & PAGE_MASK)) && /* See comment above. 
*/ - (vio->io_req.state == STATE_IORESP_READY || + (curr->io.req.state == STATE_IORESP_READY || ((!df || *reps == 1) && PAGE_SIZE - (daddr & ~PAGE_MASK) >= *reps * bytes_per_rep)) ) - dgpa = pfn_to_paddr(vio->mmio_gpfn) | (daddr & ~PAGE_MASK); + dgpa = pfn_to_paddr(hvio->mmio_gpfn) | (daddr & ~PAGE_MASK); else { rc = hvmemul_linear_to_phys(daddr, &dgpa, bytes_per_rep, reps, @@ -XXX,XX +XXX,XX @@ static int hvmemul_rep_movs( if ( sp2mt == p2m_mmio_dm ) { - latch_linear_to_phys(vio, saddr, sgpa, 0); + latch_linear_to_phys(hvio, saddr, sgpa, 0); return hvmemul_do_mmio_addr( sgpa, reps, bytes_per_rep, IOREQ_READ, df, dgpa); } if ( dp2mt == p2m_mmio_dm ) { - latch_linear_to_phys(vio, daddr, dgpa, 1); + latch_linear_to_phys(hvio, daddr, dgpa, 1); return hvmemul_do_mmio_addr( dgpa, reps, bytes_per_rep, IOREQ_WRITE, df, sgpa); } @@ -XXX,XX +XXX,XX @@ static int hvmemul_rep_stos( struct hvm_emulate_ctxt *hvmemul_ctxt = container_of(ctxt, struct hvm_emulate_ctxt, ctxt); struct vcpu *curr = current; - struct hvm_vcpu_io *vio = &curr->arch.hvm.hvm_io; + struct hvm_vcpu_io *hvio = &curr->arch.hvm.hvm_io; unsigned long addr, bytes; paddr_t gpa; p2m_type_t p2mt; @@ -XXX,XX +XXX,XX @@ static int hvmemul_rep_stos( return rc; bytes = PAGE_SIZE - (addr & ~PAGE_MASK); - if ( vio->mmio_access.write_access && - (vio->mmio_gla == (addr & PAGE_MASK)) && + if ( hvio->mmio_access.write_access && + (hvio->mmio_gla == (addr & PAGE_MASK)) && /* See respective comment in MOVS processing. 
*/ - (vio->io_req.state == STATE_IORESP_READY || + (curr->io.req.state == STATE_IORESP_READY || ((!df || *reps == 1) && PAGE_SIZE - (addr & ~PAGE_MASK) >= *reps * bytes_per_rep)) ) - gpa = pfn_to_paddr(vio->mmio_gpfn) | (addr & ~PAGE_MASK); + gpa = pfn_to_paddr(hvio->mmio_gpfn) | (addr & ~PAGE_MASK); else { uint32_t pfec = PFEC_page_present | PFEC_write_access; @@ -XXX,XX +XXX,XX @@ static int hvmemul_rep_stos( return X86EMUL_UNHANDLEABLE; case p2m_mmio_dm: - latch_linear_to_phys(vio, addr, gpa, 1); + latch_linear_to_phys(hvio, addr, gpa, 1); return hvmemul_do_mmio_buffer(gpa, reps, bytes_per_rep, IOREQ_WRITE, df, p_data); } @@ -XXX,XX +XXX,XX @@ static const struct x86_emulate_ops hvm_emulate_ops_no_write = { }; /* - * Note that passing HVMIO_no_completion into this function serves as kind + * Note that passing VIO_no_completion into this function serves as kind * of (but not fully) an "auto select completion" indicator. When there's * no completion needed, the passed in value will be ignored in any case. */ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt, const struct x86_emulate_ops *ops, - enum hvm_io_completion completion) + enum vio_completion completion) { const struct cpu_user_regs *regs = hvmemul_ctxt->ctxt.regs; struct vcpu *curr = current; uint32_t new_intr_shadow; - struct hvm_vcpu_io *vio = &curr->arch.hvm.hvm_io; + struct hvm_vcpu_io *hvio = &curr->arch.hvm.hvm_io; int rc; /* @@ -XXX,XX +XXX,XX @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt, * untouched if it's already enabled, for re-execution to consume * entries populated by an earlier pass. 
*/ - if ( vio->cache->num_ents > vio->cache->max_ents ) + if ( hvio->cache->num_ents > hvio->cache->max_ents ) { - ASSERT(vio->io_req.state == STATE_IOREQ_NONE); - vio->cache->num_ents = 0; + ASSERT(curr->io.req.state == STATE_IOREQ_NONE); + hvio->cache->num_ents = 0; } else - ASSERT(vio->io_req.state == STATE_IORESP_READY); + ASSERT(curr->io.req.state == STATE_IORESP_READY); - hvm_emulate_init_per_insn(hvmemul_ctxt, vio->mmio_insn, - vio->mmio_insn_bytes); + hvm_emulate_init_per_insn(hvmemul_ctxt, hvio->mmio_insn, + hvio->mmio_insn_bytes); - vio->mmio_retry = 0; + hvio->mmio_retry = 0; rc = x86_emulate(&hvmemul_ctxt->ctxt, ops); - if ( rc == X86EMUL_OKAY && vio->mmio_retry ) + if ( rc == X86EMUL_OKAY && hvio->mmio_retry ) rc = X86EMUL_RETRY; - if ( !ioreq_needs_completion(&vio->io_req) ) - completion = HVMIO_no_completion; - else if ( completion == HVMIO_no_completion ) - completion = (vio->io_req.type != IOREQ_TYPE_PIO || - hvmemul_ctxt->is_mem_access) ? HVMIO_mmio_completion - : HVMIO_pio_completion; + if ( !ioreq_needs_completion(&curr->io.req) ) + completion = VIO_no_completion; + else if ( completion == VIO_no_completion ) + completion = (curr->io.req.type != IOREQ_TYPE_PIO || + hvmemul_ctxt->is_mem_access) ? 
VIO_mmio_completion + : VIO_pio_completion; - switch ( vio->io_completion = completion ) + switch ( curr->io.completion = completion ) { - case HVMIO_no_completion: - case HVMIO_pio_completion: - vio->mmio_cache_count = 0; - vio->mmio_insn_bytes = 0; - vio->mmio_access = (struct npfec){}; + case VIO_no_completion: + case VIO_pio_completion: + hvio->mmio_cache_count = 0; + hvio->mmio_insn_bytes = 0; + hvio->mmio_access = (struct npfec){}; hvmemul_cache_disable(curr); break; - case HVMIO_mmio_completion: - case HVMIO_realmode_completion: - BUILD_BUG_ON(sizeof(vio->mmio_insn) < sizeof(hvmemul_ctxt->insn_buf)); - vio->mmio_insn_bytes = hvmemul_ctxt->insn_buf_bytes; - memcpy(vio->mmio_insn, hvmemul_ctxt->insn_buf, vio->mmio_insn_bytes); + case VIO_mmio_completion: + case VIO_realmode_completion: + BUILD_BUG_ON(sizeof(hvio->mmio_insn) < sizeof(hvmemul_ctxt->insn_buf)); + hvio->mmio_insn_bytes = hvmemul_ctxt->insn_buf_bytes; + memcpy(hvio->mmio_insn, hvmemul_ctxt->insn_buf, hvio->mmio_insn_bytes); break; default: @@ -XXX,XX +XXX,XX @@ static int _hvm_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt, int hvm_emulate_one( struct hvm_emulate_ctxt *hvmemul_ctxt, - enum hvm_io_completion completion) + enum vio_completion completion) { return _hvm_emulate_one(hvmemul_ctxt, &hvm_emulate_ops, completion); } @@ -XXX,XX +XXX,XX @@ int hvm_emulate_one_mmio(unsigned long mfn, unsigned long gla) guest_cpu_user_regs()); ctxt.ctxt.data = &mmio_ro_ctxt; - switch ( rc = _hvm_emulate_one(&ctxt, ops, HVMIO_no_completion) ) + switch ( rc = _hvm_emulate_one(&ctxt, ops, VIO_no_completion) ) { case X86EMUL_UNHANDLEABLE: case X86EMUL_UNIMPLEMENTED: @@ -XXX,XX +XXX,XX @@ void hvm_emulate_one_vm_event(enum emul_kind kind, unsigned int trapnr, { case EMUL_KIND_NOWRITE: rc = _hvm_emulate_one(&ctx, &hvm_emulate_ops_no_write, - HVMIO_no_completion); + VIO_no_completion); break; case EMUL_KIND_SET_CONTEXT_INSN: { struct vcpu *curr = current; - struct hvm_vcpu_io *vio = &curr->arch.hvm.hvm_io; + struct 
hvm_vcpu_io *hvio = &curr->arch.hvm.hvm_io; - BUILD_BUG_ON(sizeof(vio->mmio_insn) != + BUILD_BUG_ON(sizeof(hvio->mmio_insn) != sizeof(curr->arch.vm_event->emul.insn.data)); - ASSERT(!vio->mmio_insn_bytes); + ASSERT(!hvio->mmio_insn_bytes); /* * Stash insn buffer into mmio buffer here instead of ctx * to avoid having to add more logic to hvm_emulate_one. */ - vio->mmio_insn_bytes = sizeof(vio->mmio_insn); - memcpy(vio->mmio_insn, curr->arch.vm_event->emul.insn.data, - vio->mmio_insn_bytes); + hvio->mmio_insn_bytes = sizeof(hvio->mmio_insn); + memcpy(hvio->mmio_insn, curr->arch.vm_event->emul.insn.data, + hvio->mmio_insn_bytes); } /* Fall-through */ default: ctx.set_context = (kind == EMUL_KIND_SET_CONTEXT_DATA); - rc = hvm_emulate_one(&ctx, HVMIO_no_completion); + rc = hvm_emulate_one(&ctx, VIO_no_completion); } switch ( rc ) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -XXX,XX +XXX,XX @@ void hvm_ud_intercept(struct cpu_user_regs *regs) return; } - switch ( hvm_emulate_one(&ctxt, HVMIO_no_completion) ) + switch ( hvm_emulate_one(&ctxt, VIO_no_completion) ) { case X86EMUL_UNHANDLEABLE: case X86EMUL_UNIMPLEMENTED: diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/io.c +++ b/xen/arch/x86/hvm/io.c @@ -XXX,XX +XXX,XX @@ bool hvm_emulate_one_insn(hvm_emulate_validate_t *validate, const char *descr) hvm_emulate_init_once(&ctxt, validate, guest_cpu_user_regs()); - switch ( rc = hvm_emulate_one(&ctxt, HVMIO_no_completion) ) + switch ( rc = hvm_emulate_one(&ctxt, VIO_no_completion) ) { case X86EMUL_UNHANDLEABLE: hvm_dump_emulation_state(XENLOG_G_WARNING, descr, &ctxt, rc); @@ -XXX,XX +XXX,XX @@ bool hvm_emulate_one_insn(hvm_emulate_validate_t *validate, const char *descr) bool handle_mmio_with_translation(unsigned long gla, unsigned long gpfn, struct npfec access) { - struct hvm_vcpu_io *vio = 
&current->arch.hvm.hvm_io; + struct hvm_vcpu_io *hvio = &current->arch.hvm.hvm_io; - vio->mmio_access = access.gla_valid && - access.kind == npfec_kind_with_gla - ? access : (struct npfec){}; - vio->mmio_gla = gla & PAGE_MASK; - vio->mmio_gpfn = gpfn; + hvio->mmio_access = access.gla_valid && + access.kind == npfec_kind_with_gla + ? access : (struct npfec){}; + hvio->mmio_gla = gla & PAGE_MASK; + hvio->mmio_gpfn = gpfn; return handle_mmio(); } bool handle_pio(uint16_t port, unsigned int size, int dir) { struct vcpu *curr = current; - struct hvm_vcpu_io *vio = &curr->arch.hvm.hvm_io; + struct vcpu_io *vio = &curr->io; unsigned int data; int rc; @@ -XXX,XX +XXX,XX @@ bool handle_pio(uint16_t port, unsigned int size, int dir) rc = hvmemul_do_pio_buffer(port, size, dir, &data); - if ( ioreq_needs_completion(&vio->io_req) ) - vio->io_completion = HVMIO_pio_completion; + if ( ioreq_needs_completion(&vio->req) ) + vio->completion = VIO_pio_completion; switch ( rc ) { @@ -XXX,XX +XXX,XX @@ static bool_t g2m_portio_accept(const struct hvm_io_handler *handler, { struct vcpu *curr = current; const struct hvm_domain *hvm = &curr->domain->arch.hvm; - struct hvm_vcpu_io *vio = &curr->arch.hvm.hvm_io; + struct hvm_vcpu_io *hvio = &curr->arch.hvm.hvm_io; struct g2m_ioport *g2m_ioport; unsigned int start, end; @@ -XXX,XX +XXX,XX @@ static bool_t g2m_portio_accept(const struct hvm_io_handler *handler, end = start + g2m_ioport->np; if ( (p->addr >= start) && (p->addr + p->size <= end) ) { - vio->g2m_ioport = g2m_ioport; + hvio->g2m_ioport = g2m_ioport; return 1; } } @@ -XXX,XX +XXX,XX @@ static bool_t g2m_portio_accept(const struct hvm_io_handler *handler, static int g2m_portio_read(const struct hvm_io_handler *handler, uint64_t addr, uint32_t size, uint64_t *data) { - struct hvm_vcpu_io *vio = &current->arch.hvm.hvm_io; - const struct g2m_ioport *g2m_ioport = vio->g2m_ioport; + struct hvm_vcpu_io *hvio = &current->arch.hvm.hvm_io; + const struct g2m_ioport *g2m_ioport = hvio->g2m_ioport; unsigned int mport
= (addr - g2m_ioport->gport) + g2m_ioport->mport; switch ( size ) @@ -XXX,XX +XXX,XX @@ static int g2m_portio_read(const struct hvm_io_handler *handler, static int g2m_portio_write(const struct hvm_io_handler *handler, uint64_t addr, uint32_t size, uint64_t data) { - struct hvm_vcpu_io *vio = &current->arch.hvm.hvm_io; - const struct g2m_ioport *g2m_ioport = vio->g2m_ioport; + struct hvm_vcpu_io *hvio = &current->arch.hvm.hvm_io; + const struct g2m_ioport *g2m_ioport = hvio->g2m_ioport; unsigned int mport = (addr - g2m_ioport->gport) + g2m_ioport->mport; switch ( size ) diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/ioreq.c +++ b/xen/arch/x86/hvm/ioreq.c @@ -XXX,XX +XXX,XX @@ bool arch_ioreq_complete_mmio(void) return handle_mmio(); } -bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion) +bool arch_vcpu_ioreq_completion(enum vio_completion completion) { - switch ( io_completion ) + switch ( completion ) { - case HVMIO_realmode_completion: + case VIO_realmode_completion: { struct hvm_emulate_ctxt ctxt; diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/svm/nestedsvm.c +++ b/xen/arch/x86/hvm/svm/nestedsvm.c @@ -XXX,XX +XXX,XX @@ enum hvm_intblk nsvm_intr_blocked(struct vcpu *v) * Delay the injection because this would result in delivering * an interrupt *within* the execution of an instruction.
*/ - if ( v->arch.hvm.hvm_io.io_req.state != STATE_IOREQ_NONE ) + if ( v->io.req.state != STATE_IOREQ_NONE ) return hvm_intblk_shadow; if ( !nv->nv_vmexit_pending && n2vmcb->exit_int_info.v ) diff --git a/xen/arch/x86/hvm/vmx/realmode.c b/xen/arch/x86/hvm/vmx/realmode.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/vmx/realmode.c +++ b/xen/arch/x86/hvm/vmx/realmode.c @@ -XXX,XX +XXX,XX @@ void vmx_realmode_emulate_one(struct hvm_emulate_ctxt *hvmemul_ctxt) perfc_incr(realmode_emulations); - rc = hvm_emulate_one(hvmemul_ctxt, HVMIO_realmode_completion); + rc = hvm_emulate_one(hvmemul_ctxt, VIO_realmode_completion); if ( rc == X86EMUL_UNHANDLEABLE ) { @@ -XXX,XX +XXX,XX @@ void vmx_realmode(struct cpu_user_regs *regs) struct vcpu *curr = current; struct hvm_emulate_ctxt hvmemul_ctxt; struct segment_register *sreg; - struct hvm_vcpu_io *vio = &curr->arch.hvm.hvm_io; + struct hvm_vcpu_io *hvio = &curr->arch.hvm.hvm_io; unsigned long intr_info; unsigned int emulations = 0; @@ -XXX,XX +XXX,XX @@ void vmx_realmode(struct cpu_user_regs *regs) vmx_realmode_emulate_one(&hvmemul_ctxt); - if ( vio->io_req.state != STATE_IOREQ_NONE || vio->mmio_retry ) + if ( curr->io.req.state != STATE_IOREQ_NONE || hvio->mmio_retry ) break; /* Stop emulating unless our segment state is not safe */ @@ -XXX,XX +XXX,XX @@ void vmx_realmode(struct cpu_user_regs *regs) } /* Need to emulate next time if we've started an IO operation */ - if ( vio->io_req.state != STATE_IOREQ_NONE ) + if ( curr->io.req.state != STATE_IOREQ_NONE ) curr->arch.hvm.vmx.vmx_emulate = 1; if ( !curr->arch.hvm.vmx.vmx_emulate && !curr->arch.hvm.vmx.vmx_realmode ) diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/common/ioreq.c +++ b/xen/common/ioreq.c @@ -XXX,XX +XXX,XX @@ static bool hvm_wait_for_io(struct ioreq_vcpu *sv, ioreq_t *p) break; } - p = &sv->vcpu->arch.hvm.hvm_io.io_req; + p = &sv->vcpu->io.req; if ( ioreq_needs_completion(p) ) p->data = data; @@ -XXX,XX +XXX,XX 
@@ static bool hvm_wait_for_io(struct ioreq_vcpu *sv, ioreq_t *p) bool handle_hvm_io_completion(struct vcpu *v) { struct domain *d = v->domain; - struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io; + struct vcpu_io *vio = &v->io; struct ioreq_server *s; struct ioreq_vcpu *sv; - enum hvm_io_completion io_completion; + enum vio_completion completion; if ( has_vpci(d) && vpci_process_pending(v) ) { @@ -XXX,XX +XXX,XX @@ bool handle_hvm_io_completion(struct vcpu *v) if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) ) return false; - vio->io_req.state = ioreq_needs_completion(&vio->io_req) ? + vio->req.state = ioreq_needs_completion(&vio->req) ? STATE_IORESP_READY : STATE_IOREQ_NONE; msix_write_completion(v); vcpu_end_shutdown_deferral(v); - io_completion = vio->io_completion; - vio->io_completion = HVMIO_no_completion; + completion = vio->completion; + vio->completion = VIO_no_completion; - switch ( io_completion ) + switch ( completion ) { - case HVMIO_no_completion: + case VIO_no_completion: break; - case HVMIO_mmio_completion: + case VIO_mmio_completion: return arch_ioreq_complete_mmio(); - case HVMIO_pio_completion: - return handle_pio(vio->io_req.addr, vio->io_req.size, - vio->io_req.dir); + case VIO_pio_completion: + return handle_pio(vio->req.addr, vio->req.size, + vio->req.dir); default: - return arch_vcpu_ioreq_completion(io_completion); + return arch_vcpu_ioreq_completion(completion); } return true; diff --git a/xen/include/asm-x86/hvm/emulate.h b/xen/include/asm-x86/hvm/emulate.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/hvm/emulate.h +++ b/xen/include/asm-x86/hvm/emulate.h @@ -XXX,XX +XXX,XX @@ bool __nonnull(1, 2) hvm_emulate_one_insn( const char *descr); int hvm_emulate_one( struct hvm_emulate_ctxt *hvmemul_ctxt, - enum hvm_io_completion completion); + enum vio_completion completion); void hvm_emulate_one_vm_event(enum emul_kind kind, unsigned int trapnr, unsigned int errcode); diff --git a/xen/include/asm-x86/hvm/vcpu.h 
b/xen/include/asm-x86/hvm/vcpu.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/hvm/vcpu.h +++ b/xen/include/asm-x86/hvm/vcpu.h @@ -XXX,XX +XXX,XX @@ #include <asm/mtrr.h> #include <public/hvm/ioreq.h> -enum hvm_io_completion { - HVMIO_no_completion, - HVMIO_mmio_completion, - HVMIO_pio_completion, - HVMIO_realmode_completion -}; - struct hvm_vcpu_asid { uint64_t generation; uint32_t asid; @@ -XXX,XX +XXX,XX @@ struct hvm_mmio_cache { }; struct hvm_vcpu_io { - /* I/O request in flight to device model. */ - enum hvm_io_completion io_completion; - ioreq_t io_req; - /* * HVM emulation: * Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn. diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/ioreq.h +++ b/xen/include/xen/ioreq.h @@ -XXX,XX +XXX,XX @@ void hvm_ioreq_init(struct domain *d); int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op); bool arch_ioreq_complete_mmio(void); -bool arch_vcpu_ioreq_completion(enum hvm_io_completion io_completion); +bool arch_vcpu_ioreq_completion(enum vio_completion completion); int arch_ioreq_server_map_pages(struct ioreq_server *s); void arch_ioreq_server_unmap_pages(struct ioreq_server *s); void arch_ioreq_server_enable(struct ioreq_server *s); diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -XXX,XX +XXX,XX @@ void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */ struct waitqueue_vcpu; +enum vio_completion { + VIO_no_completion, + VIO_mmio_completion, + VIO_pio_completion, +#ifdef CONFIG_X86 + VIO_realmode_completion, +#endif +}; + +struct vcpu_io { + /* I/O request in flight to device model. 
*/ + enum vio_completion completion; + ioreq_t req; +}; + struct vcpu { int vcpu_id; @@ -XXX,XX +XXX,XX @@ struct vcpu struct vpci_vcpu vpci; struct arch_vcpu arch; + +#ifdef CONFIG_IOREQ_SERVER + struct vcpu_io io; +#endif }; struct sched_unit { -- 2.7.4
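For readers following the split between common and arch code, here is a minimal standalone sketch of the completion dispatch this patch ends up with: the common handler consumes the pending completion, deals with the no/MMIO/PIO cases itself, and leaves anything else to an arch hook. The names ending in _sketch and the integer return codes are assumptions made for illustration only; the real code lives in handle_hvm_io_completion() and arch_vcpu_ioreq_completion() and calls handle_pio() and arch_ioreq_complete_mmio().

```c
#include <stdbool.h>

/* Stand-in for the enum the patch moves into xen/include/xen/sched.h. */
enum vio_completion {
    VIO_no_completion,
    VIO_mmio_completion,
    VIO_pio_completion,
    VIO_realmode_completion, /* guarded by CONFIG_X86 in the patch */
};

/* Simplified stand-in for struct vcpu_io; the real one also carries ioreq_t req. */
struct vcpu_io_sketch {
    enum vio_completion completion;
};

/* Hypothetical arch hook: only the x86 realmode case is recognised here. */
static bool arch_completion_sketch(enum vio_completion completion)
{
    return completion == VIO_realmode_completion;
}

/*
 * Returns which path ran (illustrative codes, not Xen's return values):
 * 0 = nothing pending, 1 = MMIO, 2 = PIO, 3 = arch hook, -1 = unhandled.
 */
static int handle_completion_sketch(struct vcpu_io_sketch *vio)
{
    /* Latch the pending completion and clear it before dispatching. */
    enum vio_completion completion = vio->completion;

    vio->completion = VIO_no_completion;

    switch ( completion )
    {
    case VIO_no_completion:
        return 0;

    case VIO_mmio_completion:
        return 1;

    case VIO_pio_completion:
        return 2;

    default:
        /* Everything the common code does not know about goes to the arch. */
        return arch_completion_sketch(completion) ? 3 : -1;
    }
}
```

The default case is what lets an architecture add its own completion kinds without the common switch having to name them.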
From: Julien Grall <julien.grall@arm.com>

As the x86 implementation of XENMEM_resource_ioreq_server can be
re-used on Arm later on, this patch makes it common and removes
arch_acquire_resource as unneeded.

Also re-order #include-s alphabetically.

This support is going to be used on Arm to be able to run a device
emulator outside of the Xen hypervisor.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
[On Arm only]
Tested-by: Wei Chen <Wei.Chen@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - no changes

Changes V1 -> V2:
   - update the author of a patch

Changes V2 -> V3:
   - don't wrap #include <xen/ioreq.h>
   - limit the number of #ifdef-s
   - re-order #include-s alphabetically

Changes V3 -> V4:
   - rebase
   - Add Jan's R-b
---
 xen/arch/x86/mm.c        | 44 ---------------------------------
 xen/common/memory.c      | 63 +++++++++++++++++++++++++++++++++++++++---------
 xen/include/asm-arm/mm.h |  8 ------
 xen/include/asm-x86/mm.h |  4 ---
 4 files changed, 51 insertions(+), 68 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -XXX,XX +XXX,XX @@ static int handle_iomem_range(unsigned long s, unsigned long e, void *p) return err || s > e ?
err : _handle_iomem_range(s, e, p); } -int arch_acquire_resource(struct domain *d, unsigned int type, - unsigned int id, unsigned long frame, - unsigned int nr_frames, xen_pfn_t mfn_list[]) -{ - int rc; - - switch ( type ) - { -#ifdef CONFIG_HVM - case XENMEM_resource_ioreq_server: - { - ioservid_t ioservid = id; - unsigned int i; - - rc = -EINVAL; - if ( !is_hvm_domain(d) ) - break; - - if ( id != (unsigned int)ioservid ) - break; - - rc = 0; - for ( i = 0; i < nr_frames; i++ ) - { - mfn_t mfn; - - rc = hvm_get_ioreq_server_frame(d, id, frame + i, &mfn); - if ( rc ) - break; - - mfn_list[i] = mfn_x(mfn); - } - break; - } -#endif - - default: - rc = -EOPNOTSUPP; - break; - } - - return rc; -} - long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg) { int rc; diff --git a/xen/common/memory.c b/xen/common/memory.c index XXXXXXX..XXXXXXX 100644 --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -XXX,XX +XXX,XX @@ */ #include <xen/domain_page.h> -#include <xen/types.h> +#include <xen/errno.h> +#include <xen/event.h> +#include <xen/grant_table.h> +#include <xen/guest_access.h> +#include <xen/hypercall.h> +#include <xen/iocap.h> +#include <xen/ioreq.h> #include <xen/lib.h> +#include <xen/mem_access.h> #include <xen/mm.h> +#include <xen/numa.h> +#include <xen/paging.h> #include <xen/param.h> #include <xen/perfc.h> #include <xen/sched.h> -#include <xen/event.h> -#include <xen/paging.h> -#include <xen/iocap.h> -#include <xen/guest_access.h> -#include <xen/hypercall.h> -#include <xen/errno.h> -#include <xen/numa.h> -#include <xen/mem_access.h> #include <xen/trace.h> -#include <xen/grant_table.h> +#include <xen/types.h> #include <asm/current.h> #include <asm/hardirq.h> #include <asm/p2m.h> @@ -XXX,XX +XXX,XX @@ static int acquire_grant_table(struct domain *d, unsigned int id, return 0; } +static int acquire_ioreq_server(struct domain *d, + unsigned int id, + unsigned long frame, + unsigned int nr_frames, + xen_pfn_t mfn_list[]) +{ +#ifdef 
CONFIG_IOREQ_SERVER + ioservid_t ioservid = id; + unsigned int i; + int rc; + + if ( !is_hvm_domain(d) ) + return -EINVAL; + + if ( id != (unsigned int)ioservid ) + return -EINVAL; + + for ( i = 0; i < nr_frames; i++ ) + { + mfn_t mfn; + + rc = hvm_get_ioreq_server_frame(d, id, frame + i, &mfn); + if ( rc ) + return rc; + + mfn_list[i] = mfn_x(mfn); + } + + return 0; +#else + return -EOPNOTSUPP; +#endif +} + static int acquire_resource( XEN_GUEST_HANDLE_PARAM(xen_mem_acquire_resource_t) arg) { @@ -XXX,XX +XXX,XX @@ static int acquire_resource( mfn_list); break; + case XENMEM_resource_ioreq_server: + rc = acquire_ioreq_server(d, xmar.id, xmar.frame, xmar.nr_frames, + mfn_list); + break; + default: - rc = arch_acquire_resource(d, xmar.type, xmar.id, xmar.frame, - xmar.nr_frames, mfn_list); + rc = -EOPNOTSUPP; break; } diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-arm/mm.h +++ b/xen/include/asm-arm/mm.h @@ -XXX,XX +XXX,XX @@ static inline void put_page_and_type(struct page_info *page) void clear_and_clean_page(struct page_info *page); -static inline -int arch_acquire_resource(struct domain *d, unsigned int type, unsigned int id, - unsigned long frame, unsigned int nr_frames, - xen_pfn_t mfn_list[]) -{ - return -EOPNOTSUPP; -} - unsigned int arch_get_dma_bitsize(void); #endif /* __ARCH_ARM_MM__ */ diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/mm.h +++ b/xen/include/asm-x86/mm.h @@ -XXX,XX +XXX,XX @@ static inline bool arch_mfn_in_directmap(unsigned long mfn) return mfn <= (virt_to_mfn(eva - 1) + 1); } -int arch_acquire_resource(struct domain *d, unsigned int type, - unsigned int id, unsigned long frame, - unsigned int nr_frames, xen_pfn_t mfn_list[]); - #endif /* __ASM_X86_MM_H__ */ -- 2.7.4
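As a side note on the "id != (unsigned int)ioservid" test kept in acquire_ioreq_server(): it rejects ids that do not survive narrowing to the server id type. The sketch that follows shows that check together with the per-frame loop in a standalone form. It assumes a 16-bit server id type, and lookup_frame_sketch() with its limit of four frames is a made-up stand-in for hvm_get_ioreq_server_frame(); none of the _sketch names exist in Xen.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption for illustration: the server id type is 16 bits wide. */
typedef uint16_t ioservid_sketch_t;

/*
 * Hypothetical stand-in for hvm_get_ioreq_server_frame(): pretends the
 * server owns four frames and hands back a fake frame number for each.
 */
static int lookup_frame_sketch(unsigned long frame, uint64_t *mfn)
{
    if ( frame >= 4 )
        return -EINVAL;

    *mfn = 0x1000 + frame;

    return 0;
}

static int acquire_frames_sketch(unsigned int id, unsigned long frame,
                                 unsigned int nr_frames, uint64_t mfn_list[])
{
    ioservid_sketch_t ioservid = id; /* may truncate */
    unsigned int i;

    /* Reject ids that changed value when narrowed to the server id type. */
    if ( id != (unsigned int)ioservid )
        return -EINVAL;

    for ( i = 0; i < nr_frames; i++ )
    {
        uint64_t mfn;
        int rc = lookup_frame_sketch(frame + i, &mfn);

        /* Stop at the first failure and report it to the caller. */
        if ( rc )
            return rc;

        mfn_list[i] = mfn;
    }

    return 0;
}
```

With these assumptions, an id of 0x10001 is refused because it would alias server 1 after truncation, and a request that runs past the last frame fails partway through with the error of the first missing frame.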
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch removes "hvm" prefixes and infixes from IOREQ related
function names in the common code and performs a renaming where
appropriate according to the more consistent new naming scheme:
- IOREQ server functions should start with "ioreq_server_"
- IOREQ functions should start with "ioreq_"

A few function names are clarified to better fit their purposes:
handle_hvm_io_completion -> vcpu_ioreq_handle_completion
hvm_io_pending           -> vcpu_ioreq_pending
hvm_ioreq_init           -> ioreq_domain_init
hvm_alloc_ioreq_mfn      -> ioreq_server_alloc_mfn
hvm_free_ioreq_mfn       -> ioreq_server_free_mfn

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
CC: Julien Grall <julien.grall@arm.com>
[On Arm only]
Tested-by: Wei Chen <Wei.Chen@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes V1 -> V2:
   - new patch

Changes V2 -> V3:
   - update patch given that the "legacy interface" is x86 specific
   - update patch description
   - rename everything touched according to new naming scheme

Changes V3 -> V4:
   - rebase
   - rename ioreq_update_evtchn() to ioreq_server_update_evtchn()
   - add Jan's R-b
---
 xen/arch/x86/hvm/dm.c       |   4 +-
 xen/arch/x86/hvm/emulate.c  |   6 +-
 xen/arch/x86/hvm/hvm.c      |  10 +--
 xen/arch/x86/hvm/io.c       |   6 +-
 xen/arch/x86/hvm/ioreq.c    |   2 +-
 xen/arch/x86/hvm/stdvga.c   |   4 +-
 xen/arch/x86/hvm/vmx/vvmx.c |   2 +-
 xen/common/ioreq.c          | 202 ++++++++++++++++++++++----------------------
 xen/common/memory.c         |   2 +-
 xen/include/xen/ioreq.h     |  30 +++----
 10 files changed, 134 insertions(+), 134 deletions(-)

diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args) break; if ( first_gfn == 0 ) - rc = hvm_map_mem_type_to_ioreq_server(d, data->id, -
data->type, data->flags); + rc = ioreq_server_map_mem_type(d, data->id, + data->type, data->flags); else rc = 0; diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/emulate.c +++ b/xen/arch/x86/hvm/emulate.c @@ -XXX,XX +XXX,XX @@ static int hvmemul_do_io( * an ioreq server that can handle it. * * Rules: - * A> PIO or MMIO accesses run through hvm_select_ioreq_server() to + * A> PIO or MMIO accesses run through ioreq_server_select() to * choose the ioreq server by range. If no server is found, the access * is ignored. * @@ -XXX,XX +XXX,XX @@ static int hvmemul_do_io( } if ( !s ) - s = hvm_select_ioreq_server(currd, &p); + s = ioreq_server_select(currd, &p); /* If there is no suitable backing DM, just ignore accesses */ if ( !s ) @@ -XXX,XX +XXX,XX @@ static int hvmemul_do_io( } else { - rc = hvm_send_ioreq(s, &p, 0); + rc = ioreq_send(s, &p, 0); if ( rc != X86EMUL_RETRY || currd->is_shutting_down ) vio->req.state = STATE_IOREQ_NONE; else if ( !ioreq_needs_completion(&vio->req) ) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -XXX,XX +XXX,XX @@ void hvm_do_resume(struct vcpu *v) pt_restore_timer(v); - if ( !handle_hvm_io_completion(v) ) + if ( !vcpu_ioreq_handle_completion(v) ) return; if ( unlikely(v->arch.vm_event) ) @@ -XXX,XX +XXX,XX @@ int hvm_domain_initialise(struct domain *d) register_g2m_portio_handler(d); register_vpci_portio_handler(d); - hvm_ioreq_init(d); + ioreq_domain_init(d); hvm_init_guest_time(d); @@ -XXX,XX +XXX,XX @@ void hvm_domain_relinquish_resources(struct domain *d) viridian_domain_deinit(d); - hvm_destroy_all_ioreq_servers(d); + ioreq_server_destroy_all(d); msixtbl_pt_cleanup(d); @@ -XXX,XX +XXX,XX @@ int hvm_vcpu_initialise(struct vcpu *v) if ( rc ) goto fail5; - rc = hvm_all_ioreq_servers_add_vcpu(d, v); + rc = ioreq_server_add_vcpu_all(d, v); if ( rc != 0 ) goto 
fail6; @@ -XXX,XX +XXX,XX @@ void hvm_vcpu_destroy(struct vcpu *v) { viridian_vcpu_deinit(v); - hvm_all_ioreq_servers_remove_vcpu(v->domain, v); + ioreq_server_remove_vcpu_all(v->domain, v); if ( hvm_altp2m_supported() ) altp2m_vcpu_destroy(v); diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/io.c +++ b/xen/arch/x86/hvm/io.c @@ -XXX,XX +XXX,XX @@ void send_timeoffset_req(unsigned long timeoff) if ( timeoff == 0 ) return; - if ( hvm_broadcast_ioreq(&p, true) != 0 ) + if ( ioreq_broadcast(&p, true) != 0 ) gprintk(XENLOG_ERR, "Unsuccessful timeoffset update\n"); } @@ -XXX,XX +XXX,XX @@ void send_invalidate_req(void) .data = ~0UL, /* flush all */ }; - if ( hvm_broadcast_ioreq(&p, false) != 0 ) + if ( ioreq_broadcast(&p, false) != 0 ) gprintk(XENLOG_ERR, "Unsuccessful map-cache invalidate\n"); } @@ -XXX,XX +XXX,XX @@ bool handle_pio(uint16_t port, unsigned int size, int dir) * We should not advance RIP/EIP if the domain is shutting down or * if X86EMUL_RETRY has been returned by an internal handler. */ - if ( curr->domain->is_shutting_down || !hvm_io_pending(curr) ) + if ( curr->domain->is_shutting_down || !vcpu_ioreq_pending(curr) ) return false; break; diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/ioreq.c +++ b/xen/arch/x86/hvm/ioreq.c @@ -XXX,XX +XXX,XX @@ static int hvm_map_ioreq_gfn(struct ioreq_server *s, bool buf) { /* * If a page has already been allocated (which will happen on - * demand if hvm_get_ioreq_server_frame() is called), then + * demand if ioreq_server_get_frame() is called), then * mapping a guest frame is not permitted. 
*/ if ( gfn_eq(iorp->gfn, INVALID_GFN) ) diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/stdvga.c +++ b/xen/arch/x86/hvm/stdvga.c @@ -XXX,XX +XXX,XX @@ static int stdvga_mem_write(const struct hvm_io_handler *handler, } done: - srv = hvm_select_ioreq_server(current->domain, &p); + srv = ioreq_server_select(current->domain, &p); if ( !srv ) return X86EMUL_UNHANDLEABLE; - return hvm_send_ioreq(srv, &p, 1); + return ioreq_send(srv, &p, 1); } static bool_t stdvga_mem_accept(const struct hvm_io_handler *handler, diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -XXX,XX +XXX,XX @@ void nvmx_switch_guest(void) * don't want to continue as this setup is not implemented nor supported * as of right now. */ - if ( hvm_io_pending(v) ) + if ( vcpu_ioreq_pending(v) ) return; /* * a softirq may interrupt us between a virtual vmentry is diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/common/ioreq.c +++ b/xen/common/ioreq.c @@ -XXX,XX +XXX,XX @@ static struct ioreq_server *get_ioreq_server(const struct domain *d, * Iterate over all possible ioreq servers. * * NOTE: The iteration is backwards such that more recently created - * ioreq servers are favoured in hvm_select_ioreq_server(). + * ioreq servers are favoured in ioreq_server_select(). * This is a semantic that previously existed when ioreq servers * were held in a linked list. 
*/ @@ -XXX,XX +XXX,XX @@ static struct ioreq_vcpu *get_pending_vcpu(const struct vcpu *v, return NULL; } -bool hvm_io_pending(struct vcpu *v) +bool vcpu_ioreq_pending(struct vcpu *v) { return get_pending_vcpu(v, NULL); } -static bool hvm_wait_for_io(struct ioreq_vcpu *sv, ioreq_t *p) +static bool wait_for_io(struct ioreq_vcpu *sv, ioreq_t *p) { unsigned int prev_state = STATE_IOREQ_NONE; unsigned int state = p->state; @@ -XXX,XX +XXX,XX @@ static bool hvm_wait_for_io(struct ioreq_vcpu *sv, ioreq_t *p) return true; } -bool handle_hvm_io_completion(struct vcpu *v) +bool vcpu_ioreq_handle_completion(struct vcpu *v) { struct domain *d = v->domain; struct vcpu_io *vio = &v->io; @@ -XXX,XX +XXX,XX @@ bool handle_hvm_io_completion(struct vcpu *v) } sv = get_pending_vcpu(v, &s); - if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) ) + if ( sv && !wait_for_io(sv, get_ioreq(s, v)) ) return false; vio->req.state = ioreq_needs_completion(&vio->req) ? @@ -XXX,XX +XXX,XX @@ bool handle_hvm_io_completion(struct vcpu *v) return true; } -static int hvm_alloc_ioreq_mfn(struct ioreq_server *s, bool buf) +static int ioreq_server_alloc_mfn(struct ioreq_server *s, bool buf) { struct ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq; struct page_info *page; @@ -XXX,XX +XXX,XX @@ static int hvm_alloc_ioreq_mfn(struct ioreq_server *s, bool buf) { /* * If a guest frame has already been mapped (which may happen - * on demand if hvm_get_ioreq_server_info() is called), then + * on demand if ioreq_server_get_info() is called), then * allocating a page is not permitted. */ if ( !gfn_eq(iorp->gfn, INVALID_GFN) ) @@ -XXX,XX +XXX,XX @@ static int hvm_alloc_ioreq_mfn(struct ioreq_server *s, bool buf) return -ENOMEM; } -static void hvm_free_ioreq_mfn(struct ioreq_server *s, bool buf) +static void ioreq_server_free_mfn(struct ioreq_server *s, bool buf) { struct ioreq_page *iorp = buf ? 
&s->bufioreq : &s->ioreq; struct page_info *page = iorp->page; @@ -XXX,XX +XXX,XX @@ bool is_ioreq_server_page(struct domain *d, const struct page_info *page) return found; } -static void hvm_update_ioreq_evtchn(struct ioreq_server *s, - struct ioreq_vcpu *sv) +static void ioreq_server_update_evtchn(struct ioreq_server *s, + struct ioreq_vcpu *sv) { ASSERT(spin_is_locked(&s->lock)); @@ -XXX,XX +XXX,XX @@ static void hvm_update_ioreq_evtchn(struct ioreq_server *s, } } -static int hvm_ioreq_server_add_vcpu(struct ioreq_server *s, - struct vcpu *v) +static int ioreq_server_add_vcpu(struct ioreq_server *s, + struct vcpu *v) { struct ioreq_vcpu *sv; int rc; @@ -XXX,XX +XXX,XX @@ static int hvm_ioreq_server_add_vcpu(struct ioreq_server *s, list_add(&sv->list_entry, &s->ioreq_vcpu_list); if ( s->enabled ) - hvm_update_ioreq_evtchn(s, sv); + ioreq_server_update_evtchn(s, sv); spin_unlock(&s->lock); return 0; @@ -XXX,XX +XXX,XX @@ static int hvm_ioreq_server_add_vcpu(struct ioreq_server *s, return rc; } -static void hvm_ioreq_server_remove_vcpu(struct ioreq_server *s, - struct vcpu *v) +static void ioreq_server_remove_vcpu(struct ioreq_server *s, + struct vcpu *v) { struct ioreq_vcpu *sv; @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_remove_vcpu(struct ioreq_server *s, spin_unlock(&s->lock); } -static void hvm_ioreq_server_remove_all_vcpus(struct ioreq_server *s) +static void ioreq_server_remove_all_vcpus(struct ioreq_server *s) { struct ioreq_vcpu *sv, *next; @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_remove_all_vcpus(struct ioreq_server *s) spin_unlock(&s->lock); } -static int hvm_ioreq_server_alloc_pages(struct ioreq_server *s) +static int ioreq_server_alloc_pages(struct ioreq_server *s) { int rc; - rc = hvm_alloc_ioreq_mfn(s, false); + rc = ioreq_server_alloc_mfn(s, false); if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) ) - rc = hvm_alloc_ioreq_mfn(s, true); + rc = ioreq_server_alloc_mfn(s, true); if ( rc ) - hvm_free_ioreq_mfn(s, false); + 
ioreq_server_free_mfn(s, false); return rc; } -static void hvm_ioreq_server_free_pages(struct ioreq_server *s) +static void ioreq_server_free_pages(struct ioreq_server *s) { - hvm_free_ioreq_mfn(s, true); - hvm_free_ioreq_mfn(s, false); + ioreq_server_free_mfn(s, true); + ioreq_server_free_mfn(s, false); } -static void hvm_ioreq_server_free_rangesets(struct ioreq_server *s) +static void ioreq_server_free_rangesets(struct ioreq_server *s) { unsigned int i; @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_free_rangesets(struct ioreq_server *s) rangeset_destroy(s->range[i]); } -static int hvm_ioreq_server_alloc_rangesets(struct ioreq_server *s, - ioservid_t id) +static int ioreq_server_alloc_rangesets(struct ioreq_server *s, + ioservid_t id) { unsigned int i; int rc; @@ -XXX,XX +XXX,XX @@ static int hvm_ioreq_server_alloc_rangesets(struct ioreq_server *s, return 0; fail: - hvm_ioreq_server_free_rangesets(s); + ioreq_server_free_rangesets(s); return rc; } -static void hvm_ioreq_server_enable(struct ioreq_server *s) +static void ioreq_server_enable(struct ioreq_server *s) { struct ioreq_vcpu *sv; @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_enable(struct ioreq_server *s) list_for_each_entry ( sv, &s->ioreq_vcpu_list, list_entry ) - hvm_update_ioreq_evtchn(s, sv); + ioreq_server_update_evtchn(s, sv); done: spin_unlock(&s->lock); } -static void hvm_ioreq_server_disable(struct ioreq_server *s) +static void ioreq_server_disable(struct ioreq_server *s) { spin_lock(&s->lock); @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_disable(struct ioreq_server *s) spin_unlock(&s->lock); } -static int hvm_ioreq_server_init(struct ioreq_server *s, - struct domain *d, int bufioreq_handling, - ioservid_t id) +static int ioreq_server_init(struct ioreq_server *s, + struct domain *d, int bufioreq_handling, + ioservid_t id) { struct domain *currd = current->domain; struct vcpu *v; @@ -XXX,XX +XXX,XX @@ static int hvm_ioreq_server_init(struct ioreq_server *s, s->ioreq.gfn = 
INVALID_GFN; s->bufioreq.gfn = INVALID_GFN; - rc = hvm_ioreq_server_alloc_rangesets(s, id); + rc = ioreq_server_alloc_rangesets(s, id); if ( rc ) return rc; @@ -XXX,XX +XXX,XX @@ static int hvm_ioreq_server_init(struct ioreq_server *s, for_each_vcpu ( d, v ) { - rc = hvm_ioreq_server_add_vcpu(s, v); + rc = ioreq_server_add_vcpu(s, v); if ( rc ) goto fail_add; } @@ -XXX,XX +XXX,XX @@ static int hvm_ioreq_server_init(struct ioreq_server *s, return 0; fail_add: - hvm_ioreq_server_remove_all_vcpus(s); + ioreq_server_remove_all_vcpus(s); arch_ioreq_server_unmap_pages(s); - hvm_ioreq_server_free_rangesets(s); + ioreq_server_free_rangesets(s); put_domain(s->emulator); return rc; } -static void hvm_ioreq_server_deinit(struct ioreq_server *s) +static void ioreq_server_deinit(struct ioreq_server *s) { ASSERT(!s->enabled); - hvm_ioreq_server_remove_all_vcpus(s); + ioreq_server_remove_all_vcpus(s); /* * NOTE: It is safe to call both arch_ioreq_server_unmap_pages() and - * hvm_ioreq_server_free_pages() in that order. + * ioreq_server_free_pages() in that order. * This is because the former will do nothing if the pages * are not mapped, leaving the page to be freed by the latter. * However if the pages are mapped then the former will set @@ -XXX,XX +XXX,XX @@ static void hvm_ioreq_server_deinit(struct ioreq_server *s) * nothing. */ arch_ioreq_server_unmap_pages(s); - hvm_ioreq_server_free_pages(s); + ioreq_server_free_pages(s); - hvm_ioreq_server_free_rangesets(s); + ioreq_server_free_rangesets(s); put_domain(s->emulator); } -static int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, - ioservid_t *id) +static int ioreq_server_create(struct domain *d, int bufioreq_handling, + ioservid_t *id) { struct ioreq_server *s; unsigned int i; @@ -XXX,XX +XXX,XX @@ static int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, /* * It is safe to call set_ioreq_server() prior to - * hvm_ioreq_server_init() since the target domain is paused. 
+ * ioreq_server_init() since the target domain is paused. */ set_ioreq_server(d, i, s); - rc = hvm_ioreq_server_init(s, d, bufioreq_handling, i); + rc = ioreq_server_init(s, d, bufioreq_handling, i); if ( rc ) { set_ioreq_server(d, i, NULL); @@ -XXX,XX +XXX,XX @@ static int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling, return rc; } -static int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id) +static int ioreq_server_destroy(struct domain *d, ioservid_t id) { struct ioreq_server *s; int rc; @@ -XXX,XX +XXX,XX @@ static int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id) arch_ioreq_server_destroy(s); - hvm_ioreq_server_disable(s); + ioreq_server_disable(s); /* - * It is safe to call hvm_ioreq_server_deinit() prior to + * It is safe to call ioreq_server_deinit() prior to * set_ioreq_server() since the target domain is paused. */ - hvm_ioreq_server_deinit(s); + ioreq_server_deinit(s); set_ioreq_server(d, id, NULL); domain_unpause(d); @@ -XXX,XX +XXX,XX @@ static int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id) return rc; } -static int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, - unsigned long *ioreq_gfn, - unsigned long *bufioreq_gfn, - evtchn_port_t *bufioreq_port) +static int ioreq_server_get_info(struct domain *d, ioservid_t id, + unsigned long *ioreq_gfn, + unsigned long *bufioreq_gfn, + evtchn_port_t *bufioreq_port) { struct ioreq_server *s; int rc; @@ -XXX,XX +XXX,XX @@ static int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id, return rc; } -int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id, - unsigned long idx, mfn_t *mfn) +int ioreq_server_get_frame(struct domain *d, ioservid_t id, + unsigned long idx, mfn_t *mfn) { struct ioreq_server *s; int rc; @@ -XXX,XX +XXX,XX @@ int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id, if ( s->emulator != current->domain ) goto out; - rc = hvm_ioreq_server_alloc_pages(s); + rc = ioreq_server_alloc_pages(s); if ( rc ) goto 
out; @@ -XXX,XX +XXX,XX @@ int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id, return rc; } -static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, - uint32_t type, uint64_t start, - uint64_t end) +static int ioreq_server_map_io_range(struct domain *d, ioservid_t id, + uint32_t type, uint64_t start, + uint64_t end) { struct ioreq_server *s; struct rangeset *r; @@ -XXX,XX +XXX,XX @@ static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id, return rc; } -static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, - uint32_t type, uint64_t start, - uint64_t end) +static int ioreq_server_unmap_io_range(struct domain *d, ioservid_t id, + uint32_t type, uint64_t start, + uint64_t end) { struct ioreq_server *s; struct rangeset *r; @@ -XXX,XX +XXX,XX @@ static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id, * Support for the emulation of read operations can be added when an ioreq * server has such requirement in the future. 
*/ -int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, - uint32_t type, uint32_t flags) +int ioreq_server_map_mem_type(struct domain *d, ioservid_t id, + uint32_t type, uint32_t flags) { struct ioreq_server *s; int rc; @@ -XXX,XX +XXX,XX @@ int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, return rc; } -static int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, - bool enabled) +static int ioreq_server_set_state(struct domain *d, ioservid_t id, + bool enabled) { struct ioreq_server *s; int rc; @@ -XXX,XX +XXX,XX @@ static int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, domain_pause(d); if ( enabled ) - hvm_ioreq_server_enable(s); + ioreq_server_enable(s); else - hvm_ioreq_server_disable(s); + ioreq_server_disable(s); domain_unpause(d); @@ -XXX,XX +XXX,XX @@ static int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id, return rc; } -int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v) +int ioreq_server_add_vcpu_all(struct domain *d, struct vcpu *v) { struct ioreq_server *s; unsigned int id; @@ -XXX,XX +XXX,XX @@ int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v) FOR_EACH_IOREQ_SERVER(d, id, s) { - rc = hvm_ioreq_server_add_vcpu(s, v); + rc = ioreq_server_add_vcpu(s, v); if ( rc ) goto fail; } @@ -XXX,XX +XXX,XX @@ int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v) if ( !s ) continue; - hvm_ioreq_server_remove_vcpu(s, v); + ioreq_server_remove_vcpu(s, v); } spin_unlock_recursive(&d->ioreq_server.lock); @@ -XXX,XX +XXX,XX @@ int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v) return rc; } -void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v) +void ioreq_server_remove_vcpu_all(struct domain *d, struct vcpu *v) { struct ioreq_server *s; unsigned int id; @@ -XXX,XX +XXX,XX @@ void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v) spin_lock_recursive(&d->ioreq_server.lock); 
FOR_EACH_IOREQ_SERVER(d, id, s) - hvm_ioreq_server_remove_vcpu(s, v); + ioreq_server_remove_vcpu(s, v); spin_unlock_recursive(&d->ioreq_server.lock); } -void hvm_destroy_all_ioreq_servers(struct domain *d) +void ioreq_server_destroy_all(struct domain *d) { struct ioreq_server *s; unsigned int id; @@ -XXX,XX +XXX,XX @@ void hvm_destroy_all_ioreq_servers(struct domain *d) FOR_EACH_IOREQ_SERVER(d, id, s) { - hvm_ioreq_server_disable(s); + ioreq_server_disable(s); /* - * It is safe to call hvm_ioreq_server_deinit() prior to + * It is safe to call ioreq_server_deinit() prior to * set_ioreq_server() since the target domain is being destroyed. */ - hvm_ioreq_server_deinit(s); + ioreq_server_deinit(s); set_ioreq_server(d, id, NULL); xfree(s); @@ -XXX,XX +XXX,XX @@ void hvm_destroy_all_ioreq_servers(struct domain *d) spin_unlock_recursive(&d->ioreq_server.lock); } -struct ioreq_server *hvm_select_ioreq_server(struct domain *d, - ioreq_t *p) +struct ioreq_server *ioreq_server_select(struct domain *d, + ioreq_t *p) { struct ioreq_server *s; uint8_t type; @@ -XXX,XX +XXX,XX @@ struct ioreq_server *hvm_select_ioreq_server(struct domain *d, return NULL; } -static int hvm_send_buffered_ioreq(struct ioreq_server *s, ioreq_t *p) +static int ioreq_send_buffered(struct ioreq_server *s, ioreq_t *p) { struct domain *d = current->domain; struct ioreq_page *iorp; @@ -XXX,XX +XXX,XX @@ static int hvm_send_buffered_ioreq(struct ioreq_server *s, ioreq_t *p) return IOREQ_STATUS_HANDLED; } -int hvm_send_ioreq(struct ioreq_server *s, ioreq_t *proto_p, - bool buffered) +int ioreq_send(struct ioreq_server *s, ioreq_t *proto_p, + bool buffered) { struct vcpu *curr = current; struct domain *d = curr->domain; @@ -XXX,XX +XXX,XX @@ int hvm_send_ioreq(struct ioreq_server *s, ioreq_t *proto_p, ASSERT(s); if ( buffered ) - return hvm_send_buffered_ioreq(s, proto_p); + return ioreq_send_buffered(s, proto_p); if ( unlikely(!vcpu_start_shutdown_deferral(curr)) ) return IOREQ_STATUS_RETRY; @@ -XXX,XX 
+XXX,XX @@ int hvm_send_ioreq(struct ioreq_server *s, ioreq_t *proto_p, return IOREQ_STATUS_UNHANDLED; } -unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered) +unsigned int ioreq_broadcast(ioreq_t *p, bool buffered) { struct domain *d = current->domain; struct ioreq_server *s; @@ -XXX,XX +XXX,XX @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered) if ( !s->enabled ) continue; - if ( hvm_send_ioreq(s, p, buffered) == IOREQ_STATUS_UNHANDLED ) + if ( ioreq_send(s, p, buffered) == IOREQ_STATUS_UNHANDLED ) failed++; } return failed; } -void hvm_ioreq_init(struct domain *d) +void ioreq_domain_init(struct domain *d) { spin_lock_init(&d->ioreq_server.lock); @@ -XXX,XX +XXX,XX @@ int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op) if ( data->pad[0] || data->pad[1] || data->pad[2] ) break; - rc = hvm_create_ioreq_server(d, data->handle_bufioreq, - &data->id); + rc = ioreq_server_create(d, data->handle_bufioreq, + &data->id); break; } @@ -XXX,XX +XXX,XX @@ int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op) if ( data->flags & ~valid_flags ) break; - rc = hvm_get_ioreq_server_info(d, data->id, - (data->flags & XEN_DMOP_no_gfns) ? - NULL : (unsigned long *)&data->ioreq_gfn, - (data->flags & XEN_DMOP_no_gfns) ? - NULL : (unsigned long *)&data->bufioreq_gfn, - &data->bufioreq_port); + rc = ioreq_server_get_info(d, data->id, + (data->flags & XEN_DMOP_no_gfns) ? + NULL : (unsigned long *)&data->ioreq_gfn, + (data->flags & XEN_DMOP_no_gfns) ? 
+ NULL : (unsigned long *)&data->bufioreq_gfn, + &data->bufioreq_port); break; } @@ -XXX,XX +XXX,XX @@ int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op) if ( data->pad ) break; - rc = hvm_map_io_range_to_ioreq_server(d, data->id, data->type, - data->start, data->end); + rc = ioreq_server_map_io_range(d, data->id, data->type, + data->start, data->end); break; } @@ -XXX,XX +XXX,XX @@ int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op) if ( data->pad ) break; - rc = hvm_unmap_io_range_from_ioreq_server(d, data->id, data->type, - data->start, data->end); + rc = ioreq_server_unmap_io_range(d, data->id, data->type, + data->start, data->end); break; } @@ -XXX,XX +XXX,XX @@ int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op) if ( data->pad ) break; - rc = hvm_set_ioreq_server_state(d, data->id, !!data->enabled); + rc = ioreq_server_set_state(d, data->id, !!data->enabled); break; } @@ -XXX,XX +XXX,XX @@ int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op) if ( data->pad ) break; - rc = hvm_destroy_ioreq_server(d, data->id); + rc = ioreq_server_destroy(d, data->id); break; } diff --git a/xen/common/memory.c b/xen/common/memory.c index XXXXXXX..XXXXXXX 100644 --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -XXX,XX +XXX,XX @@ static int acquire_ioreq_server(struct domain *d, { mfn_t mfn; - rc = hvm_get_ioreq_server_frame(d, id, frame + i, &mfn); + rc = ioreq_server_get_frame(d, id, frame + i, &mfn); if ( rc ) return rc; diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/ioreq.h +++ b/xen/include/xen/ioreq.h @@ -XXX,XX +XXX,XX @@ static inline bool ioreq_needs_completion(const ioreq_t *ioreq) #define HANDLE_BUFIOREQ(s) \ ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) -bool hvm_io_pending(struct vcpu *v); -bool handle_hvm_io_completion(struct vcpu *v); +bool vcpu_ioreq_pending(struct vcpu *v); 
+bool vcpu_ioreq_handle_completion(struct vcpu *v); bool is_ioreq_server_page(struct domain *d, const struct page_info *page); -int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id, - unsigned long idx, mfn_t *mfn); -int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id, - uint32_t type, uint32_t flags); +int ioreq_server_get_frame(struct domain *d, ioservid_t id, + unsigned long idx, mfn_t *mfn); +int ioreq_server_map_mem_type(struct domain *d, ioservid_t id, + uint32_t type, uint32_t flags); -int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v); -void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v); -void hvm_destroy_all_ioreq_servers(struct domain *d); +int ioreq_server_add_vcpu_all(struct domain *d, struct vcpu *v); +void ioreq_server_remove_vcpu_all(struct domain *d, struct vcpu *v); +void ioreq_server_destroy_all(struct domain *d); -struct ioreq_server *hvm_select_ioreq_server(struct domain *d, - ioreq_t *p); -int hvm_send_ioreq(struct ioreq_server *s, ioreq_t *proto_p, - bool buffered); -unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered); +struct ioreq_server *ioreq_server_select(struct domain *d, + ioreq_t *p); +int ioreq_send(struct ioreq_server *s, ioreq_t *proto_p, + bool buffered); +unsigned int ioreq_broadcast(ioreq_t *p, bool buffered); -void hvm_ioreq_init(struct domain *d); +void ioreq_domain_init(struct domain *d); int ioreq_server_dm_op(struct xen_dm_op *op, struct domain *d, bool *const_op); -- 2.7.4
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The cmpxchg() in ioreq_send_buffered() operates on memory shared
with the emulator domain (and the target domain if the legacy
interface is used).

In order to be on the safe side we need to switch
to guest_cmpxchg64() to prevent a domain from DoSing Xen on Arm.

As there is no plan to support the legacy interface on Arm,
a page will be mapped in a single domain at a time,
so we can use s->emulator in guest_cmpxchg64() safely.

Thankfully the only user of the legacy interface is x86 so far
and there is no concern regarding the atomic operations.

Please note that the legacy interface *must* not be used on Arm
without revisiting the code.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
[On Arm only]
Tested-by: Wei Chen <Wei.Chen@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - new patch

Changes V1 -> V2:
   - move earlier to avoid breaking arm32 compilation
   - add an explanation to commit description and hvm_allow_set_param()
   - pass s->emulator

Changes V2 -> V3:
   - update patch description

Changes V3 -> V4:
   - add Stefano's A-b
   - drop comment from arm/hvm.c
---
 xen/common/ioreq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index XXXXXXX..XXXXXXX 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -XXX,XX +XXX,XX @@
 #include <xen/trace.h>
 #include <xen/vpci.h>
 
+#include <asm/guest_atomics.h>
 #include <asm/hvm/ioreq.h>
 #include <public/hvm/ioreq.h>
 
@@ -XXX,XX +XXX,XX @@ static int ioreq_send_buffered(struct ioreq_server *s, ioreq_t *p)
 
         new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
         new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
-        cmpxchg(&pg->ptrs.full, old.full, new.full);
+        guest_cmpxchg64(s->emulator, &pg->ptrs.full, old.full, new.full);
     }
 
     notify_via_xen_event_channel(d, s->bufioreq_evtchn);
-- 
2.7.4
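For readers less familiar with the pattern, the pointer rebase that the
hunk above touches can be sketched in portable C11 roughly as follows.
This is only an illustration under stated assumptions, not Xen code:
it uses <stdatomic.h> in place of guest_cmpxchg64(), and the names
ring_ptrs, ring_rebase() and SLOT_NUM (standing in for
IOREQ_BUFFER_SLOT_NUM, with an illustrative value) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for IOREQ_BUFFER_SLOT_NUM (illustrative value). */
#define SLOT_NUM 511u

/*
 * Read and write indices packed into one 64-bit word, so that both
 * can be replaced by a single compare-and-swap.
 */
union ring_ptrs {
    struct {
        uint32_t read_pointer;
        uint32_t write_pointer;
    };
    uint64_t full;
};

/*
 * Rebase both indices by the same multiple of SLOT_NUM.
 * Returns true if the shared word was updated; false if no rebase was
 * needed or if another party modified the word in the meantime.
 */
static bool ring_rebase(_Atomic uint64_t *shared)
{
    union ring_ptrs old, new;
    uint32_t n;

    assert(shared != NULL);

    /* Take one snapshot of both indices. */
    old.full = atomic_load(shared);
    if ( old.read_pointer < SLOT_NUM )
        return false;

    n = old.read_pointer / SLOT_NUM;
    new.read_pointer = old.read_pointer - n * SLOT_NUM;
    new.write_pointer = old.write_pointer - n * SLOT_NUM;

    /* Publish only if nobody changed the word since the snapshot. */
    return atomic_compare_exchange_strong(shared, &old.full, new.full);
}
```

A failed exchange in this sketch simply means the other side updated
the indices first, so the stale values are not written back. The
statement changed by the hunk above likewise does not act on the result
of the exchange.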
From: Julien Grall <julien.grall@arm.com>

This patch adds basic IOREQ/DM support on Arm. The subsequent
patches will improve functionality and add remaining bits.

The IOREQ/DM features are supposed to be built with the IOREQ_SERVER
option enabled, which is disabled by default on Arm for now.

Please note, the "PIO handling" TODO is expected to be left
unaddressed in the current series. It is not a big issue for now,
as Xen doesn't yet have support for vPCI on Arm. On Arm64 such
accesses are only used for the PCI IO BAR and we would probably want
to expose them to the emulator as PIO accesses to make a DM completely
arch-agnostic. So "PIO handling" should be implemented when we add
support for vPCI.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
[On Arm only]
Tested-by: Wei Chen <Wei.Chen@arm.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - was split into:
     - arm/ioreq: Introduce arch specific bits for IOREQ/DM features
     - xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
   - update patch description
   - update asm-arm/hvm/ioreq.h according to the newly introduced arch functions:
     - arch_hvm_destroy_ioreq_server()
     - arch_handle_hvm_io_completion()
   - update arch files to include xen/ioreq.h
   - remove HVMOP plumbing
   - rewrite the logic to properly handle the case when hvm_send_ioreq()
     returns IO_RETRY
   - add logic to properly handle the handle_hvm_io_completion() return value
   - rename handle_mmio() to ioreq_handle_complete_mmio()
   - move paging_mark_pfn_dirty() to asm-arm/paging.h
   - remove forward declaration for hvm_ioreq_server in asm-arm/paging.h
   - move try_fwd_ioserv() to ioreq.c, provide stubs if !CONFIG_IOREQ_SERVER
   - do not remove #ifdef CONFIG_IOREQ_SERVER in memory.c for guarding xen/ioreq.h
   - use gdprintk in try_fwd_ioserv(), remove unneeded prints
   - update list of #include-s
   - move has_vpci() to asm-arm/domain.h
   - add a comment (TODO) to the not yet implemented handle_pio()
   - remove hvm_mmio_first(last)_byte() and hvm_ioreq_(page/vcpu/server) structs
     from the arch files, they were already moved to the common code
   - remove set_foreign_p2m_entry() changes, they will be properly implemented
     in the follow-up patch
   - select IOREQ_SERVER for Arm instead of Arm64 in Kconfig
   - remove x86's realmode and other unneeded stubs from xen/ioreq.h
   - clarify ioreq_t p.df usage in try_fwd_ioserv()
   - set ioreq_t p.count to 1 in try_fwd_ioserv()

Changes V1 -> V2:
   - was split into:
     - arm/ioreq: Introduce arch specific bits for IOREQ/DM features
     - xen/arm: Stick around in leave_hypervisor_to_guest until I/O has completed
   - update the author of a patch
   - update patch description
   - move a loop in leave_hypervisor_to_guest() to a separate patch
   - set IOREQ_SERVER disabled by default
   - remove already clarified /* XXX */
   - replace BUG() by ASSERT_UNREACHABLE() in handle_pio()
   - remove default case for handling the return value of try_handle_mmio()
   - remove struct hvm_domain, enum hvm_io_completion, struct hvm_vcpu_io,
     struct hvm_vcpu from asm-arm/domain.h, these are common materials now
   - update everything according to the recent changes (IOREQ related function
     names don't contain "hvm" prefixes/infixes anymore, IOREQ related fields
     are part of common struct vcpu/domain now, etc)

Changes V2 -> V3:
   - update patch according to the fact that the "legacy interface" is x86 specific
   - add dummy arch hooks
   - remove dummy paging_mark_pfn_dirty()
   - don't include <xen/domain_page.h> in common ioreq.c
   - don't include <public/hvm/ioreq.h> in arch ioreq.h
   - remove #define ioreq_params(d, i)

Changes V3 -> V4:
   - rebase
   - update patch according to the renaming IO_ -> VIO_ (io_ -> vio_)
     and misc changes to arch hooks
   - update patch according to the IOREQ related dm-op handling changes
   - don't include <xen/ioreq.h> from arch header
   - make all arch hooks out-of-line
   - add a comment above IOREQ_STATUS_* #define-s
---
xen/arch/arm/Makefile | 2 + xen/arch/arm/dm.c | 122 +++++++++++++++++++++++ xen/arch/arm/domain.c | 9 ++ xen/arch/arm/io.c | 12 ++- xen/arch/arm/ioreq.c | 213 ++++++++++++++++++++++++++++++++++++++++ xen/arch/arm/traps.c | 13 +++ xen/include/asm-arm/domain.h | 3 + xen/include/asm-arm/hvm/ioreq.h | 72 ++++++++++++++ xen/include/asm-arm/mmio.h | 1 + 9 files changed, 446 insertions(+), 1 deletion(-) create mode 100644 xen/arch/arm/dm.c create mode 100644 xen/arch/arm/ioreq.c create mode 100644 xen/include/asm-arm/hvm/ioreq.h diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/Makefile +++ b/xen/arch/arm/Makefile @@ -XXX,XX +XXX,XX @@ obj-y += cpuerrata.o obj-y += cpufeature.o obj-y += decode.o obj-y += device.o +obj-$(CONFIG_IOREQ_SERVER) += dm.o obj-y += domain.o obj-y += domain_build.init.o obj-y += domctl.o @@ -XXX,XX +XXX,XX @@ obj-y += guest_atomics.o obj-y += guest_walk.o obj-y += hvm.o obj-y += io.o +obj-$(CONFIG_IOREQ_SERVER) += ioreq.o obj-y += irq.o obj-y += kernel.init.o obj-$(CONFIG_LIVEPATCH) += livepatch.o diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/xen/arch/arm/dm.c @@ -XXX,XX +XXX,XX @@ +/* + * Copyright (c) 2019 Arm ltd. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; If not, see <http://www.gnu.org/licenses/>. 
+ */ + +#include <xen/dm.h> +#include <xen/guest_access.h> +#include <xen/hypercall.h> +#include <xen/ioreq.h> +#include <xen/nospec.h> + +static int dm_op(const struct dmop_args *op_args) +{ + struct domain *d; + struct xen_dm_op op; + bool const_op = true; + long rc; + size_t offset; + + static const uint8_t op_size[] = { + [XEN_DMOP_create_ioreq_server] = sizeof(struct xen_dm_op_create_ioreq_server), + [XEN_DMOP_get_ioreq_server_info] = sizeof(struct xen_dm_op_get_ioreq_server_info), + [XEN_DMOP_map_io_range_to_ioreq_server] = sizeof(struct xen_dm_op_ioreq_server_range), + [XEN_DMOP_unmap_io_range_from_ioreq_server] = sizeof(struct xen_dm_op_ioreq_server_range), + [XEN_DMOP_set_ioreq_server_state] = sizeof(struct xen_dm_op_set_ioreq_server_state), + [XEN_DMOP_destroy_ioreq_server] = sizeof(struct xen_dm_op_destroy_ioreq_server), + }; + + rc = rcu_lock_remote_domain_by_id(op_args->domid, &d); + if ( rc ) + return rc; + + rc = xsm_dm_op(XSM_DM_PRIV, d); + if ( rc ) + goto out; + + offset = offsetof(struct xen_dm_op, u); + + rc = -EFAULT; + if ( op_args->buf[0].size < offset ) + goto out; + + if ( copy_from_guest_offset((void *)&op, op_args->buf[0].h, 0, offset) ) + goto out; + + if ( op.op >= ARRAY_SIZE(op_size) ) + { + rc = -EOPNOTSUPP; + goto out; + } + + op.op = array_index_nospec(op.op, ARRAY_SIZE(op_size)); + + if ( op_args->buf[0].size < offset + op_size[op.op] ) + goto out; + + if ( copy_from_guest_offset((void *)&op.u, op_args->buf[0].h, offset, + op_size[op.op]) ) + goto out; + + rc = -EINVAL; + if ( op.pad ) + goto out; + + rc = ioreq_server_dm_op(&op, d, &const_op); + + if ( (!rc || rc == -ERESTART) && + !const_op && copy_to_guest_offset(op_args->buf[0].h, offset, + (void *)&op.u, op_size[op.op]) ) + rc = -EFAULT; + + out: + rcu_unlock_domain(d); + + return rc; +} + +long do_dm_op(domid_t domid, + unsigned int nr_bufs, + XEN_GUEST_HANDLE_PARAM(xen_dm_op_buf_t) bufs) +{ + struct dmop_args args; + int rc; + + if ( nr_bufs > ARRAY_SIZE(args.buf) ) + return 
-E2BIG; + + args.domid = domid; + args.nr_bufs = array_index_nospec(nr_bufs, ARRAY_SIZE(args.buf) + 1); + + if ( copy_from_guest_offset(&args.buf[0], bufs, 0, args.nr_bufs) ) + return -EFAULT; + + rc = dm_op(&args); + + if ( rc == -ERESTART ) + rc = hypercall_create_continuation(__HYPERVISOR_dm_op, "iih", + domid, nr_bufs, bufs); + + return rc; +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -XXX,XX +XXX,XX @@ #include <xen/guest_access.h> #include <xen/hypercall.h> #include <xen/init.h> +#include <xen/ioreq.h> #include <xen/lib.h> #include <xen/livepatch.h> #include <xen/sched.h> @@ -XXX,XX +XXX,XX @@ int arch_domain_create(struct domain *d, ASSERT(config != NULL); +#ifdef CONFIG_IOREQ_SERVER + ioreq_domain_init(d); +#endif + /* p2m_init relies on some value initialized by the IOMMU subsystem */ if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 ) goto fail; @@ -XXX,XX +XXX,XX @@ int domain_relinquish_resources(struct domain *d) if (ret ) return ret; +#ifdef CONFIG_IOREQ_SERVER + ioreq_server_destroy_all(d); +#endif + PROGRESS(xen): ret = relinquish_memory(d, &d->xenpage_list); if ( ret ) diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/io.c +++ b/xen/arch/arm/io.c @@ -XXX,XX +XXX,XX @@ * GNU General Public License for more details. 
*/ +#include <xen/ioreq.h> #include <xen/lib.h> #include <xen/spinlock.h> #include <xen/sched.h> @@ -XXX,XX +XXX,XX @@ #include <asm/cpuerrata.h> #include <asm/current.h> #include <asm/mmio.h> +#include <asm/hvm/ioreq.h> #include "decode.h" @@ -XXX,XX +XXX,XX @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs, handler = find_mmio_handler(v->domain, info.gpa); if ( !handler ) - return IO_UNHANDLED; + { + int rc; + + rc = try_fwd_ioserv(regs, v, &info); + if ( rc == IO_HANDLED ) + return handle_ioserv(regs, v); + + return rc; + } /* All the instructions used on emulated MMIO region should be valid */ if ( !dabt.valid ) diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/xen/arch/arm/ioreq.c @@ -XXX,XX +XXX,XX @@ +/* + * arm/ioreq.c: hardware virtual machine I/O emulation + * + * Copyright (c) 2019 Arm ltd. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; If not, see <http://www.gnu.org/licenses/>. 
+ */ + +#include <xen/domain.h> +#include <xen/ioreq.h> + +#include <asm/traps.h> + +#include <public/hvm/ioreq.h> + +enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v) +{ + const union hsr hsr = { .bits = regs->hsr }; + const struct hsr_dabt dabt = hsr.dabt; + /* Code is similar to handle_read */ + uint8_t size = (1 << dabt.size) * 8; + register_t r = v->io.req.data; + + /* We are done with the IO */ + v->io.req.state = STATE_IOREQ_NONE; + + if ( dabt.write ) + return IO_HANDLED; + + /* + * Sign extend if required. + * Note that we expect the read handler to have zeroed the bits + * outside the requested access size. + */ + if ( dabt.sign && (r & (1UL << (size - 1))) ) + { + /* + * We are relying on register_t using the same as + * an unsigned long in order to keep the 32-bit assembly + * code smaller. + */ + BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long)); + r |= (~0UL) << size; + } + + set_user_reg(regs, dabt.reg, r); + + return IO_HANDLED; +} + +enum io_state try_fwd_ioserv(struct cpu_user_regs *regs, + struct vcpu *v, mmio_info_t *info) +{ + struct vcpu_io *vio = &v->io; + ioreq_t p = { + .type = IOREQ_TYPE_COPY, + .addr = info->gpa, + .size = 1 << info->dabt.size, + .count = 1, + .dir = !info->dabt.write, + /* + * On x86, df is used by 'rep' instruction to tell the direction + * to iterate (forward or backward). + * On Arm, all the accesses to MMIO region will do a single + * memory access. So for now, we can safely always set to 0. 
+ */ + .df = 0, + .data = get_user_reg(regs, info->dabt.reg), + .state = STATE_IOREQ_READY, + }; + struct ioreq_server *s = NULL; + enum io_state rc; + + switch ( vio->req.state ) + { + case STATE_IOREQ_NONE: + break; + + case STATE_IORESP_READY: + return IO_HANDLED; + + default: + gdprintk(XENLOG_ERR, "wrong state %u\n", vio->req.state); + return IO_ABORT; + } + + s = ioreq_server_select(v->domain, &p); + if ( !s ) + return IO_UNHANDLED; + + if ( !info->dabt.valid ) + return IO_ABORT; + + vio->req = p; + + rc = ioreq_send(s, &p, 0); + if ( rc != IO_RETRY || v->domain->is_shutting_down ) + vio->req.state = STATE_IOREQ_NONE; + else if ( !ioreq_needs_completion(&vio->req) ) + rc = IO_HANDLED; + else + vio->completion = VIO_mmio_completion; + + return rc; +} + +bool arch_ioreq_complete_mmio(void) +{ + struct vcpu *v = current; + struct cpu_user_regs *regs = guest_cpu_user_regs(); + const union hsr hsr = { .bits = regs->hsr }; + paddr_t addr = v->io.req.addr; + + if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED ) + { + advance_pc(regs, hsr); + return true; + } + + return false; +} + +bool arch_vcpu_ioreq_completion(enum vio_completion completion) +{ + ASSERT_UNREACHABLE(); + return true; +} + +/* + * The "legacy" mechanism of mapping magic pages for the IOREQ servers + * is x86 specific, so the following hooks don't need to be implemented on Arm: + * - arch_ioreq_server_map_pages + * - arch_ioreq_server_unmap_pages + * - arch_ioreq_server_enable + * - arch_ioreq_server_disable + */ +int arch_ioreq_server_map_pages(struct ioreq_server *s) +{ + return -EOPNOTSUPP; +} + +void arch_ioreq_server_unmap_pages(struct ioreq_server *s) +{ +} + +void arch_ioreq_server_enable(struct ioreq_server *s) +{ +} + +void arch_ioreq_server_disable(struct ioreq_server *s) +{ +} + +void arch_ioreq_server_destroy(struct ioreq_server *s) +{ +} + +int arch_ioreq_server_map_mem_type(struct domain *d, + struct ioreq_server *s, + uint32_t flags) +{ + return -EOPNOTSUPP; +} + +void 
arch_ioreq_server_map_mem_type_completed(struct domain *d, + struct ioreq_server *s, + uint32_t flags) +{ +} + +bool arch_ioreq_server_destroy_all(struct domain *d) +{ + return true; +} + +bool arch_ioreq_server_get_type_addr(const struct domain *d, + const ioreq_t *p, + uint8_t *type, + uint64_t *addr) +{ + if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO ) + return false; + + *type = (p->type == IOREQ_TYPE_PIO) ? + XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY; + *addr = p->addr; + + return true; +} + +void arch_ioreq_domain_init(struct domain *d) +{ +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -XXX,XX +XXX,XX @@ #include <xen/hypercall.h> #include <xen/init.h> #include <xen/iocap.h> +#include <xen/ioreq.h> #include <xen/irq.h> #include <xen/lib.h> #include <xen/mem_access.h> @@ -XXX,XX +XXX,XX @@ static arm_hypercall_t arm_hypercall_table[] = { #ifdef CONFIG_HYPFS HYPERCALL(hypfs_op, 5), #endif +#ifdef CONFIG_IOREQ_SERVER + HYPERCALL(dm_op, 3), +#endif }; #ifndef NDEBUG @@ -XXX,XX +XXX,XX @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs, case IO_HANDLED: advance_pc(regs, hsr); return; + case IO_RETRY: + /* finish later */ + return; case IO_UNHANDLED: /* IO unhandled, try another way to handle it. 
*/ break; @@ -XXX,XX +XXX,XX @@ static void check_for_vcpu_work(void) { struct vcpu *v = current; +#ifdef CONFIG_IOREQ_SERVER + local_irq_enable(); + vcpu_ioreq_handle_completion(v); + local_irq_disable(); +#endif + if ( likely(!v->arch.need_flush_to_ram) ) return; diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -XXX,XX +XXX,XX @@ #include <asm/gic.h> #include <asm/vgic.h> #include <asm/vpl011.h> +#include <public/hvm/dm_op.h> #include <public/hvm/params.h> struct hvm_domain @@ -XXX,XX +XXX,XX @@ static inline void arch_vcpu_block(struct vcpu *v) {} #define arch_vm_assist_valid_mask(d) (1UL << VMASST_TYPE_runstate_update_flag) +#define has_vpci(d) ({ (void)(d); false; }) + #endif /* __ASM_DOMAIN_H__ */ /* diff --git a/xen/include/asm-arm/hvm/ioreq.h b/xen/include/asm-arm/hvm/ioreq.h new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/xen/include/asm-arm/hvm/ioreq.h @@ -XXX,XX +XXX,XX @@ +/* + * hvm.h: Hardware virtual machine assist interface definitions. + * + * Copyright (c) 2016 Citrix Systems Inc. + * Copyright (c) 2019 Arm ltd. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; If not, see <http://www.gnu.org/licenses/>. 
+ */ + +#ifndef __ASM_ARM_HVM_IOREQ_H__ +#define __ASM_ARM_HVM_IOREQ_H__ + +#ifdef CONFIG_IOREQ_SERVER +enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v); +enum io_state try_fwd_ioserv(struct cpu_user_regs *regs, + struct vcpu *v, mmio_info_t *info); +#else +static inline enum io_state handle_ioserv(struct cpu_user_regs *regs, + struct vcpu *v) +{ + return IO_UNHANDLED; +} + +static inline enum io_state try_fwd_ioserv(struct cpu_user_regs *regs, + struct vcpu *v, mmio_info_t *info) +{ + return IO_UNHANDLED; +} +#endif + +bool ioreq_complete_mmio(void); + +static inline bool handle_pio(uint16_t port, unsigned int size, int dir) +{ + /* + * TODO: For Arm64, the main user will be PCI. So this should be + * implemented when we add support for vPCI. + */ + ASSERT_UNREACHABLE(); + return true; +} + +static inline void msix_write_completion(struct vcpu *v) +{ +} + +/* This correlation must not be altered */ +#define IOREQ_STATUS_HANDLED IO_HANDLED +#define IOREQ_STATUS_UNHANDLED IO_UNHANDLED +#define IOREQ_STATUS_RETRY IO_RETRY + +#endif /* __ASM_ARM_HVM_IOREQ_H__ */ + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/include/asm-arm/mmio.h b/xen/include/asm-arm/mmio.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-arm/mmio.h +++ b/xen/include/asm-arm/mmio.h @@ -XXX,XX +XXX,XX @@ enum io_state IO_ABORT, /* The IO was handled by the helper and led to an abort. */ IO_HANDLED, /* The IO was successfully handled by the helper. */ IO_UNHANDLED, /* The IO was not handled by the helper. */ + IO_RETRY, /* Retry the emulation for some reason */ }; typedef int (*mmio_read_t)(struct vcpu *v, mmio_info_t *info, -- 2.7.4
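For readers following the control flow, the order of checks that try_fwd_ioserv() applies, and the way handle_ioserv() later consumes the response, can be reduced to a small standalone model. This is only a sketch under stated assumptions: every `model_*` name and the two boolean inputs are hypothetical stand-ins (for a successful ioreq_server_select() and for info->dabt.valid), and the model assumes that ioreq_send() returns IO_RETRY for a request that needs completion. It is not Xen code.

```c
#include <assert.h>
#include <stdbool.h>

/* Same members as enum io_state once IO_RETRY has been added. */
enum model_io_state { M_IO_ABORT, M_IO_HANDLED, M_IO_UNHANDLED, M_IO_RETRY };

/* Hypothetical reduction of the per-vCPU request state (vio->req.state). */
enum model_req_state { M_REQ_NONE, M_REQ_READY, M_RESP_READY };

struct model_vcpu {
    enum model_req_state req_state;
};

/*
 * Model of try_fwd_ioserv(): 'server_claims' stands in for a successful
 * ioreq_server_select(), 'syndrome_valid' for info->dabt.valid.
 */
enum model_io_state model_try_fwd(struct model_vcpu *v,
                                  bool server_claims,
                                  bool syndrome_valid)
{
    switch ( v->req_state )
    {
    case M_REQ_NONE:
        break;

    case M_RESP_READY:
        /* The emulator has answered; the caller completes the access. */
        return M_IO_HANDLED;

    default:
        /* A request is already in flight: unexpected state. */
        return M_IO_ABORT;
    }

    if ( !server_claims )
        return M_IO_UNHANDLED;

    if ( !syndrome_valid )
        return M_IO_ABORT;

    /* Request queued; the vCPU has to wait for the device model. */
    v->req_state = M_REQ_READY;
    return M_IO_RETRY;
}

/* Model of handle_ioserv(): consume the response and reset the state. */
enum model_io_state model_complete(struct model_vcpu *v)
{
    assert(v->req_state == M_RESP_READY);
    v->req_state = M_REQ_NONE;
    return M_IO_HANDLED;
}
```

The point of the model is the ordering: a pending response is consumed before any server lookup happens, and an access that no IOREQ server claims still falls through as "unhandled" so that the existing stage-2 abort path can try something else.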
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch adds proper handling of the return value of vcpu_ioreq_handle_completion(), which involves using a loop in leave_hypervisor_to_guest().

The reason to use an unbounded loop here is that the vCPU shouldn't continue until the I/O has completed. The IOREQ code uses wait_on_xen_event_channel(). Yet, this can still "exit" early if an event has been received. But this doesn't mean the I/O has completed (it can be just a spurious wake-up). So we need to check whether the I/O has completed and wait again if it hasn't (we will block the vCPU again until an event is received). This loop makes sure that all the vCPU work is done before we return to the guest.

The call chain is: check_for_vcpu_work -> vcpu_ioreq_handle_completion -> wait_for_io -> wait_on_xen_event_channel

The worst that can happen here is that the vCPU never runs again (the I/O never completes). But, in the Xen case, if the I/O never completes then it most likely means that something went horribly wrong with the Device Emulator, and it is most likely not safe to continue. So letting the vCPU spin forever if the I/O never completes is a safer action than letting it continue and leaving the guest in an unclear state, and is the best we can do for now.

Please note, with this loop we will not spin forever on a pCPU, preventing any other vCPUs from being scheduled. On every iteration we call check_for_pcpu_work(), which processes pending softirqs. In case of failure, the guest will crash and the vCPU will be unscheduled. In the normal case, if rescheduling is necessary (it might be requested by a timer or by a caller in check_for_vcpu_work(), where wait_for_io() is a preemption point) the vCPU will be rescheduled to make room for someone else.
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> CC: Julien Grall <julien.grall@arm.com> [On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" Changes V1 -> V2: - new patch, changes were derived from (+ new explanation): arm/ioreq: Introduce arch specific bits for IOREQ/DM features Changes V2 -> V3: - update patch description Changes V3 -> V4: - update patch description and comment in code --- xen/arch/arm/traps.c | 38 +++++++++++++++++++++++++++++++++----- 1 file changed, 33 insertions(+), 5 deletions(-) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -XXX,XX +XXX,XX @@ static void check_for_pcpu_work(void) * Process pending work for the vCPU. Any call should be fast or * implement preemption. */ -static void check_for_vcpu_work(void) +static bool check_for_vcpu_work(void) { struct vcpu *v = current; #ifdef CONFIG_IOREQ_SERVER + bool handled; + local_irq_enable(); - vcpu_ioreq_handle_completion(v); + handled = vcpu_ioreq_handle_completion(v); local_irq_disable(); + + if ( !handled ) + return true; #endif if ( likely(!v->arch.need_flush_to_ram) ) - return; + return false; /* * Give a chance for the pCPU to process work before handling the vCPU @@ -XXX,XX +XXX,XX @@ static void check_for_vcpu_work(void) local_irq_enable(); p2m_flush_vm(v); local_irq_disable(); + + return false; } /* @@ -XXX,XX +XXX,XX @@ void leave_hypervisor_to_guest(void) { local_irq_disable(); - check_for_vcpu_work(); - check_for_pcpu_work(); + /* + * The reason to use an unbounded loop here is the fact that vCPU + * shouldn't continue until the I/O has completed. + * + * The worst that can happen here is that the vCPU never runs again + * (the I/O never completes).
But, in Xen case, if the I/O never + * completes then it most likely means that something went horribly + * wrong with the Device Emulator. And it is most likely not safe + * to continue. So letting the vCPU spin forever if the I/O never + * completes is a safer action than letting it continue and leaving + * the guest in an unclear state and is the best we can do for now. + * + * Please note, using this loop we will not spin forever on a pCPU, + * preventing any other vCPUs from being scheduled. At every loop + * we will call check_for_pcpu_work() that will process pending + * softirqs. In case of failure, the guest will crash and the vCPU + * will be unscheduled. In normal case, if the rescheduling is necessary + * (might be set by a timer or by a caller in check_for_vcpu_work()), + * the vCPU will be rescheduled to give place to someone else. + */ + do { + check_for_pcpu_work(); + } while ( check_for_vcpu_work() ); vgic_sync_to_lrs(); -- 2.7.4
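The shape of the exit loop, and why pCPU work still gets a chance to run on every pass, can be illustrated with a standalone model. This is a sketch only: the `model_*` functions and the `struct model_exit_path` counters are hypothetical and merely imitate the return-value contract of check_for_vcpu_work() (true means "I/O still pending, go round again").

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for the model; none of this is Xen state. */
struct model_exit_path {
    unsigned int wakeups_until_done; /* spurious wake-ups before completion */
    unsigned int pcpu_work_calls;    /* times pCPU work got a chance to run */
};

/* Stand-in for check_for_pcpu_work(): just record that it ran. */
void model_check_for_pcpu_work(struct model_exit_path *p)
{
    p->pcpu_work_calls++;
}

/*
 * Stand-in for check_for_vcpu_work(): returns true while the I/O is
 * still pending, i.e. while the caller has to go round the loop again.
 */
bool model_check_for_vcpu_work(struct model_exit_path *p)
{
    if ( p->wakeups_until_done == 0 )
        return false;

    p->wakeups_until_done--;
    return true;
}

/* Same shape as the do/while added to leave_hypervisor_to_guest(). */
void model_leave_hypervisor_to_guest(struct model_exit_path *p)
{
    do {
        model_check_for_pcpu_work(p);
    } while ( model_check_for_vcpu_work(p) );

    /* The loop only exits once no I/O is pending any more. */
    assert(p->wakeups_until_done == 0);
}
```

With three spurious wake-ups the pCPU hook runs four times in this model: once per pass, including the final pass on which the vCPU work reports completion.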
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch implements reference counting of foreign entries in set_foreign_p2m_entry() on Arm. This is mandatory if we want to run the emulator (IOREQ server) in a domain other than dom0, as we can't trust it to do the right thing if it is not running in dom0. So we need to grab a reference on the page to avoid it disappearing.

It is valid to always pass the "p2m_map_foreign_rw" type to guest_physmap_add_entry() since the current and foreign domains are always different. A case where they are equal would be rejected by rcu_lock_remote_domain_by_id(). In addition to a similar comment in the code, add a respective ASSERT() to catch incorrect usage in the future.

It was tested with the IOREQ feature to confirm that all the pages given to this function belong to a domain, so we can use the same approach as for XENMAPSPACE_gmfn_foreign handling in xenmem_add_to_physmap_one().

This involves adding an extra parameter for the foreign domain to set_foreign_p2m_entry(), plus a helper that indicates whether the arch supports reference counting of foreign entries, in which case the restriction to the hardware domain in the common code can be skipped.
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> CC: Julien Grall <julien.grall@arm.com> [On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" Changes RFC -> V1: - new patch, was split from: "[RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features" - rewrite a logic to handle properly reference in set_foreign_p2m_entry() instead of treating foreign entries as p2m_ram_rw Changes V1 -> V2: - rebase according to the recent changes to acquire_resource() - update patch description - introduce arch_refcounts_p2m() - add an explanation why p2m_map_foreign_rw is valid - move set_foreign_p2m_entry() to p2m-common.h - add const to new parameter Changes V2 -> V3: - update patch description - rename arch_refcounts_p2m() to arch_acquire_resource_check() - move comment to x86’s arch_acquire_resource_check() - return rc in Arm's set_foreign_p2m_entry() - put a respective ASSERT() into Arm's set_foreign_p2m_entry() Changes V3 -> V4: - update arch_acquire_resource_check() implementation on x86 and common code which uses it, pass struct domain to the function - put ASSERT() to x86/Arm set_foreign_p2m_entry() - use arch_acquire_resource_check() in p2m_add_foreign() instead of open-coding it --- xen/arch/arm/p2m.c | 26 ++++++++++++++++++++++++++ xen/arch/x86/mm/p2m.c | 9 ++++++--- xen/common/memory.c | 9 ++------- xen/include/asm-arm/p2m.h | 19 +++++++++---------- xen/include/asm-x86/p2m.h | 19 ++++++++++++++++--- xen/include/xen/p2m-common.h | 4 ++++ 6 files changed, 63 insertions(+), 23 deletions(-) diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -XXX,XX +XXX,XX @@ int guest_physmap_remove_page(struct domain *d, gfn_t gfn, mfn_t mfn, return p2m_remove_mapping(d, gfn, (1 << page_order), mfn); } +int set_foreign_p2m_entry(struct 
domain *d, const struct domain *fd, + unsigned long gfn, mfn_t mfn) +{ + struct page_info *page = mfn_to_page(mfn); + int rc; + + ASSERT(arch_acquire_resource_check(d)); + + if ( !get_page(page, fd) ) + return -EINVAL; + + /* + * It is valid to always use p2m_map_foreign_rw here as if this gets + * called then d != fd. A case when d == fd would be rejected by + * rcu_lock_remote_domain_by_id() earlier. Put a respective ASSERT() + * to catch incorrect usage in future. + */ + ASSERT(d != fd); + + rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_map_foreign_rw); + if ( rc ) + put_page(page); + + return rc; +} + static struct page_info *p2m_allocate_root(void) { struct page_info *page; diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -XXX,XX +XXX,XX @@ static int set_typed_p2m_entry(struct domain *d, unsigned long gfn_l, } /* Set foreign mfn in the given guest's p2m table. */ -int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn) +int set_foreign_p2m_entry(struct domain *d, const struct domain *fd, + unsigned long gfn, mfn_t mfn) { + ASSERT(arch_acquire_resource_check(d)); + return set_typed_p2m_entry(d, gfn, mfn, PAGE_ORDER_4K, p2m_map_foreign, p2m_get_hostp2m(d)->default_access); } @@ -XXX,XX +XXX,XX @@ static int p2m_add_foreign(struct domain *tdom, unsigned long fgfn, * hvm fixme: until support is added to p2m teardown code to cleanup any * foreign entries, limit this to hardware domain only. */ - if ( !is_hardware_domain(tdom) ) + if ( !arch_acquire_resource_check(tdom) ) return -EPERM; if ( foreigndom == DOMID_XEN ) @@ -XXX,XX +XXX,XX @@ static int p2m_add_foreign(struct domain *tdom, unsigned long fgfn, * will update the m2p table which will result in mfn -> gpfn of dom0 * and not fgfn of domU. 
*/ - rc = set_foreign_p2m_entry(tdom, gpfn, mfn); + rc = set_foreign_p2m_entry(tdom, fdom, gpfn, mfn); if ( rc ) gdprintk(XENLOG_WARNING, "set_foreign_p2m_entry failed. " "gpfn:%lx mfn:%lx fgfn:%lx td:%d fd:%d\n", diff --git a/xen/common/memory.c b/xen/common/memory.c index XXXXXXX..XXXXXXX 100644 --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -XXX,XX +XXX,XX @@ static int acquire_resource( xen_pfn_t mfn_list[32]; int rc; - /* - * FIXME: Until foreign pages inserted into the P2M are properly - * reference counted, it is unsafe to allow mapping of - * resource pages unless the caller is the hardware domain. - */ - if ( paging_mode_translate(currd) && !is_hardware_domain(currd) ) + if ( !arch_acquire_resource_check(currd) ) return -EACCES; if ( copy_from_guest(&xmar, arg, 1) ) @@ -XXX,XX +XXX,XX @@ static int acquire_resource( for ( i = 0; !rc && i < xmar.nr_frames; i++ ) { - rc = set_foreign_p2m_entry(currd, gfn_list[i], + rc = set_foreign_p2m_entry(currd, d, gfn_list[i], _mfn(mfn_list[i])); /* rc should be -EIO for any iteration other than the first */ if ( rc && i ) diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-arm/p2m.h +++ b/xen/include/asm-arm/p2m.h @@ -XXX,XX +XXX,XX @@ typedef enum { #endif #include <xen/p2m-common.h> +static inline bool arch_acquire_resource_check(struct domain *d) +{ + /* + * The reference counting of foreign entries in set_foreign_p2m_entry() + * is supported on Arm. + */ + return true; +} + static inline void p2m_altp2m_check(struct vcpu *v, uint16_t idx) { @@ -XXX,XX +XXX,XX @@ static inline gfn_t gfn_next_boundary(gfn_t gfn, unsigned int order) return gfn_add(gfn, 1UL << order); } -static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, - mfn_t mfn) -{ - /* - * NOTE: If this is implemented then proper reference counting of - * foreign entries will need to be implemented. 
- */ - return -EOPNOTSUPP; -} - /* * A vCPU has cache enabled only when the MMU is enabled and data cache * is enabled. diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -XXX,XX +XXX,XX @@ struct p2m_domain { #endif #include <xen/p2m-common.h> +static inline bool arch_acquire_resource_check(struct domain *d) +{ + /* + * The reference counting of foreign entries in set_foreign_p2m_entry() + * is not supported for translated domains on x86. + * + * FIXME: Until foreign pages inserted into the P2M are properly + * reference counted, it is unsafe to allow mapping of + * resource pages unless the caller is the hardware domain. + */ + if ( paging_mode_translate(d) && !is_hardware_domain(d) ) + return false; + + return true; +} + /* * Updates vCPU's n2pm to match its np2m_base in VMCx12 and returns that np2m. */ @@ -XXX,XX +XXX,XX @@ int p2m_finish_type_change(struct domain *d, int p2m_is_logdirty_range(struct p2m_domain *, unsigned long start, unsigned long end); -/* Set foreign entry in the p2m table (for priv-mapping) */ -int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn); - /* Set mmio addresses in the p2m table (for pass-through) */ int set_mmio_p2m_entry(struct domain *d, gfn_t gfn, mfn_t mfn, unsigned int order); diff --git a/xen/include/xen/p2m-common.h b/xen/include/xen/p2m-common.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/p2m-common.h +++ b/xen/include/xen/p2m-common.h @@ -XXX,XX +XXX,XX @@ #include <xen/mm.h> +/* Set foreign entry in the p2m table */ +int set_foreign_p2m_entry(struct domain *d, const struct domain *fd, + unsigned long gfn, mfn_t mfn); + /* Remove a page from a domain's p2m table */ int __must_check guest_physmap_remove_page(struct domain *d, gfn_t gfn, mfn_t mfn, -- 2.7.4
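The "take the reference first, drop it again if the mapping fails" pattern used by the Arm set_foreign_p2m_entry() can be shown in isolation. The following is a minimal sketch, assuming a toy page descriptor: `struct model_page`, the integer domain ids and the `map_rc` parameter (which plays the role of the guest_physmap_add_entry() return value) are all hypothetical, and the real get_page() has further failure conditions that the model ignores.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical page descriptor: an owner id and a reference count. */
struct model_page {
    int owner;
    unsigned int refcount;
};

/* Stand-in for get_page(): succeeds only if 'owner' really owns the page. */
bool model_get_page(struct model_page *pg, int owner)
{
    if ( pg->owner != owner )
        return false;

    pg->refcount++;
    return true;
}

/* Stand-in for put_page(). */
void model_put_page(struct model_page *pg)
{
    assert(pg->refcount > 0);
    pg->refcount--;
}

/*
 * Same control flow as Arm's set_foreign_p2m_entry(): take the reference
 * first, and drop it again if inserting the mapping fails.
 */
int model_set_foreign_entry(int d, int fd, struct model_page *pg, int map_rc)
{
    /* Mapping a domain's own page as "foreign" is a caller bug. */
    assert(d != fd);

    if ( !model_get_page(pg, fd) )
        return -EINVAL;

    if ( map_rc )
        model_put_page(pg);

    return map_rc;
}
```

On success the extra reference stays with the mapping, which is what keeps the page from disappearing while the emulator domain still has it mapped; on failure the count is left exactly as it was found.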
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

This patch introduces a helper whose main purpose is to check whether a domain is using IOREQ server(s). On Arm the current benefit is to avoid calling vcpu_ioreq_handle_completion() (which implies iterating over all possible IOREQ servers anyway) on every return in leave_hypervisor_to_guest() if there are no active servers for the particular domain. This helper will also be used by one of the subsequent patches on Arm.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> CC: Julien Grall <julien.grall@arm.com> [On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" Changes RFC -> V1: - new patch Changes V1 -> V2: - update patch description - guard helper with CONFIG_IOREQ_SERVER - remove "hvm" prefix - modify helper to just return d->arch.hvm.ioreq_server.nr_servers - put suitable ASSERT()s - use ASSERT(d->ioreq_server.server[id] ?
!s : !!s) in set_ioreq_server() - remove d->ioreq_server.nr_servers = 0 from hvm_ioreq_init() Changes V2 -> V3: - update patch description - remove ASSERT()s from the helper, add a comment - use #ifdef CONFIG_IOREQ_SERVER inside function body - use new ASSERT() construction in set_ioreq_server() Changes V3 -> V4: - update patch description - drop per-domain variable "nr_servers" - reimplement a helper to count the non-NULL entries - make the helper out-of-line --- xen/arch/arm/traps.c | 15 +++++++++------ xen/common/ioreq.c | 16 ++++++++++++++++ xen/include/xen/ioreq.h | 2 ++ 3 files changed, 27 insertions(+), 6 deletions(-) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -XXX,XX +XXX,XX @@ static bool check_for_vcpu_work(void) struct vcpu *v = current; #ifdef CONFIG_IOREQ_SERVER - bool handled; + if ( domain_has_ioreq_server(v->domain) ) + { + bool handled; - local_irq_enable(); - handled = vcpu_ioreq_handle_completion(v); - local_irq_disable(); + local_irq_enable(); + handled = vcpu_ioreq_handle_completion(v); + local_irq_disable(); - if ( !handled ) - return true; + if ( !handled ) + return true; + } #endif if ( likely(!v->arch.need_flush_to_ram) ) diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/common/ioreq.c +++ b/xen/common/ioreq.c @@ -XXX,XX +XXX,XX @@ static ioreq_t *get_ioreq(struct ioreq_server *s, struct vcpu *v) return &p->vcpu_ioreq[v->vcpu_id]; } +/* + * This should only be used when d == current->domain or when they're + * distinct and d is paused. Otherwise the result is stale before + * the caller can inspect it. 
+ */ +bool domain_has_ioreq_server(const struct domain *d) +{ + const struct ioreq_server *s; + unsigned int id; + + FOR_EACH_IOREQ_SERVER(d, id, s) + return true; + + return false; +} + static struct ioreq_vcpu *get_pending_vcpu(const struct vcpu *v, struct ioreq_server **srvp) { diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/ioreq.h +++ b/xen/include/xen/ioreq.h @@ -XXX,XX +XXX,XX @@ static inline bool ioreq_needs_completion(const ioreq_t *ioreq) #define HANDLE_BUFIOREQ(s) \ ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) +bool domain_has_ioreq_server(const struct domain *d); + bool vcpu_ioreq_pending(struct vcpu *v); bool vcpu_ioreq_handle_completion(struct vcpu *v); bool is_ioreq_server_page(struct domain *d, const struct page_info *page); -- 2.7.4
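The V4 form of the helper, which looks for the first non-NULL table entry instead of maintaining a per-domain counter, can be sketched as follows. The table size, `struct model_domain` and `struct model_server` are assumptions made for the example, and the plain index loop stands in for the FOR_EACH_IOREQ_SERVER() iterator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_SERVERS 8 /* assumed table size, chosen for the example */

/* Hypothetical, empty-ish server object; only its address matters here. */
struct model_server {
    int unused;
};

/* Hypothetical per-domain server table. */
struct model_domain {
    struct model_server *server[MODEL_MAX_SERVERS];
};

/*
 * Model of domain_has_ioreq_server(): true as soon as one slot is in use.
 * As with the real helper, the answer is only a snapshot, so the caller
 * must make sure the table cannot change underneath it.
 */
bool model_domain_has_server(const struct model_domain *d)
{
    unsigned int id;

    assert(d != NULL);

    for ( id = 0; id < MODEL_MAX_SERVERS; id++ )
        if ( d->server[id] != NULL )
            return true;

    return false;
}
```

Deriving the answer from the table itself means there is no separate counter to keep in sync when servers are created or destroyed, at the cost of a short scan on each call.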
From: Julien Grall <julien.grall@arm.com>

This patch adds the ability for the device emulator to notify the other end (some entity running in the guest) using an SPI and implements the Arm specific bits for it. The proposed interface allows the emulator to set the logical level of one of a domain's IRQ lines.

We can't reuse the existing DM op (xen_dm_op_set_isa_irq_level) to inject an interrupt as the "isa_irq" field is only 8 bits wide and covers IRQ 0 - 255, whereas we need a wider range (0 - 1020).

Please note, for an edge-triggered interrupt (which is used for the virtio-mmio emulation) we only trigger the interrupt on Arm if the level is asserted (rising edge) and do nothing if the level is deasserted (falling edge), so the call could be named "trigger_irq" (without the level parameter). But, in order to model the line closely (to be able to support level-triggered interrupts) we need to know whether the line is low or high, so the proposed interface has been chosen. However, it is worth mentioning that in the case of a level-triggered interrupt, we should keep injecting the interrupt to the guest until the line is deasserted (this is not covered by the current patch).
Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> [On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" Changes RFC -> V1: - check incoming parameters in arch_dm_op() - add explicit padding to struct xen_dm_op_set_irq_level Changes V1 -> V2: - update the author of a patch - update patch description - check that padding is always 0 - mention that interface is Arm only and only SPIs are supported for now - allow to set the logical level of a line for non-allocated interrupts only - add xen_dm_op_set_irq_level_t Changes V2 -> V3: - no changes Changes V3 -> V4: - update patch description - update patch according to the IOREQ related dm-op handling changes --- tools/include/xendevicemodel.h | 4 +++ tools/libs/devicemodel/core.c | 18 ++++++++++ tools/libs/devicemodel/libxendevicemodel.map | 1 + xen/arch/arm/dm.c | 54 +++++++++++++++++++++++++++- xen/include/public/hvm/dm_op.h | 16 +++++++++ 5 files changed, 92 insertions(+), 1 deletion(-) diff --git a/tools/include/xendevicemodel.h b/tools/include/xendevicemodel.h index XXXXXXX..XXXXXXX 100644 --- a/tools/include/xendevicemodel.h +++ b/tools/include/xendevicemodel.h @@ -XXX,XX +XXX,XX @@ int xendevicemodel_set_isa_irq_level( xendevicemodel_handle *dmod, domid_t domid, uint8_t irq, unsigned int level); +int xendevicemodel_set_irq_level( + xendevicemodel_handle *dmod, domid_t domid, unsigned int irq, + unsigned int level); + /** * This function maps a PCI INTx line to a an IRQ line. 
* diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c index XXXXXXX..XXXXXXX 100644 --- a/tools/libs/devicemodel/core.c +++ b/tools/libs/devicemodel/core.c @@ -XXX,XX +XXX,XX @@ int xendevicemodel_set_isa_irq_level( return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op)); } +int xendevicemodel_set_irq_level( + xendevicemodel_handle *dmod, domid_t domid, uint32_t irq, + unsigned int level) +{ + struct xen_dm_op op; + struct xen_dm_op_set_irq_level *data; + + memset(&op, 0, sizeof(op)); + + op.op = XEN_DMOP_set_irq_level; + data = &op.u.set_irq_level; + + data->irq = irq; + data->level = level; + + return xendevicemodel_op(dmod, domid, 1, &op, sizeof(op)); +} + int xendevicemodel_set_pci_link_route( xendevicemodel_handle *dmod, domid_t domid, uint8_t link, uint8_t irq) { diff --git a/tools/libs/devicemodel/libxendevicemodel.map b/tools/libs/devicemodel/libxendevicemodel.map index XXXXXXX..XXXXXXX 100644 --- a/tools/libs/devicemodel/libxendevicemodel.map +++ b/tools/libs/devicemodel/libxendevicemodel.map @@ -XXX,XX +XXX,XX @@ VERS_1.2 { global: xendevicemodel_relocate_memory; xendevicemodel_pin_memory_cacheattr; + xendevicemodel_set_irq_level; } VERS_1.1; VERS_1.3 { diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/dm.c +++ b/xen/arch/arm/dm.c @@ -XXX,XX +XXX,XX @@ #include <xen/ioreq.h> #include <xen/nospec.h> +#include <asm/vgic.h> + static int dm_op(const struct dmop_args *op_args) { struct domain *d; @@ -XXX,XX +XXX,XX @@ static int dm_op(const struct dmop_args *op_args) [XEN_DMOP_unmap_io_range_from_ioreq_server] = sizeof(struct xen_dm_op_ioreq_server_range), [XEN_DMOP_set_ioreq_server_state] = sizeof(struct xen_dm_op_set_ioreq_server_state), [XEN_DMOP_destroy_ioreq_server] = sizeof(struct xen_dm_op_destroy_ioreq_server), + [XEN_DMOP_set_irq_level] = sizeof(struct xen_dm_op_set_irq_level), }; rc = rcu_lock_remote_domain_by_id(op_args->domid, &d); @@ -XXX,XX +XXX,XX @@ static int 
dm_op(const struct dmop_args *op_args) if ( op.pad ) goto out; - rc = ioreq_server_dm_op(&op, d, &const_op); + switch ( op.op ) + { + case XEN_DMOP_set_irq_level: + { + const struct xen_dm_op_set_irq_level *data = + &op.u.set_irq_level; + unsigned int i; + + /* Only SPIs are supported */ + if ( (data->irq < NR_LOCAL_IRQS) || (data->irq >= vgic_num_irqs(d)) ) + { + rc = -EINVAL; + break; + } + + if ( data->level != 0 && data->level != 1 ) + { + rc = -EINVAL; + break; + } + + /* Check that padding is always 0 */ + for ( i = 0; i < sizeof(data->pad); i++ ) + { + if ( data->pad[i] ) + { + rc = -EINVAL; + break; + } + } + + /* + * Allow to set the logical level of a line for non-allocated + * interrupts only. + */ + if ( test_bit(data->irq, d->arch.vgic.allocated_irqs) ) + { + rc = -EINVAL; + break; + } + + vgic_inject_irq(d, NULL, data->irq, data->level); + rc = 0; + break; + } + + default: + rc = ioreq_server_dm_op(&op, d, &const_op); + break; + } if ( (!rc || rc == -ERESTART) && !const_op && copy_to_guest_offset(op_args->buf[0].h, offset, diff --git a/xen/include/public/hvm/dm_op.h b/xen/include/public/hvm/dm_op.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/public/hvm/dm_op.h +++ b/xen/include/public/hvm/dm_op.h @@ -XXX,XX +XXX,XX @@ struct xen_dm_op_pin_memory_cacheattr { }; typedef struct xen_dm_op_pin_memory_cacheattr xen_dm_op_pin_memory_cacheattr_t; +/* + * XEN_DMOP_set_irq_level: Set the logical level of a one of a domain's + * IRQ lines (currently Arm only). + * Only SPIs are supported. 
+ */ +#define XEN_DMOP_set_irq_level 19 + +struct xen_dm_op_set_irq_level { + uint32_t irq; + /* IN - Level: 0 -> deasserted, 1 -> asserted */ + uint8_t level; + uint8_t pad[3]; +}; +typedef struct xen_dm_op_set_irq_level xen_dm_op_set_irq_level_t; + struct xen_dm_op { uint32_t op; uint32_t pad; @@ -XXX,XX +XXX,XX @@ struct xen_dm_op { xen_dm_op_track_dirty_vram_t track_dirty_vram; xen_dm_op_set_pci_intx_level_t set_pci_intx_level; xen_dm_op_set_isa_irq_level_t set_isa_irq_level; + xen_dm_op_set_irq_level_t set_irq_level; xen_dm_op_set_pci_link_route_t set_pci_link_route; xen_dm_op_modified_memory_t modified_memory; xen_dm_op_set_mem_type_t set_mem_type; -- 2.7.4
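The argument checks for XEN_DMOP_set_irq_level can also be expressed as a standalone helper, which is sometimes easier to reason about than checks nested in a switch. This is a sketch, not the code in this patch: `model_check_set_irq_level()`, `struct model_set_irq_level`, the `nr_irqs` parameter (standing in for vgic_num_irqs(d)) and the `allocated` flag (standing in for the test_bit() on allocated_irqs) are hypothetical. Because each failed check returns directly, a non-zero padding byte ends the validation at once, whereas a `break` inside a `for` loop only leaves that loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_LOCAL_IRQS 32u /* SGIs and PPIs; SPIs start at this number */

/* Same field layout as struct xen_dm_op_set_irq_level in this patch. */
struct model_set_irq_level {
    uint32_t irq;
    uint8_t level;   /* 0 -> deasserted, 1 -> asserted */
    uint8_t pad[3];
};

/* Returns 0 if the request may be injected, -EINVAL otherwise. */
int model_check_set_irq_level(const struct model_set_irq_level *data,
                              unsigned int nr_irqs, bool allocated)
{
    size_t i;

    assert(data != NULL);

    /* Only SPIs are accepted. */
    if ( data->irq < MODEL_NR_LOCAL_IRQS || data->irq >= nr_irqs )
        return -EINVAL;

    if ( data->level != 0 && data->level != 1 )
        return -EINVAL;

    /* Padding must be zero; returning here rejects the whole request. */
    for ( i = 0; i < sizeof(data->pad); i++ )
        if ( data->pad[i] )
            return -EINVAL;

    /* Lines already allocated (e.g. to a physical device) are off limits. */
    if ( allocated )
        return -EINVAL;

    return 0;
}
```

A caller would only go on to inject the interrupt when the helper returns 0, so every rejected request has the same observable outcome regardless of which check failed.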
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

In order to avoid code duplication (both handle_read() and handle_ioserv() contain the same code for the sign extension), move this code into a common helper used by both.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> CC: Julien Grall <julien.grall@arm.com> [On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" Changes V1 -> V2: - new patch Changes V2 -> V3: - no changes Changes V3 -> V4: - no changes here, but in new patch: "xen/arm: io: Harden sign extension check" --- xen/arch/arm/io.c | 18 ++---------------- xen/arch/arm/ioreq.c | 17 +---------------- xen/include/asm-arm/traps.h | 24 ++++++++++++++++++++++++ 3 files changed, 27 insertions(+), 32 deletions(-) diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/io.c +++ b/xen/arch/arm/io.c @@ -XXX,XX +XXX,XX @@ #include <asm/cpuerrata.h> #include <asm/current.h> #include <asm/mmio.h> +#include <asm/traps.h> #include <asm/hvm/ioreq.h> #include "decode.h" @@ -XXX,XX +XXX,XX @@ static enum io_state handle_read(const struct mmio_handler *handler, * setting r). */ register_t r = 0; - uint8_t size = (1 << dabt.size) * 8; if ( !handler->ops->read(v, info, &r, handler->priv) ) return IO_ABORT; - /* - * Sign extend if required. - * Note that we expect the read handler to have zeroed the bits - * outside the requested access size. - */ - if ( dabt.sign && (r & (1UL << (size - 1))) ) - { - /* - * We are relying on register_t using the same as - * an unsigned long in order to keep the 32-bit assembly - * code smaller.
- */ - BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long)); - r |= (~0UL) << size; - } + r = sign_extend(dabt, r); set_user_reg(regs, dabt.reg, r); diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/ioreq.c +++ b/xen/arch/arm/ioreq.c @@ -XXX,XX +XXX,XX @@ enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v) const union hsr hsr = { .bits = regs->hsr }; const struct hsr_dabt dabt = hsr.dabt; /* Code is similar to handle_read */ - uint8_t size = (1 << dabt.size) * 8; register_t r = v->io.req.data; /* We are done with the IO */ @@ -XXX,XX +XXX,XX @@ enum io_state handle_ioserv(struct cpu_user_regs *regs, struct vcpu *v) if ( dabt.write ) return IO_HANDLED; - /* - * Sign extend if required. - * Note that we expect the read handler to have zeroed the bits - * outside the requested access size. - */ - if ( dabt.sign && (r & (1UL << (size - 1))) ) - { - /* - * We are relying on register_t using the same as - * an unsigned long in order to keep the 32-bit assembly - * code smaller. - */ - BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long)); - r |= (~0UL) << size; - } + r = sign_extend(dabt, r); set_user_reg(regs, dabt.reg, r); diff --git a/xen/include/asm-arm/traps.h b/xen/include/asm-arm/traps.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-arm/traps.h +++ b/xen/include/asm-arm/traps.h @@ -XXX,XX +XXX,XX @@ static inline bool VABORT_GEN_BY_GUEST(const struct cpu_user_regs *regs) (unsigned long)abort_guest_exit_end == regs->pc; } +/* Check whether the sign extension is required and perform it */ +static inline register_t sign_extend(const struct hsr_dabt dabt, register_t r) +{ + uint8_t size = (1 << dabt.size) * 8; + + /* + * Sign extend if required. + * Note that we expect the read handler to have zeroed the bits + * outside the requested access size. 
+ */ + if ( dabt.sign && (r & (1UL << (size - 1))) ) + { + /* + * We are relying on register_t using the same as + * an unsigned long in order to keep the 32-bit assembly + * code smaller. + */ + BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long)); + r |= (~0UL) << size; + } + + return r; +} + #endif /* __ASM_ARM_TRAPS__ */ /* * Local variables: -- 2.7.4
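To make the behaviour of the shared helper concrete, here is a minimal standalone sketch of the same sign-extension logic. It assumes a 64-bit register type; reg_t, struct dabt_sketch and sign_extend_sketch() are stand-ins invented for this example (they are not the Xen types), and the sketch only covers byte, half-word and word accesses.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t reg_t;          /* stand-in for register_t on Arm64 */

struct dabt_sketch {
    unsigned int size;           /* log2 of the access size in bytes */
    unsigned int sign;           /* non-zero if sign extension is requested */
};

static reg_t sign_extend_sketch(struct dabt_sketch dabt, reg_t r)
{
    unsigned int bits;

    assert(dabt.size <= 2);      /* this sketch handles 8/16/32-bit only */
    bits = (1u << dabt.size) * 8;

    /* The upper bits of r are expected to be zero on entry. */
    if ( dabt.sign && (r & ((reg_t)1 << (bits - 1))) )
        r |= ~(reg_t)0 << bits;

    return r;
}
```

For example, a signed byte read returning 0x80 ends up as 0xffffffffffffff80 in the destination register, while 0x7f is left untouched.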
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> In an ideal world we would never get undefined behavior when propagating the sign bit, since that bit can only be set for an access size smaller than the register size (i.e. byte/half-word for aarch32, byte/half-word/word for aarch64). In the real world we need to account for a *possible* hardware bug, such as advertising a sign extension for a 64-bit (resp. 32-bit) access on Arm64 (resp. Arm32). So harden the code a bit more to prevent undefined behavior when propagating the sign bit in case of buggy hardware. Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> CC: Julien Grall <julien.grall@arm.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" Changes V3 -> V4: - new patch --- xen/include/asm-arm/traps.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/xen/include/asm-arm/traps.h b/xen/include/asm-arm/traps.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-arm/traps.h +++ b/xen/include/asm-arm/traps.h @@ -XXX,XX +XXX,XX @@ static inline register_t sign_extend(const struct hsr_dabt dabt, register_t r) * Note that we expect the read handler to have zeroed the bits * outside the requested access size. */ - if ( dabt.sign && (r & (1UL << (size - 1))) ) + if ( dabt.sign && (size < sizeof(register_t) * 8) && + (r & (1UL << (size - 1))) ) { /* * We are relying on register_t using the same as -- 2.7.4
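The reason for the extra check is that shifting a value by its full width is undefined behavior in C, so (~0UL) << size must never be reached with size equal to the register width. A minimal sketch of the guard as a standalone predicate, assuming a 64-bit register type (reg_t and needs_sign_extend() are hypothetical names used only here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t reg_t;          /* stand-in for register_t on Arm64 */

/*
 * Returns true only when it is both requested and safe to propagate the
 * sign bit, i.e. the access is strictly narrower than the register.
 */
static bool needs_sign_extend(unsigned int size_log2, bool sign, reg_t r)
{
    unsigned int bits;

    assert(size_log2 <= 3);      /* 1, 2, 4 or 8 bytes */
    bits = (1u << size_log2) * 8;

    return sign && (bits < sizeof(reg_t) * 8) &&
           (r & ((reg_t)1 << (bits - 1)));
}
```

With size_log2 == 3 (a 64-bit access) the predicate is false even if the hardware wrongly reports the sign flag, so the caller never performs the full-width shift.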
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> As IOREQ is now a common feature and we also need to invalidate the qemu/demu mapcache on Arm when the required condition occurs, this patch moves send_invalidate_req() to the common code (and renames it to ioreq_signal_mapcache_invalidate). This patch also moves the per-domain qemu_mapcache_invalidate variable out of the arch sub-struct (and drops the "qemu" prefix). We don't put this variable inside the #ifdef CONFIG_IOREQ_SERVER at the end of struct domain, but in the hole next to the group of 5 bools further up, which is more efficient. The subsequent patch will add mapcache invalidation handling on Arm. Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> CC: Julien Grall <julien.grall@arm.com> [On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" Changes RFC -> V1: - move send_invalidate_req() to the common code - update patch subject/description - move qemu_mapcache_invalidate out of the arch sub-struct, update checks - remove #if defined(CONFIG_ARM64) from the common code Changes V1 -> V2: - was split into: - xen/ioreq: Make x86's send_invalidate_req() common - xen/arm: Add mapcache invalidation handling - update patch description/subject - move Arm bits to a separate patch - don't alter the common code, the flag is set by arch code - rename send_invalidate_req() to send_invalidate_ioreq() - guard qemu_mapcache_invalidate with CONFIG_IOREQ_SERVER - use bool instead of bool_t - remove blank line between head comment and #include-s Changes V2 -> V3: - update patch description - drop "qemu" prefix from the variable name - rename send_invalidate_req() to ioreq_signal_mapcache_invalidate() Changes V3 -> V4: - change variable location in struct domain --- xen/arch/x86/hvm/hypercall.c | 9 +++++---- xen/arch/x86/hvm/io.c | 14 -------------- xen/common/ioreq.c | 14
++++++++++++++ xen/include/asm-x86/hvm/domain.h | 1 - xen/include/asm-x86/hvm/io.h | 1 - xen/include/xen/ioreq.h | 1 + xen/include/xen/sched.h | 5 +++++ 7 files changed, 25 insertions(+), 20 deletions(-) diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -XXX,XX +XXX,XX @@ */ #include <xen/lib.h> #include <xen/hypercall.h> +#include <xen/ioreq.h> #include <xen/nospec.h> #include <asm/hvm/emulate.h> @@ -XXX,XX +XXX,XX @@ static long hvm_memory_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) rc = compat_memory_op(cmd, arg); if ( (cmd & MEMOP_CMD_MASK) == XENMEM_decrease_reservation ) - curr->domain->arch.hvm.qemu_mapcache_invalidate = true; + curr->domain->mapcache_invalidate = true; return rc; } @@ -XXX,XX +XXX,XX @@ int hvm_hypercall(struct cpu_user_regs *regs) HVM_DBG_LOG(DBG_LEVEL_HCALL, "hcall%lu -> %lx", eax, regs->rax); - if ( unlikely(currd->arch.hvm.qemu_mapcache_invalidate) && - test_and_clear_bool(currd->arch.hvm.qemu_mapcache_invalidate) ) - send_invalidate_req(); + if ( unlikely(currd->mapcache_invalidate) && + test_and_clear_bool(currd->mapcache_invalidate) ) + ioreq_signal_mapcache_invalidate(); return curr->hcall_preempted ? HVM_HCALL_preempted : HVM_HCALL_completed; } diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/x86/hvm/io.c +++ b/xen/arch/x86/hvm/io.c @@ -XXX,XX +XXX,XX @@ void send_timeoffset_req(unsigned long timeoff) gprintk(XENLOG_ERR, "Unsuccessful timeoffset update\n"); } -/* Ask ioemu mapcache to invalidate mappings. 
*/ -void send_invalidate_req(void) -{ - ioreq_t p = { - .type = IOREQ_TYPE_INVALIDATE, - .size = 4, - .dir = IOREQ_WRITE, - .data = ~0UL, /* flush all */ - }; - - if ( ioreq_broadcast(&p, false) != 0 ) - gprintk(XENLOG_ERR, "Unsuccessful map-cache invalidate\n"); -} - bool hvm_emulate_one_insn(hvm_emulate_validate_t *validate, const char *descr) { struct hvm_emulate_ctxt ctxt; diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c index XXXXXXX..XXXXXXX 100644 --- a/xen/common/ioreq.c +++ b/xen/common/ioreq.c @@ -XXX,XX +XXX,XX @@ #include <public/hvm/ioreq.h> #include <public/hvm/params.h> +/* Ask ioemu mapcache to invalidate mappings. */ +void ioreq_signal_mapcache_invalidate(void) +{ + ioreq_t p = { + .type = IOREQ_TYPE_INVALIDATE, + .size = 4, + .dir = IOREQ_WRITE, + .data = ~0UL, /* flush all */ + }; + + if ( ioreq_broadcast(&p, false) != 0 ) + gprintk(XENLOG_ERR, "Unsuccessful map-cache invalidate\n"); +} + static void set_ioreq_server(struct domain *d, unsigned int id, struct ioreq_server *s) { diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/hvm/domain.h +++ b/xen/include/asm-x86/hvm/domain.h @@ -XXX,XX +XXX,XX @@ struct hvm_domain { struct viridian_domain *viridian; - bool_t qemu_mapcache_invalidate; bool_t is_s3_suspended; /* diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/asm-x86/hvm/io.h +++ b/xen/include/asm-x86/hvm/io.h @@ -XXX,XX +XXX,XX @@ bool relocate_portio_handler( unsigned int size); void send_timeoffset_req(unsigned long timeoff); -void send_invalidate_req(void); bool handle_mmio_with_translation(unsigned long gla, unsigned long gpfn, struct npfec); bool handle_pio(uint16_t port, unsigned int size, int dir); diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/ioreq.h +++ b/xen/include/xen/ioreq.h @@ -XXX,XX +XXX,XX @@ struct 
ioreq_server *ioreq_server_select(struct domain *d, int ioreq_send(struct ioreq_server *s, ioreq_t *proto_p, bool buffered); unsigned int ioreq_broadcast(ioreq_t *p, bool buffered); +void ioreq_signal_mapcache_invalidate(void); void ioreq_domain_init(struct domain *d); diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -XXX,XX +XXX,XX @@ struct domain * unpaused for the first time by the systemcontroller. */ bool creation_finished; + /* + * Indicates that mapcache invalidation request should be sent to + * the device emulator. + */ + bool mapcache_invalidate; /* Which guest this guest has privileges on */ struct domain *target; -- 2.7.4
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> We need to send a mapcache invalidation request to qemu/demu every time a page gets removed from a guest. At the moment, the Arm code doesn't explicitly remove the existing mapping before inserting the new mapping. Instead, this is done implicitly by __p2m_set_entry(). So we need to recognize the case when the old entry is a RAM page *and* the new MFN is different in order to set the corresponding flag. The most suitable place to do this is p2m_free_entry(), where we can find the correct leaf type. The invalidation request will be sent in do_trap_hypercall() later on. do_trap_hypercall() is the best place to send the invalidation request for the following reasons: - The only way a guest can modify its P2M on Arm is via a hypercall - When sending the invalidation request, the vCPU will be blocked until all the IOREQ servers have acknowledged the invalidation Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> CC: Julien Grall <julien.grall@arm.com> [On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" *** Please note, this patch depends on the following which is on review: https://patchwork.kernel.org/patch/11803383/ This patch is on par with x86 code (whether it is buggy or not). If there is a need to improve/harden something, this can be done on a follow-up.
*** Changes V1 -> V2: - new patch, some changes were derived from (+ new explanation): xen/ioreq: Make x86's invalidate qemu mapcache handling common - put setting of the flag into __p2m_set_entry() - clarify the conditions when the flag should be set - use domain_has_ioreq_server() - update do_trap_hypercall() by adding local variable Changes V2 -> V3: - update patch description - move check to p2m_free_entry() - add a comment - use "curr" instead of "v" in do_trap_hypercall() Changes V3 -> V4: - update patch description - re-order check in p2m_free_entry() to call domain_has_ioreq_server() only if p2m->domain == current->domain - add a comment in do_trap_hypercall() --- xen/arch/arm/p2m.c | 25 +++++++++++++++++-------- xen/arch/arm/traps.c | 20 +++++++++++++++++--- 2 files changed, 34 insertions(+), 11 deletions(-) diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -XXX,XX +XXX,XX @@ #include <xen/cpu.h> #include <xen/domain_page.h> #include <xen/iocap.h> +#include <xen/ioreq.h> #include <xen/lib.h> #include <xen/sched.h> #include <xen/softirq.h> @@ -XXX,XX +XXX,XX @@ static void p2m_free_entry(struct p2m_domain *p2m, if ( !p2m_is_valid(entry) ) return; - /* Nothing to do but updating the stats if the entry is a super-page. */ - if ( p2m_is_superpage(entry, level) ) + if ( p2m_is_superpage(entry, level) || (level == 3) ) { - p2m->stats.mappings[level]--; - return; - } +#ifdef CONFIG_IOREQ_SERVER + /* + * If this gets called (non-recursively) then either the entry + * was replaced by an entry with a different base (valid case) or + * the shattering of a superpage was failed (error case). + * So, at worst, the spurious mapcache invalidation might be sent. 
+ */ + if ( (p2m->domain == current->domain) && + domain_has_ioreq_server(p2m->domain) && + p2m_is_ram(entry.p2m.type) ) + p2m->domain->mapcache_invalidate = true; +#endif - if ( level == 3 ) - { p2m->stats.mappings[level]--; - p2m_put_l3_page(entry); + /* Nothing to do if the entry is a super-page. */ + if ( level == 3 ) + p2m_put_l3_page(entry); return; } diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index XXXXXXX..XXXXXXX 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -XXX,XX +XXX,XX @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr, const union hsr hsr) { arm_hypercall_fn_t call = NULL; + struct vcpu *curr = current; BUILD_BUG_ON(NR_hypercalls < ARRAY_SIZE(arm_hypercall_table) ); @@ -XXX,XX +XXX,XX @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr, return; } - current->hcall_preempted = false; + curr->hcall_preempted = false; perfc_incra(hypercalls, *nr); call = arm_hypercall_table[*nr].fn; @@ -XXX,XX +XXX,XX @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr, HYPERCALL_RESULT_REG(regs) = call(HYPERCALL_ARGS(regs)); #ifndef NDEBUG - if ( !current->hcall_preempted ) + if ( !curr->hcall_preempted ) { /* Deliberately corrupt parameter regs used by this hypercall. */ switch ( arm_hypercall_table[*nr].nr_args ) { @@ -XXX,XX +XXX,XX @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr, #endif /* Ensure the hypercall trap instruction is re-executed. 
*/ - if ( current->hcall_preempted ) + if ( curr->hcall_preempted ) regs->pc -= 4; /* re-execute 'hvc #XEN_HYPERCALL_TAG' */ + +#ifdef CONFIG_IOREQ_SERVER + /* + * Taking into the account the following the do_trap_hypercall() + * is the best place to send invalidation request: + * - The only way a guest can modify its P2M on Arm is via an hypercall + * - When sending the invalidation request, the vCPU will be blocked + * until all the IOREQ servers have acknowledged the invalidation + */ + if ( unlikely(curr->domain->mapcache_invalidate) && + test_and_clear_bool(curr->domain->mapcache_invalidate) ) + ioreq_signal_mapcache_invalidate(); +#endif } void arch_hypercall_tasklet_result(struct vcpu *v, long res) -- 2.7.4
From: Julien Grall <julien.grall@arm.com> This patch creates specific device node in the Guest device-tree with allocated MMIO range and SPI interrupt if specific 'virtio' property is present in domain config. Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> [On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com> --- Please note, this is a split/cleanup/hardening of Julien's PoC: "Add support for Guest IO forwarding to a device emulator" Changes RFC -> V1: - was squashed with: "[RFC PATCH V1 09/12] libxl: Handle virtio-mmio irq in more correct way" "[RFC PATCH V1 11/12] libxl: Insert "dma-coherent" property into virtio-mmio device node" "[RFC PATCH V1 12/12] libxl: Fix duplicate memory node in DT" - move VirtIO MMIO #define-s to xen/include/public/arch-arm.h Changes V1 -> V2: - update the author of a patch Changes V2 -> V3: - no changes Changes V3 -> V4: - no changes --- tools/libs/light/libxl_arm.c | 58 ++++++++++++++++++++++++++++++++++++++-- tools/libs/light/libxl_types.idl | 1 + tools/xl/xl_parse.c | 1 + xen/include/public/arch-arm.h | 5 ++++ 4 files changed, 63 insertions(+), 2 deletions(-) diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c index XXXXXXX..XXXXXXX 100644 --- a/tools/libs/light/libxl_arm.c +++ b/tools/libs/light/libxl_arm.c @@ -XXX,XX +XXX,XX @@ int libxl__arch_domain_prepare_config(libxl__gc *gc, { uint32_t nr_spis = 0; unsigned int i; - uint32_t vuart_irq; - bool vuart_enabled = false; + uint32_t vuart_irq, virtio_irq; + bool vuart_enabled = false, virtio_enabled = false; /* * If pl011 vuart is enabled then increment the nr_spis to allow allocation @@ -XXX,XX +XXX,XX @@ int libxl__arch_domain_prepare_config(libxl__gc *gc, vuart_enabled = true; } + /* + * XXX: Handle properly virtio + * A proper solution would be the toolstack to allocate the interrupts + * used by each virtio backend and let the backend now which one is used + */ + if 
(libxl_defbool_val(d_config->b_info.arch_arm.virtio)) { + nr_spis += (GUEST_VIRTIO_MMIO_SPI - 32) + 1; + virtio_irq = GUEST_VIRTIO_MMIO_SPI; + virtio_enabled = true; + } + for (i = 0; i < d_config->b_info.num_irqs; i++) { uint32_t irq = d_config->b_info.irqs[i]; uint32_t spi; @@ -XXX,XX +XXX,XX @@ int libxl__arch_domain_prepare_config(libxl__gc *gc, return ERROR_FAIL; } + /* The same check as for vpl011 */ + if (virtio_enabled && irq == virtio_irq) { + LOG(ERROR, "Physical IRQ %u conflicting with virtio SPI\n", irq); + return ERROR_FAIL; + } + if (irq < 32) continue; @@ -XXX,XX +XXX,XX @@ static int make_vpl011_uart_node(libxl__gc *gc, void *fdt, return 0; } +static int make_virtio_mmio_node(libxl__gc *gc, void *fdt, + uint64_t base, uint32_t irq) +{ + int res; + gic_interrupt intr; + /* Placeholder for virtio@ + a 64-bit number + \0 */ + char buf[24]; + + snprintf(buf, sizeof(buf), "virtio@%"PRIx64, base); + res = fdt_begin_node(fdt, buf); + if (res) return res; + + res = fdt_property_compat(gc, fdt, 1, "virtio,mmio"); + if (res) return res; + + res = fdt_property_regs(gc, fdt, GUEST_ROOT_ADDRESS_CELLS, GUEST_ROOT_SIZE_CELLS, + 1, base, GUEST_VIRTIO_MMIO_SIZE); + if (res) return res; + + set_interrupt(intr, irq, 0xf, DT_IRQ_TYPE_EDGE_RISING); + res = fdt_property_interrupts(gc, fdt, &intr, 1); + if (res) return res; + + res = fdt_property(fdt, "dma-coherent", NULL, 0); + if (res) return res; + + res = fdt_end_node(fdt); + if (res) return res; + + return 0; + +} + static const struct arch_info *get_arch_info(libxl__gc *gc, const struct xc_dom_image *dom) { @@ -XXX,XX +XXX,XX @@ next_resize: if (info->tee == LIBXL_TEE_TYPE_OPTEE) FDT( make_optee_node(gc, fdt) ); + if (libxl_defbool_val(info->arch_arm.virtio)) + FDT( make_virtio_mmio_node(gc, fdt, GUEST_VIRTIO_MMIO_BASE, GUEST_VIRTIO_MMIO_SPI) ); + if (pfdt) FDT( copy_partial_fdt(gc, fdt, pfdt) ); @@ -XXX,XX +XXX,XX @@ void libxl__arch_domain_build_info_setdefault(libxl__gc *gc, { /* ACPI is disabled by default */ 
libxl_defbool_setdefault(&b_info->acpi, false); + libxl_defbool_setdefault(&b_info->arch_arm.virtio, false); if (b_info->type != LIBXL_DOMAIN_TYPE_PV) return; diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl index XXXXXXX..XXXXXXX 100644 --- a/tools/libs/light/libxl_types.idl +++ b/tools/libs/light/libxl_types.idl @@ -XXX,XX +XXX,XX @@ libxl_domain_build_info = Struct("domain_build_info",[ ("arch_arm", Struct(None, [("gic_version", libxl_gic_version), + ("virtio", libxl_defbool), ("vuart", libxl_vuart_type), ])), # Alternate p2m is not bound to any architecture or guest type, as it is diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c index XXXXXXX..XXXXXXX 100644 --- a/tools/xl/xl_parse.c +++ b/tools/xl/xl_parse.c @@ -XXX,XX +XXX,XX @@ skip_usbdev: } xlu_cfg_get_defbool(config, "dm_restrict", &b_info->dm_restrict, 0); + xlu_cfg_get_defbool(config, "virtio", &b_info->arch_arm.virtio, 0); if (c_info->type == LIBXL_DOMAIN_TYPE_HVM) { if (!xlu_cfg_get_string (config, "vga", &buf, 0)) { diff --git a/xen/include/public/arch-arm.h b/xen/include/public/arch-arm.h index XXXXXXX..XXXXXXX 100644 --- a/xen/include/public/arch-arm.h +++ b/xen/include/public/arch-arm.h @@ -XXX,XX +XXX,XX @@ typedef uint64_t xen_callback_t; #define PSCI_cpu_on 2 #define PSCI_migrate 3 +/* VirtIO MMIO definitions */ +#define GUEST_VIRTIO_MMIO_BASE xen_mk_ullong(0x02000000) +#define GUEST_VIRTIO_MMIO_SIZE xen_mk_ullong(0x200) +#define GUEST_VIRTIO_MMIO_SPI 33 + #endif #ifndef __ASSEMBLY__ -- 2.7.4
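As a worked example of the two computations in this patch (the device-tree node name and the number of SPIs to reserve), the sketch below reuses the constants from the patch in a standalone form. virtio_node_name() and spis_needed() are hypothetical helpers written for this example; they are not libxl functions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define GUEST_VIRTIO_MMIO_BASE 0x02000000ULL
#define GUEST_VIRTIO_MMIO_SPI  33

/* Hypothetical helper: format the node name the same way the patch does. */
static int virtio_node_name(char *buf, size_t len, uint64_t base)
{
    /* "virtio@" + up to 16 hex digits + NUL fits in 24 bytes. */
    assert(buf != NULL && len >= 24);
    return snprintf(buf, len, "virtio@%" PRIx64, base);
}

/* SPIs start at INTID 32, so covering SPI n takes (n - 32) + 1 entries. */
static uint32_t spis_needed(uint32_t highest_spi)
{
    assert(highest_spi >= 32);
    return (highest_spi - 32) + 1;
}
```

With the values above the node is named virtio@2000000 and two SPIs (INTIDs 32 and 33) have to be covered for the guest.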
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> This patch adds basic support for configuring and assisting the virtio-disk backend (emulator), which is intended to run outside of Qemu and could be run in any domain. Xenstore was chosen as a communication interface for the emulator running in a non-toolstack domain to be able to get configuration either by reading Xenstore directly or by receiving command line parameters (an updated 'xl devd' running in the same domain would read Xenstore beforehand and call the backend executable with the required arguments). An example of domain configuration (two disks are assigned to the guest, the latter is in readonly mode): vdisk = [ 'backend=DomD, disks=rw:/dev/mmcblk0p3;ro:/dev/mmcblk1p3' ] Where per-disk Xenstore entries are: - filename and readonly flag (configured via "vdisk" property) - base and irq (allocated dynamically) Besides handling 'visible' params described in the configuration file, the patch also allocates virtio-mmio specific ones for each device and writes them into Xenstore. virtio-mmio params (irq and base) are unique per guest domain, they are allocated at domain creation time and passed through to the emulator. Each VirtIO device has at least one pair of these params. TODO: 1. An extra "virtio" property could be removed. 2. Update documentation. Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> [On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com> --- Changes RFC -> V1: - no changes Changes V1 -> V2: - rebase according to the new location of libxl_virtio_disk.c Changes V2 -> V3: - no changes Changes V3 -> V4: - rebase according to the new argument for DEFINE_DEVICE_TYPE_STRUCT Please note, there is a real concern about VirtIO interrupts allocation. [Just copy here what Stefano said in RFC thread] So, if we end up allocating let's say 6 virtio interrupts for a domain, the chance of a clash with a physical interrupt of a passthrough device is real.
I am not entirely sure how to solve it, but these are a few ideas: - choosing virtio interrupts that are less likely to conflict (maybe > 1000) - make the virtio irq (optionally) configurable so that a user could override the default irq and specify one that doesn't conflict - implementing support for virq != pirq (even the xl interface doesn't allow to specify the virq number for passthrough devices, see "irqs") Also there is one suggestion from Wei Chen regarding a parameter for domain config file which I haven't addressed yet. [Just copy here what Wei said in V2 thread] Can we keep use the same 'disk' parameter for virtio-disk, but add an option like "model=virtio-disk"? For example: disk = [ 'backend=DomD, disks=rw:/dev/mmcblk0p3,model=virtio-disk' ] Just like what Xen has done for x86 virtio-net. --- tools/libs/light/Makefile | 1 + tools/libs/light/libxl_arm.c | 56 ++++++++++++--- tools/libs/light/libxl_create.c | 1 + tools/libs/light/libxl_internal.h | 1 + tools/libs/light/libxl_types.idl | 15 ++++ tools/libs/light/libxl_types_internal.idl | 1 + tools/libs/light/libxl_virtio_disk.c | 109 ++++++++++++++++++++++++++++ tools/xl/Makefile | 2 +- tools/xl/xl.h | 3 + tools/xl/xl_cmdtable.c | 15 ++++ tools/xl/xl_parse.c | 115 ++++++++++++++++++++++++++++++ tools/xl/xl_virtio_disk.c | 46 ++++++++++++ 12 files changed, 354 insertions(+), 11 deletions(-) create mode 100644 tools/libs/light/libxl_virtio_disk.c create mode 100644 tools/xl/xl_virtio_disk.c diff --git a/tools/libs/light/Makefile b/tools/libs/light/Makefile index XXXXXXX..XXXXXXX 100644 --- a/tools/libs/light/Makefile +++ b/tools/libs/light/Makefile @@ -XXX,XX +XXX,XX @@ SRCS-y += libxl_genid.c SRCS-y += _libxl_types.c SRCS-y += libxl_flask.c SRCS-y += _libxl_types_internal.c +SRCS-y += libxl_virtio_disk.c ifeq ($(CONFIG_LIBNL),y) CFLAGS_LIBXL += $(LIBNL3_CFLAGS) diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c index XXXXXXX..XXXXXXX 100644 --- a/tools/libs/light/libxl_arm.c +++ 
b/tools/libs/light/libxl_arm.c @@ -XXX,XX +XXX,XX @@ #include <assert.h> #include <xen/device_tree_defs.h> +#ifndef container_of +#define container_of(ptr, type, member) ({ \ + typeof( ((type *)0)->member ) *__mptr = (ptr); \ + (type *)( (char *)__mptr - offsetof(type,member) );}) +#endif + static const char *gicv_to_string(libxl_gic_version gic_version) { switch (gic_version) { @@ -XXX,XX +XXX,XX @@ int libxl__arch_domain_prepare_config(libxl__gc *gc, vuart_enabled = true; } - /* - * XXX: Handle properly virtio - * A proper solution would be the toolstack to allocate the interrupts - * used by each virtio backend and let the backend now which one is used - */ if (libxl_defbool_val(d_config->b_info.arch_arm.virtio)) { - nr_spis += (GUEST_VIRTIO_MMIO_SPI - 32) + 1; + uint64_t virtio_base; + libxl_device_virtio_disk *virtio_disk; + + virtio_base = GUEST_VIRTIO_MMIO_BASE; virtio_irq = GUEST_VIRTIO_MMIO_SPI; + + if (!d_config->num_virtio_disks) { + LOG(ERROR, "Virtio is enabled, but no Virtio devices present\n"); + return ERROR_FAIL; + } + virtio_disk = &d_config->virtio_disks[0]; + + for (i = 0; i < virtio_disk->num_disks; i++) { + virtio_disk->disks[i].base = virtio_base; + virtio_disk->disks[i].irq = virtio_irq; + + LOG(DEBUG, "Allocate Virtio MMIO params: IRQ %u BASE 0x%"PRIx64, + virtio_irq, virtio_base); + + virtio_irq ++; + virtio_base += GUEST_VIRTIO_MMIO_SIZE; + } + virtio_irq --; + + nr_spis += (virtio_irq - 32) + 1; virtio_enabled = true; } @@ -XXX,XX +XXX,XX @@ int libxl__arch_domain_prepare_config(libxl__gc *gc, } /* The same check as for vpl011 */ - if (virtio_enabled && irq == virtio_irq) { - LOG(ERROR, "Physical IRQ %u conflicting with virtio SPI\n", irq); + if (virtio_enabled && + (irq >= GUEST_VIRTIO_MMIO_SPI && irq <= virtio_irq)) { + LOG(ERROR, "Physical IRQ %u conflicting with Virtio IRQ range\n", irq); return ERROR_FAIL; } @@ -XXX,XX +XXX,XX @@ next_resize: if (info->tee == LIBXL_TEE_TYPE_OPTEE) FDT( make_optee_node(gc, fdt) ); - if 
(libxl_defbool_val(info->arch_arm.virtio)) - FDT( make_virtio_mmio_node(gc, fdt, GUEST_VIRTIO_MMIO_BASE, GUEST_VIRTIO_MMIO_SPI) ); + if (libxl_defbool_val(info->arch_arm.virtio)) { + libxl_domain_config *d_config = + container_of(info, libxl_domain_config, b_info); + libxl_device_virtio_disk *virtio_disk = &d_config->virtio_disks[0]; + unsigned int i; + + for (i = 0; i < virtio_disk->num_disks; i++) { + uint64_t base = virtio_disk->disks[i].base; + uint32_t irq = virtio_disk->disks[i].irq; + + FDT( make_virtio_mmio_node(gc, fdt, base, irq) ); + } + } if (pfdt) FDT( copy_partial_fdt(gc, fdt, pfdt) ); diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c index XXXXXXX..XXXXXXX 100644 --- a/tools/libs/light/libxl_create.c +++ b/tools/libs/light/libxl_create.c @@ -XXX,XX +XXX,XX @@ const libxl__device_type *device_type_tbl[] = { &libxl__dtdev_devtype, &libxl__vdispl_devtype, &libxl__vsnd_devtype, + &libxl__virtio_disk_devtype, NULL }; diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h index XXXXXXX..XXXXXXX 100644 --- a/tools/libs/light/libxl_internal.h +++ b/tools/libs/light/libxl_internal.h @@ -XXX,XX +XXX,XX @@ extern const libxl__device_type libxl__vdispl_devtype; extern const libxl__device_type libxl__p9_devtype; extern const libxl__device_type libxl__pvcallsif_devtype; extern const libxl__device_type libxl__vsnd_devtype; +extern const libxl__device_type libxl__virtio_disk_devtype; extern const libxl__device_type *device_type_tbl[]; diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl index XXXXXXX..XXXXXXX 100644 --- a/tools/libs/light/libxl_types.idl +++ b/tools/libs/light/libxl_types.idl @@ -XXX,XX +XXX,XX @@ libxl_device_vsnd = Struct("device_vsnd", [ ("pcms", Array(libxl_vsnd_pcm, "num_vsnd_pcms")) ]) +libxl_virtio_disk_param = Struct("virtio_disk_param", [ + ("filename", string), + ("readonly", bool), + ("irq", uint32), + ("base", uint64), + ]) + +libxl_device_virtio_disk = 
Struct("device_virtio_disk", [
+    ("backend_domid", libxl_domid),
+    ("backend_domname", string),
+    ("devid", libxl_devid),
+    ("disks", Array(libxl_virtio_disk_param, "num_disks")),
+    ])
+
 libxl_domain_config = Struct("domain_config", [
     ("c_info", libxl_domain_create_info),
     ("b_info", libxl_domain_build_info),
@@ -XXX,XX +XXX,XX @@ libxl_domain_config = Struct("domain_config", [
     ("pvcallsifs", Array(libxl_device_pvcallsif, "num_pvcallsifs")),
     ("vdispls", Array(libxl_device_vdispl, "num_vdispls")),
     ("vsnds", Array(libxl_device_vsnd, "num_vsnds")),
+    ("virtio_disks", Array(libxl_device_virtio_disk, "num_virtio_disks")),
     # a channel manifests as a console with a name,
     # see docs/misc/channels.txt
     ("channels", Array(libxl_device_channel, "num_channels")),
diff --git a/tools/libs/light/libxl_types_internal.idl b/tools/libs/light/libxl_types_internal.idl
index XXXXXXX..XXXXXXX 100644
--- a/tools/libs/light/libxl_types_internal.idl
+++ b/tools/libs/light/libxl_types_internal.idl
@@ -XXX,XX +XXX,XX @@ libxl__device_kind = Enumeration("device_kind", [
     (14, "PVCALLS"),
     (15, "VSND"),
     (16, "VINPUT"),
+    (17, "VIRTIO_DISK"),
     ])
 
 libxl__console_backend = Enumeration("console_backend", [
diff --git a/tools/libs/light/libxl_virtio_disk.c b/tools/libs/light/libxl_virtio_disk.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tools/libs/light/libxl_virtio_disk.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Copyright (C) 2020 EPAM Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include "libxl_internal.h"
+
+static int libxl__device_virtio_disk_setdefault(libxl__gc *gc, uint32_t domid,
+                                                libxl_device_virtio_disk *virtio_disk,
+                                                bool hotplug)
+{
+    return libxl__resolve_domid(gc, virtio_disk->backend_domname,
+                                &virtio_disk->backend_domid);
+}
+
+static int libxl__virtio_disk_from_xenstore(libxl__gc *gc, const char *libxl_path,
+                                            libxl_devid devid,
+                                            libxl_device_virtio_disk *virtio_disk)
+{
+    const char *be_path;
+    int rc;
+
+    virtio_disk->devid = devid;
+    rc = libxl__xs_read_mandatory(gc, XBT_NULL,
+                                  GCSPRINTF("%s/backend", libxl_path),
+                                  &be_path);
+    if (rc) return rc;
+
+    rc = libxl__backendpath_parse_domid(gc, be_path, &virtio_disk->backend_domid);
+    if (rc) return rc;
+
+    return 0;
+}
+
+static void libxl__update_config_virtio_disk(libxl__gc *gc,
+                                             libxl_device_virtio_disk *dst,
+                                             libxl_device_virtio_disk *src)
+{
+    dst->devid = src->devid;
+}
+
+static int libxl_device_virtio_disk_compare(libxl_device_virtio_disk *d1,
+                                            libxl_device_virtio_disk *d2)
+{
+    return COMPARE_DEVID(d1, d2);
+}
+
+static void libxl__device_virtio_disk_add(libxl__egc *egc, uint32_t domid,
+                                          libxl_device_virtio_disk *virtio_disk,
+                                          libxl__ao_device *aodev)
+{
+    libxl__device_add_async(egc, domid, &libxl__virtio_disk_devtype, virtio_disk, aodev);
+}
+
+static int libxl__set_xenstore_virtio_disk(libxl__gc *gc, uint32_t domid,
+                                           libxl_device_virtio_disk *virtio_disk,
+                                           flexarray_t *back, flexarray_t *front,
+                                           flexarray_t *ro_front)
+{
+    int rc;
+    unsigned int i;
+
+    for (i = 0; i < virtio_disk->num_disks; i++) {
+        rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/filename", i),
+                                   GCSPRINTF("%s", virtio_disk->disks[i].filename));
+        if (rc) return rc;
+
+        rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/readonly", i),
+                                   GCSPRINTF("%d", virtio_disk->disks[i].readonly));
+        if (rc) return rc;
+
+        rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/base", i),
+                                   GCSPRINTF("%lu", virtio_disk->disks[i].base));
+        if (rc) return rc;
+
+        rc = flexarray_append_pair(ro_front, GCSPRINTF("%d/irq", i),
+                                   GCSPRINTF("%u", virtio_disk->disks[i].irq));
+        if (rc) return rc;
+    }
+
+    return 0;
+}
+
+static LIBXL_DEFINE_UPDATE_DEVID(virtio_disk)
+static LIBXL_DEFINE_DEVICE_FROM_TYPE(virtio_disk)
+static LIBXL_DEFINE_DEVICES_ADD(virtio_disk)
+
+DEFINE_DEVICE_TYPE_STRUCT(virtio_disk, VIRTIO_DISK, virtio_disks,
+    .update_config = (device_update_config_fn_t) libxl__update_config_virtio_disk,
+    .from_xenstore = (device_from_xenstore_fn_t) libxl__virtio_disk_from_xenstore,
+    .set_xenstore_config = (device_set_xenstore_config_fn_t) libxl__set_xenstore_virtio_disk
+);
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/tools/xl/Makefile b/tools/xl/Makefile
index XXXXXXX..XXXXXXX 100644
--- a/tools/xl/Makefile
+++ b/tools/xl/Makefile
@@ -XXX,XX +XXX,XX @@ XL_OBJS += xl_vtpm.o xl_block.o xl_nic.o xl_usb.o
 XL_OBJS += xl_sched.o xl_pci.o xl_vcpu.o xl_cdrom.o xl_mem.o
 XL_OBJS += xl_info.o xl_console.o xl_misc.o
 XL_OBJS += xl_vmcontrol.o xl_saverestore.o xl_migrate.o
-XL_OBJS += xl_vdispl.o xl_vsnd.o xl_vkb.o
+XL_OBJS += xl_vdispl.o xl_vsnd.o xl_vkb.o xl_virtio_disk.o
 
 $(XL_OBJS): CFLAGS += $(CFLAGS_libxentoollog)
 $(XL_OBJS): CFLAGS += $(CFLAGS_XL)
diff --git a/tools/xl/xl.h b/tools/xl/xl.h
index XXXXXXX..XXXXXXX 100644
--- a/tools/xl/xl.h
+++ b/tools/xl/xl.h
@@ -XXX,XX +XXX,XX @@ int main_vsnddetach(int argc, char **argv);
 int main_vkbattach(int argc, char **argv);
 int main_vkblist(int argc, char **argv);
 int main_vkbdetach(int argc, char **argv);
+int main_virtio_diskattach(int argc, char **argv);
+int main_virtio_disklist(int argc, char **argv);
+int main_virtio_diskdetach(int argc, char **argv);
 int main_usbctrl_attach(int argc, char **argv);
 int main_usbctrl_detach(int argc, char **argv);
 int main_usbdev_attach(int argc, char **argv);
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index XXXXXXX..XXXXXXX 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -XXX,XX +XXX,XX @@ struct cmd_spec cmd_table[] = {
       "Destroy a domain's virtual sound device",
       "<Domain> <DevId>",
     },
+    { "virtio-disk-attach",
+      &main_virtio_diskattach, 1, 1,
+      "Create a new virtio block device",
+      " TBD\n"
+    },
+    { "virtio-disk-list",
+      &main_virtio_disklist, 0, 0,
+      "List virtio block devices for a domain",
+      "<Domain(s)>",
+    },
+    { "virtio-disk-detach",
+      &main_virtio_diskdetach, 0, 1,
+      "Destroy a domain's virtio block device",
+      "<Domain> <DevId>",
+    },
     { "uptime",
       &main_uptime, 0, 0,
       "Print uptime for all/some domains",
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index XXXXXXX..XXXXXXX 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -XXX,XX +XXX,XX @@ out:
     if (rc) exit(EXIT_FAILURE);
 }
 
+#define MAX_VIRTIO_DISKS 4
+
+static int parse_virtio_disk_config(libxl_device_virtio_disk *virtio_disk, char *token)
+{
+    char *oparg;
+    libxl_string_list disks = NULL;
+    int i, rc;
+
+    if (MATCH_OPTION("backend", token, oparg)) {
+        virtio_disk->backend_domname = strdup(oparg);
+    } else if (MATCH_OPTION("disks", token, oparg)) {
+        split_string_into_string_list(oparg, ";", &disks);
+
+        virtio_disk->num_disks = libxl_string_list_length(&disks);
+        if (virtio_disk->num_disks > MAX_VIRTIO_DISKS) {
+            fprintf(stderr, "vdisk: currently only %d disks are supported\n",
+                    MAX_VIRTIO_DISKS);
+            rc = 1; goto out;
+        }
+        virtio_disk->disks = xcalloc(virtio_disk->num_disks,
+                                     sizeof(*virtio_disk->disks));
+
+        for (i = 0; i < virtio_disk->num_disks; i++) {
+            char *disk_opt;
+
+            rc = split_string_into_pair(disks[i], ":", &disk_opt,
+                                        &virtio_disk->disks[i].filename);
+            if (rc) {
+                fprintf(stderr, "vdisk: failed to split \"%s\" into pair\n",
+                        disks[i]);
+                goto out;
+            }
+
+            if (!strcmp(disk_opt, "ro"))
+                virtio_disk->disks[i].readonly = 1;
+            else if (!strcmp(disk_opt, "rw"))
+                virtio_disk->disks[i].readonly = 0;
+            else {
+                fprintf(stderr, "vdisk: failed to parse \"%s\" disk option\n",
+                        disk_opt);
+                rc = 1;
+            }
+            free(disk_opt);
+
+            if (rc) goto out;
+        }
+    } else {
+        fprintf(stderr, "Unknown string \"%s\" in vdisk spec\n", token);
+        rc = 1; goto out;
+    }
+
+    rc = 0;
+
+out:
+    libxl_string_list_dispose(&disks);
+    return rc;
+}
+
+static void parse_virtio_disk_list(const XLU_Config *config,
+                                   libxl_domain_config *d_config)
+{
+    XLU_ConfigList *virtio_disks;
+    const char *item;
+    char *buf = NULL;
+    int rc;
+
+    if (!xlu_cfg_get_list (config, "vdisk", &virtio_disks, 0, 0)) {
+        libxl_domain_build_info *b_info = &d_config->b_info;
+        int entry = 0;
+
+        /* XXX Remove an extra property */
+        libxl_defbool_setdefault(&b_info->arch_arm.virtio, false);
+        if (!libxl_defbool_val(b_info->arch_arm.virtio)) {
+            fprintf(stderr, "Virtio device requires Virtio property to be set\n");
+            exit(EXIT_FAILURE);
+        }
+
+        while ((item = xlu_cfg_get_listitem(virtio_disks, entry)) != NULL) {
+            libxl_device_virtio_disk *virtio_disk;
+            char *p;
+
+            virtio_disk = ARRAY_EXTEND_INIT(d_config->virtio_disks,
+                                            d_config->num_virtio_disks,
+                                            libxl_device_virtio_disk_init);
+
+            buf = strdup(item);
+
+            p = strtok (buf, ",");
+            while (p != NULL)
+            {
+                while (*p == ' ') p++;
+
+                rc = parse_virtio_disk_config(virtio_disk, p);
+                if (rc) goto out;
+
+                p = strtok (NULL, ",");
+            }
+
+            free(buf);
+            buf = NULL;
+
+            entry++;
+
+            if (virtio_disk->num_disks == 0) {
+                fprintf(stderr, "At least one virtio disk should be specified\n");
+                rc = 1; goto out;
+            }
+        }
+    }
+
+    rc = 0;
+
+out:
+    free(buf);
+    if (rc) exit(EXIT_FAILURE);
+}
+
 void parse_config_data(const char *config_source,
                        const char *config_data,
                        int config_len,
@@ -XXX,XX +XXX,XX @@ skip_usbdev:
     }
 
     parse_vkb_list(config, d_config);
+    parse_virtio_disk_list(config, d_config);
 
     xlu_cfg_get_defbool(config, "xend_suspend_evtchn_compat",
                         &c_info->xend_suspend_evtchn_compat, 0);
diff --git a/tools/xl/xl_virtio_disk.c b/tools/xl/xl_virtio_disk.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tools/xl/xl_virtio_disk.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * Copyright (C) 2020 EPAM Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ */
+
+#include <stdlib.h>
+
+#include <libxl.h>
+#include <libxl_utils.h>
+#include <libxlutil.h>
+
+#include "xl.h"
+#include "xl_utils.h"
+#include "xl_parse.h"
+
+int main_virtio_diskattach(int argc, char **argv)
+{
+    return 0;
+}
+
+int main_virtio_disklist(int argc, char **argv)
+{
+    return 0;
+}
+
+int main_virtio_diskdetach(int argc, char **argv)
+{
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--
2.7.4
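For reviewers who want to try the "vdisk" option syntax without building the toolstack, here is a minimal standalone sketch of what parse_virtio_disk_list() and parse_virtio_disk_config() above accept: comma-separated "backend=<domain>" and "disks=<opt>:<path>;..." tokens, where <opt> is "ro" or "rw". It uses plain libc only. The demo_* types and functions are hypothetical stand-ins, not libxl or xl API, and the example strings in the note below are illustrative rather than taken from this series.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() and strtok_r() */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_VIRTIO_DISKS 4  /* same limit as in the patch */

/* Hypothetical stand-ins for the libxl types; not the real definitions. */
struct demo_disk {
    char filename[128];
    int readonly;
};

struct demo_vdisk {
    char backend[64];
    int num_disks;
    struct demo_disk disks[MAX_VIRTIO_DISKS];
};

/* Parse "ro:<path>" or "rw:<path>" into *d. Returns 0 on success. */
static int demo_parse_disk(const char *s, struct demo_disk *d)
{
    if (!strncmp(s, "ro:", 3))
        d->readonly = 1;
    else if (!strncmp(s, "rw:", 3))
        d->readonly = 0;
    else
        return 1;

    /* Reject an empty path or one that does not fit the demo buffer. */
    if (s[3] == '\0' || strlen(s + 3) >= sizeof(d->filename))
        return 1;

    strcpy(d->filename, s + 3);
    return 0;
}

/* Parse one vdisk entry. Returns 0 on success, 1 on any error. */
static int demo_parse_vdisk(const char *spec, struct demo_vdisk *v)
{
    char *buf, *tok, *save1 = NULL;
    int rc = 1;

    assert(spec != NULL && v != NULL);
    memset(v, 0, sizeof(*v));

    buf = strdup(spec);
    if (!buf) return 1;

    for (tok = strtok_r(buf, ",", &save1); tok;
         tok = strtok_r(NULL, ",", &save1)) {
        while (*tok == ' ') tok++;  /* skip leading spaces, as xl does */

        if (!strncmp(tok, "backend=", 8)) {
            snprintf(v->backend, sizeof(v->backend), "%s", tok + 8);
        } else if (!strncmp(tok, "disks=", 6)) {
            char *d, *save2 = NULL;

            for (d = strtok_r(tok + 6, ";", &save2); d;
                 d = strtok_r(NULL, ";", &save2)) {
                if (v->num_disks == MAX_VIRTIO_DISKS) goto out;
                if (demo_parse_disk(d, &v->disks[v->num_disks])) goto out;
                v->num_disks++;
            }
        } else {
            goto out;  /* unknown key in vdisk spec */
        }
    }

    /* At least one disk must be specified, matching the check in xl. */
    rc = v->num_disks ? 0 : 1;

out:
    free(buf);
    return rc;
}
```

Calling demo_parse_vdisk() on a string such as "backend=DomD, disks=rw:/dev/a;ro:/dev/b" should fill in the backend name and two disks. An entry without a "disks=" token, or with an option other than "ro"/"rw", is rejected, which mirrors the error paths in the xl parser.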