From nobody Mon May 6 21:09:16 2024
From: Xie Yongji
To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, kwolf@redhat.com, mreitz@redhat.com, mlureau@redhat.com
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: [PATCH 1/5] linux-headers: Add vduse.h
Date: Tue, 25 Jan 2022 21:17:56 +0800
Message-Id: <20220125131800.91-2-xieyongji@bytedance.com>
In-Reply-To: <20220125131800.91-1-xieyongji@bytedance.com>
References: <20220125131800.91-1-xieyongji@bytedance.com>

This adds the VDUSE header to the standard headers so that the relevant
VDUSE API can be used in subsequent patches.

Signed-off-by: Xie Yongji
---
 include/standard-headers/linux/vduse.h | 306 +++++++++++++++++++++++++
 scripts/update-linux-headers.sh        |   1 +
 2 files changed, 307 insertions(+)
 create mode 100644 include/standard-headers/linux/vduse.h

diff --git a/include/standard-headers/linux/vduse.h b/include/standard-headers/linux/vduse.h
new file mode 100644
index 0000000000..4242bc9fdf
--- /dev/null
+++ b/include/standard-headers/linux/vduse.h
@@ -0,0 +1,306 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _VDUSE_H_
+#define _VDUSE_H_
+
+#include "standard-headers/linux/types.h"
+
+#define VDUSE_BASE 0x81
+
+/* The ioctls for control device (/dev/vduse/control) */
+
+#define VDUSE_API_VERSION 0
+
+/*
+ * Get the version of VDUSE API that kernel supported (VDUSE_API_VERSION).
+ * This is used for future extension.
+ */
+#define VDUSE_GET_API_VERSION _IOR(VDUSE_BASE, 0x00, uint64_t)
+
+/* Set the version of VDUSE API that userspace supported. */
+#define VDUSE_SET_API_VERSION _IOW(VDUSE_BASE, 0x01, uint64_t)
+
+/**
+ * struct vduse_dev_config - basic configuration of a VDUSE device
+ * @name: VDUSE device name, needs to be NUL terminated
+ * @vendor_id: virtio vendor id
+ * @device_id: virtio device id
+ * @features: virtio features
+ * @vq_num: the number of virtqueues
+ * @vq_align: the allocation alignment of virtqueue's metadata
+ * @reserved: for future use, needs to be initialized to zero
+ * @config_size: the size of the configuration space
+ * @config: the buffer of the configuration space
+ *
+ * Structure used by VDUSE_CREATE_DEV ioctl to create VDUSE device.
+ */
+struct vduse_dev_config {
+#define VDUSE_NAME_MAX 256
+	char name[VDUSE_NAME_MAX];
+	uint32_t vendor_id;
+	uint32_t device_id;
+	uint64_t features;
+	uint32_t vq_num;
+	uint32_t vq_align;
+	uint32_t reserved[13];
+	uint32_t config_size;
+	uint8_t config[];
+};
+
+/* Create a VDUSE device which is represented by a char device (/dev/vduse/$NAME) */
+#define VDUSE_CREATE_DEV _IOW(VDUSE_BASE, 0x02, struct vduse_dev_config)
+
+/*
+ * Destroy a VDUSE device. Make sure there are no more references
+ * to the char device (/dev/vduse/$NAME).
+ */
+#define VDUSE_DESTROY_DEV _IOW(VDUSE_BASE, 0x03, char[VDUSE_NAME_MAX])
+
+/* The ioctls for VDUSE device (/dev/vduse/$NAME) */
+
+/**
+ * struct vduse_iotlb_entry - entry of IOTLB to describe one IOVA region [start, last]
+ * @offset: the mmap offset on returned file descriptor
+ * @start: start of the IOVA region
+ * @last: last of the IOVA region
+ * @perm: access permission of the IOVA region
+ *
+ * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA region.
+ */
+struct vduse_iotlb_entry {
+	uint64_t offset;
+	uint64_t start;
+	uint64_t last;
+#define VDUSE_ACCESS_RO 0x1
+#define VDUSE_ACCESS_WO 0x2
+#define VDUSE_ACCESS_RW 0x3
+	uint8_t perm;
+};
+
+/*
+ * Find the first IOVA region that overlaps with the range [start, last]
+ * and return the corresponding file descriptor. Return -EINVAL means the
+ * IOVA region doesn't exist. Caller should set start and last fields.
+ */
+#define VDUSE_IOTLB_GET_FD _IOWR(VDUSE_BASE, 0x10, struct vduse_iotlb_entry)
+
+/*
+ * Get the negotiated virtio features. It's a subset of the features in
+ * struct vduse_dev_config which can be accepted by virtio driver. It's
+ * only valid after FEATURES_OK status bit is set.
+ */
+#define VDUSE_DEV_GET_FEATURES _IOR(VDUSE_BASE, 0x11, uint64_t)
+
+/**
+ * struct vduse_config_data - data used to update configuration space
+ * @offset: the offset from the beginning of configuration space
+ * @length: the length to write to configuration space
+ * @buffer: the buffer used to write from
+ *
+ * Structure used by VDUSE_DEV_SET_CONFIG ioctl to update device
+ * configuration space.
+ */
+struct vduse_config_data {
+	uint32_t offset;
+	uint32_t length;
+	uint8_t buffer[];
+};
+
+/* Set device configuration space */
+#define VDUSE_DEV_SET_CONFIG _IOW(VDUSE_BASE, 0x12, struct vduse_config_data)
+
+/*
+ * Inject a config interrupt. It's usually used to notify virtio driver
+ * that device configuration space has changed.
+ */
+#define VDUSE_DEV_INJECT_CONFIG_IRQ _IO(VDUSE_BASE, 0x13)
+
+/**
+ * struct vduse_vq_config - basic configuration of a virtqueue
+ * @index: virtqueue index
+ * @max_size: the max size of virtqueue
+ * @reserved: for future use, needs to be initialized to zero
+ *
+ * Structure used by VDUSE_VQ_SETUP ioctl to setup a virtqueue.
+ */
+struct vduse_vq_config {
+	uint32_t index;
+	uint16_t max_size;
+	uint16_t reserved[13];
+};
+
+/*
+ * Setup the specified virtqueue. Make sure all virtqueues have been
+ * configured before the device is attached to vDPA bus.
+ */
+#define VDUSE_VQ_SETUP _IOW(VDUSE_BASE, 0x14, struct vduse_vq_config)
+
+/**
+ * struct vduse_vq_state_split - split virtqueue state
+ * @avail_index: available index
+ */
+struct vduse_vq_state_split {
+	uint16_t avail_index;
+};
+
+/**
+ * struct vduse_vq_state_packed - packed virtqueue state
+ * @last_avail_counter: last driver ring wrap counter observed by device
+ * @last_avail_idx: device available index
+ * @last_used_counter: device ring wrap counter
+ * @last_used_idx: used index
+ */
+struct vduse_vq_state_packed {
+	uint16_t last_avail_counter;
+	uint16_t last_avail_idx;
+	uint16_t last_used_counter;
+	uint16_t last_used_idx;
+};
+
+/**
+ * struct vduse_vq_info - information of a virtqueue
+ * @index: virtqueue index
+ * @num: the size of virtqueue
+ * @desc_addr: address of desc area
+ * @driver_addr: address of driver area
+ * @device_addr: address of device area
+ * @split: split virtqueue state
+ * @packed: packed virtqueue state
+ * @ready: ready status of virtqueue
+ *
+ * Structure used by VDUSE_VQ_GET_INFO ioctl to get virtqueue's information.
+ */
+struct vduse_vq_info {
+	uint32_t index;
+	uint32_t num;
+	uint64_t desc_addr;
+	uint64_t driver_addr;
+	uint64_t device_addr;
+	union {
+		struct vduse_vq_state_split split;
+		struct vduse_vq_state_packed packed;
+	};
+	uint8_t ready;
+};
+
+/* Get the specified virtqueue's information. Caller should set index field. */
+#define VDUSE_VQ_GET_INFO _IOWR(VDUSE_BASE, 0x15, struct vduse_vq_info)
+
+/**
+ * struct vduse_vq_eventfd - eventfd configuration for a virtqueue
+ * @index: virtqueue index
+ * @fd: eventfd, -1 means de-assigning the eventfd
+ *
+ * Structure used by VDUSE_VQ_SETUP_KICKFD ioctl to setup kick eventfd.
+ */
+struct vduse_vq_eventfd {
+	uint32_t index;
+#define VDUSE_EVENTFD_DEASSIGN -1
+	int fd;
+};
+
+/*
+ * Setup kick eventfd for specified virtqueue. The kick eventfd is used
+ * by VDUSE kernel module to notify userspace to consume the avail vring.
+ */
+#define VDUSE_VQ_SETUP_KICKFD _IOW(VDUSE_BASE, 0x16, struct vduse_vq_eventfd)
+
+/*
+ * Inject an interrupt for specific virtqueue. It's used to notify virtio driver
+ * to consume the used vring.
+ */
+#define VDUSE_VQ_INJECT_IRQ _IOW(VDUSE_BASE, 0x17, uint32_t)
+
+/* The control messages definition for read(2)/write(2) on /dev/vduse/$NAME */
+
+/**
+ * enum vduse_req_type - request type
+ * @VDUSE_GET_VQ_STATE: get the state for specified virtqueue from userspace
+ * @VDUSE_SET_STATUS: set the device status
+ * @VDUSE_UPDATE_IOTLB: Notify userspace to update the memory mapping for
+ *                      specified IOVA range via VDUSE_IOTLB_GET_FD ioctl
+ */
+enum vduse_req_type {
+	VDUSE_GET_VQ_STATE,
+	VDUSE_SET_STATUS,
+	VDUSE_UPDATE_IOTLB,
+};
+
+/**
+ * struct vduse_vq_state - virtqueue state
+ * @index: virtqueue index
+ * @split: split virtqueue state
+ * @packed: packed virtqueue state
+ */
+struct vduse_vq_state {
+	uint32_t index;
+	union {
+		struct vduse_vq_state_split split;
+		struct vduse_vq_state_packed packed;
+	};
+};
+
+/**
+ * struct vduse_dev_status - device status
+ * @status: device status
+ */
+struct vduse_dev_status {
+	uint8_t status;
+};
+
+/**
+ * struct vduse_iova_range - IOVA range [start, last]
+ * @start: start of the IOVA range
+ * @last: last of the IOVA range
+ */
+struct vduse_iova_range {
+	uint64_t start;
+	uint64_t last;
+};
+
+/**
+ * struct vduse_dev_request - control request
+ * @type: request type
+ * @request_id: request id
+ * @reserved: for future use
+ * @vq_state: virtqueue state, only index field is available
+ * @s: device status
+ * @iova: IOVA range for updating
+ * @padding: padding
+ *
+ * Structure used by read(2) on /dev/vduse/$NAME.
+ */
+struct vduse_dev_request {
+	uint32_t type;
+	uint32_t request_id;
+	uint32_t reserved[4];
+	union {
+		struct vduse_vq_state vq_state;
+		struct vduse_dev_status s;
+		struct vduse_iova_range iova;
+		uint32_t padding[32];
+	};
+};
+
+/**
+ * struct vduse_dev_response - response to control request
+ * @request_id: corresponding request id
+ * @result: the result of request
+ * @reserved: for future use, needs to be initialized to zero
+ * @vq_state: virtqueue state
+ * @padding: padding
+ *
+ * Structure used by write(2) on /dev/vduse/$NAME.
+ */
+struct vduse_dev_response {
+	uint32_t request_id;
+#define VDUSE_REQ_RESULT_OK 0x00
+#define VDUSE_REQ_RESULT_FAILED 0x01
+	uint32_t result;
+	uint32_t reserved[4];
+	union {
+		struct vduse_vq_state vq_state;
+		uint32_t padding[32];
+	};
+};
+
+#endif /* _VDUSE_H_ */
diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index fea4d6eb65..4c7846076f 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -198,6 +198,7 @@ for i in "$tmpdir"/include/linux/*virtio*.h \
          "$tmpdir/include/linux/const.h" \
          "$tmpdir/include/linux/kernel.h" \
          "$tmpdir/include/linux/vhost_types.h" \
+         "$tmpdir/include/linux/vduse.h" \
          "$tmpdir/include/linux/sysinfo.h"; do
     cp_portable "$i" "$output/include/standard-headers/linux"
 done
-- 
2.20.1

From nobody Mon May 6 21:09:16 2024
From: Xie Yongji
To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, kwolf@redhat.com, mreitz@redhat.com, mlureau@redhat.com
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: [PATCH 2/5] libvduse: Add VDUSE (vDPA Device in Userspace) library
Date: Tue, 25 Jan 2022 21:17:57 +0800
Message-Id: <20220125131800.91-3-xieyongji@bytedance.com>
In-Reply-To: <20220125131800.91-1-xieyongji@bytedance.com>
References: <20220125131800.91-1-xieyongji@bytedance.com>

VDUSE [1] is a Linux framework that makes it possible to implement
software-emulated vDPA devices in userspace. This adds a library as a
subproject to help implement VDUSE backends in QEMU.

[1] https://www.kernel.org/doc/html/latest/userspace-api/vduse.html

Signed-off-by: Xie Yongji
---
 meson.build                                 |   15 +
 meson_options.txt                           |    2 +
 scripts/meson-buildoptions.sh               |    3 +
 subprojects/libvduse/include/atomic.h       |    1 +
 subprojects/libvduse/libvduse.c             | 1025 +++++++++++++++++++
 subprojects/libvduse/libvduse.h             |  193 ++++
 subprojects/libvduse/meson.build            |   10 +
 subprojects/libvduse/standard-headers/linux |    1 +
 8 files changed, 1250 insertions(+)
 create mode 120000 subprojects/libvduse/include/atomic.h
 create mode 100644 subprojects/libvduse/libvduse.c
 create mode 100644 subprojects/libvduse/libvduse.h
 create mode 100644 subprojects/libvduse/meson.build
 create mode 120000 subprojects/libvduse/standard-headers/linux

diff --git a/meson.build b/meson.build
index 333c61deba..864fb50ade 100644
--- a/meson.build
+++ b/meson.build
@@ -1305,6 +1305,21 @@ if not get_option('fuse_lseek').disabled()
   endif
 endif
 
+have_libvduse = (targetos == 'linux')
+if get_option('libvduse').enabled()
+  if targetos != 'linux'
+    error('libvduse requires linux')
+  endif
+elif get_option('libvduse').disabled()
+  have_libvduse = false
+endif
+
+libvduse = not_found
+if have_libvduse
+  libvduse_proj = subproject('libvduse')
+  libvduse = libvduse_proj.get_variable('libvduse_dep')
+endif
+
 # libbpf
 libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config')
 if libbpf.found() and not cc.links('''
diff --git a/meson_options.txt b/meson_options.txt
index 921967eddb..16790d1814 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -195,6 +195,8 @@ option('virtfs', type: 'feature', value: 'auto',
        description: 'virtio-9p support')
 option('virtiofsd', type: 'feature', value: 'auto',
        description: 'build virtiofs daemon (virtiofsd)')
+option('libvduse', type: 'feature', value: 'auto',
+       description: 'build VDUSE Library')
 
 option('capstone', type: 'combo', value: 'auto',
        choices: ['disabled', 'enabled', 'auto', 'system', 'internal'],
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index a4af02c527..af5c75d758 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -58,6 +58,7 @@ meson_options_help() {
   printf "%s\n" '  libssh          ssh block device support'
   printf "%s\n" '  libudev         Use libudev to enumerate host devices'
   printf "%s\n" '  libusb          libusb support for USB passthrough'
+  printf "%s\n" '  libvduse        build VDUSE Library'
   printf "%s\n" '  libxml2         libxml2 support for Parallels image format'
   printf "%s\n" '  linux-aio       Linux AIO support'
   printf "%s\n" '  linux-io-uring  Linux io_uring support'
@@ -188,6 +189,8 @@ _meson_option_parse() {
     --disable-libudev) printf "%s" -Dlibudev=disabled ;;
     --enable-libusb) printf "%s" -Dlibusb=enabled ;;
     --disable-libusb) printf "%s" -Dlibusb=disabled ;;
+    --enable-libvduse) printf "%s" -Dlibvduse=enabled ;;
+    --disable-libvduse) printf "%s" -Dlibvduse=disabled ;;
     --enable-libxml2) printf "%s" -Dlibxml2=enabled ;;
     --disable-libxml2) printf "%s" -Dlibxml2=disabled ;;
     --enable-linux-aio) printf "%s" -Dlinux_aio=enabled ;;
diff --git a/subprojects/libvduse/include/atomic.h b/subprojects/libvduse/include/atomic.h
new file mode 120000
index 0000000000..8c2be64f7b
--- /dev/null
+++ b/subprojects/libvduse/include/atomic.h
@@ -0,0 +1 @@
+../../../include/qemu/atomic.h
\ No newline at end of file
diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvduse.c
new file mode 100644
index 0000000000..7671864bca
--- /dev/null
+++ b/subprojects/libvduse/libvduse.c
@@ -0,0 +1,1025 @@
+/*
+ * VDUSE (vDPA Device in Userspace) library
+ *
+ * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved.
+ *   Portions of codes and concepts borrowed from libvhost-user.c, so:
+ *     Copyright IBM, Corp. 2007
+ *     Copyright (c) 2016 Red Hat, Inc.
+ *
+ * Author:
+ *   Xie Yongji
+ *   Anthony Liguori
+ *   Marc-André Lureau
+ *   Victor Kaplansky
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later. See the COPYING file in the top-level directory.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+#include "include/atomic.h"
+#include "standard-headers/linux/vhost_types.h"
+#include "standard-headers/linux/vduse.h"
+#include "libvduse.h"
+
+#define VIRTQUEUE_MAX_SIZE 1024
+#define VDUSE_VQ_ALIGN 4096
+#define MAX_IOVA_REGIONS 256
+
+/* Round number down to multiple */
+#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
+
+/* Round number up to multiple */
+#define ALIGN_UP(n, m) ALIGN_DOWN((n) + (m) - 1, (m))
+
+#ifndef unlikely
+#define unlikely(x) __builtin_expect(!!(x), 0)
+#endif
+
+typedef struct VduseRing {
+    unsigned int num;
+    uint64_t desc_addr;
+    uint64_t avail_addr;
+    uint64_t used_addr;
+    struct vring_desc *desc;
+    struct vring_avail *avail;
+    struct vring_used *used;
+} VduseRing;
+
+struct VduseVirtq {
+    VduseRing vring;
+    uint16_t last_avail_idx;
+    uint16_t shadow_avail_idx;
+    uint16_t used_idx;
+    uint16_t signalled_used;
+    bool signalled_used_valid;
+    int index;
+    int inuse;
+    bool ready;
+    int fd;
+    VduseDev *dev;
+};
+
+typedef struct VduseIovaRegion {
+    uint64_t iova;
+    uint64_t size;
+    uint64_t mmap_offset;
+    uint64_t mmap_addr;
+} VduseIovaRegion;
+
+struct VduseDev {
+    VduseVirtq *vqs;
+    VduseIovaRegion regions[MAX_IOVA_REGIONS];
+    int num_regions;
+    char *name;
+    uint32_t device_id;
+    uint32_t vendor_id;
+    uint16_t num_queues;
+    uint16_t queue_size;
+    uint64_t features;
+    const VduseOps *ops;
+    int fd;
+    int ctrl_fd;
+    void *priv;
+};
+
+static inline bool has_feature(uint64_t features, unsigned int fbit)
+{
+    assert(fbit < 64);
+    return !!(features & (1ULL << fbit));
+}
+
+static inline bool vduse_dev_has_feature(VduseDev *dev, unsigned int fbit)
+{
+    return has_feature(dev->features, fbit);
+}
+
+VduseDev *vduse_queue_get_dev(VduseVirtq *vq)
+{
+    return vq->dev;
+}
+
+int vduse_queue_get_fd(VduseVirtq *vq)
+{
+    return vq->fd;
+}
+
+void *vduse_dev_get_priv(VduseDev *dev)
+{
+    return dev->priv;
+}
+
+VduseVirtq *vduse_dev_get_queue(VduseDev *dev, int index)
+{
+    return &dev->vqs[index];
+}
+
+int vduse_dev_get_fd(VduseDev *dev)
+{
+    return dev->fd;
+}
+
+static int vduse_inject_irq(VduseDev *dev, int index)
+{
+    return ioctl(dev->fd, VDUSE_VQ_INJECT_IRQ, &index);
+}
+
+static void vduse_iova_remove_region(VduseDev *dev, uint64_t start,
+                                     uint64_t last)
+{
+    int i;
+
+    if (last == start) {
+        return;
+    }
+
+    for (i = 0; i < MAX_IOVA_REGIONS; i++) {
+        if (!dev->regions[i].mmap_addr) {
+            continue;
+        }
+
+        if (start <= dev->regions[i].iova &&
+            last >= (dev->regions[i].iova + dev->regions[i].size - 1)) {
+            munmap((void *)dev->regions[i].mmap_addr,
+                   dev->regions[i].mmap_offset + dev->regions[i].size);
+            dev->regions[i].mmap_addr = 0;
+            dev->num_regions--;
+        }
+    }
+}
+
+static int vduse_iova_add_region(VduseDev *dev, int fd,
+                                 uint64_t offset, uint64_t start,
+                                 uint64_t last, int prot)
+{
+    int i;
+    uint64_t size = last - start + 1;
+    void *mmap_addr = mmap(0, size + offset, prot, MAP_SHARED, fd, 0);
+
+    if (mmap_addr == MAP_FAILED) {
+        return -EINVAL;
+    }
+
+    for (i = 0; i < MAX_IOVA_REGIONS; i++) {
+        if (!dev->regions[i].mmap_addr) {
+            dev->regions[i].mmap_addr = (uint64_t)(uintptr_t)mmap_addr;
+            dev->regions[i].mmap_offset = offset;
+            dev->regions[i].iova = start;
+            dev->regions[i].size = size;
+            dev->num_regions++;
+            break;
+        }
+    }
+    close(fd);
+
+    return 0;
+}
+
+static int perm_to_prot(uint8_t perm)
+{
+    int prot = 0;
+
+    switch (perm) {
+    case VDUSE_ACCESS_WO:
+        prot |= PROT_WRITE;
+        break;
+    case VDUSE_ACCESS_RO:
+        prot |= PROT_READ;
+        break;
+    case VDUSE_ACCESS_RW:
+        prot |= PROT_READ | PROT_WRITE;
+        break;
+    default:
+        break;
+    }
+
+    return prot;
+}
+
+static inline void *iova_to_va(VduseDev *dev, uint64_t *plen, uint64_t iova)
+{
+    int i, ret;
+    struct vduse_iotlb_entry entry;
+
+    for (i = 0; i < MAX_IOVA_REGIONS; i++) {
+        VduseIovaRegion *r = &dev->regions[i];
+
+        if (!r->mmap_addr) {
+            continue;
+        }
+
+        if ((iova >= r->iova) && (iova < (r->iova + r->size))) {
+            if ((iova + *plen) > (r->iova + r->size)) {
+                *plen = r->iova + r->size - iova;
+            }
+            return (void *)(uintptr_t)(iova - r->iova +
+                   r->mmap_addr + r->mmap_offset);
+        }
+    }
+
+    entry.start = iova;
+    entry.last = iova + 1;
+    ret = ioctl(dev->fd, VDUSE_IOTLB_GET_FD, &entry);
+    if (ret < 0) {
+        return NULL;
+    }
+
+    if (!vduse_iova_add_region(dev, ret, entry.offset, entry.start,
+                               entry.last, perm_to_prot(entry.perm))) {
+        return iova_to_va(dev, plen, iova);
+    }
+
+    return NULL;
+}
+
+static inline uint16_t vring_avail_flags(VduseVirtq *vq)
+{
+    return le16toh(vq->vring.avail->flags);
+}
+
+static inline uint16_t vring_avail_idx(VduseVirtq *vq)
+{
+    vq->shadow_avail_idx = le16toh(vq->vring.avail->idx);
+
+    return vq->shadow_avail_idx;
+}
+
+static inline uint16_t vring_avail_ring(VduseVirtq *vq, int i)
+{
+    return le16toh(vq->vring.avail->ring[i]);
+}
+
+static inline uint16_t vring_get_used_event(VduseVirtq *vq)
+{
+    return vring_avail_ring(vq, vq->vring.num);
+}
+
+static bool vduse_queue_get_head(VduseVirtq *vq, unsigned int idx,
+                                 unsigned int *head)
+{
+    /*
+     * Grab the next descriptor number they're advertising, and increment
+     * the index we've seen.
+     */
+    *head = vring_avail_ring(vq, idx % vq->vring.num);
+
+    /* If their number is silly, that's a fatal mistake. */
+    if (*head >= vq->vring.num) {
+        fprintf(stderr, "Guest says index %u is available\n", *head);
+        return false;
+    }
+
+    return true;
+}
+
+static int
+vduse_queue_read_indirect_desc(VduseDev *dev, struct vring_desc *desc,
+                               uint64_t addr, size_t len)
+{
+    struct vring_desc *ori_desc;
+    uint64_t read_len;
+
+    if (len > (VIRTQUEUE_MAX_SIZE * sizeof(struct vring_desc))) {
+        return -1;
+    }
+
+    if (len == 0) {
+        return -1;
+    }
+
+    while (len) {
+        read_len = len;
+        ori_desc = iova_to_va(dev, &read_len, addr);
+        if (!ori_desc) {
+            return -1;
+        }
+
+        memcpy(desc, ori_desc, read_len);
+        len -= read_len;
+        addr += read_len;
+        desc += read_len;
+    }
+
+    return 0;
+}
+
+enum {
+    VIRTQUEUE_READ_DESC_ERROR = -1,
+    VIRTQUEUE_READ_DESC_DONE = 0,   /* end of chain */
+    VIRTQUEUE_READ_DESC_MORE = 1,   /* more buffers in chain */
+};
+
+static int vduse_queue_read_next_desc(struct vring_desc *desc, int i,
+                                      unsigned int max, unsigned int *next)
+{
+    /* If this descriptor says it doesn't chain, we're done. */
+    if (!(le16toh(desc[i].flags) & VRING_DESC_F_NEXT)) {
+        return VIRTQUEUE_READ_DESC_DONE;
+    }
+
+    /* Check they're not leading us off end of descriptors. */
+    *next = desc[i].next;
+    /* Make sure compiler knows to grab that: we don't want it changing! */
+    smp_wmb();
+
+    if (*next >= max) {
+        fprintf(stderr, "Desc next is %u\n", *next);
+        return VIRTQUEUE_READ_DESC_ERROR;
+    }
+
+    return VIRTQUEUE_READ_DESC_MORE;
+}
+
+/*
+ * Fetch avail_idx from VQ memory only when we really need to know if
+ * guest has added some buffers.
+ */
+static bool vduse_queue_empty(VduseVirtq *vq)
+{
+    if (unlikely(!vq->vring.avail)) {
+        return true;
+    }
+
+    if (vq->shadow_avail_idx != vq->last_avail_idx) {
+        return false;
+    }
+
+    return vring_avail_idx(vq) == vq->last_avail_idx;
+}
+
+static bool vduse_queue_should_notify(VduseVirtq *vq)
+{
+    VduseDev *dev = vq->dev;
+    uint16_t old, new;
+    bool v;
+
+    /* We need to expose used array entries before checking used event. */
+    smp_mb();
+
+    /* Always notify when queue is empty (when feature acknowledge) */
+    if (vduse_dev_has_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY) &&
+        !vq->inuse && vduse_queue_empty(vq)) {
+        return true;
+    }
+
+    if (!vduse_dev_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) {
+        return !(vring_avail_flags(vq) & VRING_AVAIL_F_NO_INTERRUPT);
+    }
+
+    v = vq->signalled_used_valid;
+    vq->signalled_used_valid = true;
+    old = vq->signalled_used;
+    new = vq->signalled_used = vq->used_idx;
+    return !v || vring_need_event(vring_get_used_event(vq), new, old);
+}
+
+void vduse_queue_notify(VduseVirtq *vq)
+{
+    VduseDev *dev = vq->dev;
+
+    if (unlikely(!vq->vring.avail)) {
+        return;
+    }
+
+    if (!vduse_queue_should_notify(vq)) {
+        return;
+    }
+
+    if (vduse_inject_irq(dev, vq->index) < 0) {
+        fprintf(stderr, "Error inject irq for vq %d: %s\n",
+                vq->index, strerror(errno));
+    }
+}
+
+static inline void vring_used_flags_set_bit(VduseVirtq *vq, int mask)
+{
+    uint16_t *flags;
+
+    flags = (uint16_t *)((char*)vq->vring.used +
+                         offsetof(struct vring_used, flags));
+    *flags = htole16(le16toh(*flags) | mask);
+}
+
+static inline void vring_used_flags_unset_bit(VduseVirtq *vq, int mask)
+{
+    uint16_t *flags;
+
+    flags = (uint16_t *)((char*)vq->vring.used +
+                         offsetof(struct vring_used, flags));
+    *flags = htole16(le16toh(*flags) & ~mask);
+}
+
+static inline void vring_set_avail_event(VduseVirtq *vq, uint16_t val)
+{
+    *((uint16_t *)&vq->vring.used->ring[vq->vring.num]) = htole16(val);
+}
+
+static bool vduse_queue_map_single_desc(VduseVirtq *vq, unsigned int *p_num_sg,
+                                   struct iovec *iov, unsigned int max_num_sg,
+                                   bool is_write, uint64_t pa, size_t sz)
+{
+    unsigned num_sg = *p_num_sg;
+    VduseDev *dev = vq->dev;
+
+    assert(num_sg <= max_num_sg);
+
+    if (!sz) {
+        fprintf(stderr, "virtio: zero sized buffers are not allowed\n");
+        return false;
+    }
+
+    while (sz) {
+        uint64_t len = sz;
+
+        if (num_sg == max_num_sg) {
+            fprintf(stderr,
+                    "virtio: too many descriptors in indirect table\n");
+            return false;
+        }
+
+        iov[num_sg].iov_base = iova_to_va(dev, &len, pa);
+        if (iov[num_sg].iov_base == NULL) {
+            fprintf(stderr, "virtio: invalid address for buffers\n");
+            return false;
+        }
+        iov[num_sg++].iov_len = len;
+        sz -= len;
+        pa += len;
+    }
+
+    *p_num_sg = num_sg;
+    return true;
+}
+
+static void *vduse_queue_alloc_element(size_t sz, unsigned out_num,
+                                       unsigned in_num)
+{
+    VduseVirtqElement *elem;
+    size_t in_sg_ofs = ALIGN_UP(sz, __alignof__(elem->in_sg[0]));
+    size_t out_sg_ofs = in_sg_ofs + in_num * sizeof(elem->in_sg[0]);
+    size_t out_sg_end = out_sg_ofs + out_num * sizeof(elem->out_sg[0]);
+
+    assert(sz >= sizeof(VduseVirtqElement));
+    elem = malloc(out_sg_end);
+    elem->out_num = out_num;
+    elem->in_num = in_num;
+    elem->in_sg = (void *)elem + in_sg_ofs;
+    elem->out_sg = (void *)elem + out_sg_ofs;
+    return elem;
+}
+
+static void *vduse_queue_map_desc(VduseVirtq *vq, unsigned int idx, size_t sz)
+{
+    struct vring_desc *desc = vq->vring.desc;
+    VduseDev *dev = vq->dev;
+    uint64_t desc_addr, read_len;
+    unsigned int desc_len;
+    unsigned int max = vq->vring.num;
+    unsigned int i = idx;
+    VduseVirtqElement *elem;
+    struct iovec iov[VIRTQUEUE_MAX_SIZE];
+    struct vring_desc desc_buf[VIRTQUEUE_MAX_SIZE];
+    unsigned int out_num = 0, in_num = 0;
+    int rc;
+
+    if (le16toh(desc[i].flags) & VRING_DESC_F_INDIRECT) {
+        if (le32toh(desc[i].len) % sizeof(struct vring_desc)) {
+            fprintf(stderr, "Invalid size for indirect buffer table\n");
+            return NULL;
+        }
+
+        /* loop over the indirect descriptor table */
+        desc_addr = le64toh(desc[i].addr);
+        desc_len = le32toh(desc[i].len);
+        max = desc_len / sizeof(struct vring_desc);
+        read_len = desc_len;
+        desc = iova_to_va(dev, &read_len, desc_addr);
+        if (unlikely(desc && read_len != desc_len)) {
+            /* Failed to use zero copy */
+            desc = NULL;
+            if (!vduse_queue_read_indirect_desc(dev, desc_buf,
+                                                desc_addr,
+                                                desc_len)) {
+                desc
=3D desc_buf; + } + } + if (!desc) { + fprintf(stderr, "Invalid indirect buffer table\n"); + return NULL; + } + i =3D 0; + } + + /* Collect all the descriptors */ + do { + if (le16toh(desc[i].flags) & VRING_DESC_F_WRITE) { + if (!vduse_queue_map_single_desc(vq, &in_num, iov + out_num, + VIRTQUEUE_MAX_SIZE - out_num, + true, le64toh(desc[i].addr), + le32toh(desc[i].len))) { + return NULL; + } + } else { + if (in_num) { + fprintf(stderr, "Incorrect order for descriptors\n"); + return NULL; + } + if (!vduse_queue_map_single_desc(vq, &out_num, iov, + VIRTQUEUE_MAX_SIZE, false, + le64toh(desc[i].addr), + le32toh(desc[i].len))) { + return NULL; + } + } + + /* If we've got too many, that implies a descriptor loop. */ + if ((in_num + out_num) > max) { + fprintf(stderr, "Looped descriptor\n"); + return NULL; + } + rc =3D vduse_queue_read_next_desc(desc, i, max, &i); + } while (rc =3D=3D VIRTQUEUE_READ_DESC_MORE); + + if (rc =3D=3D VIRTQUEUE_READ_DESC_ERROR) { + fprintf(stderr, "read descriptor error\n"); + return NULL; + } + + /* Now copy what we have collected and mapped */ + elem =3D vduse_queue_alloc_element(sz, out_num, in_num); + elem->index =3D idx; + for (i =3D 0; i < out_num; i++) { + elem->out_sg[i] =3D iov[i]; + } + for (i =3D 0; i < in_num; i++) { + elem->in_sg[i] =3D iov[out_num + i]; + } + + return elem; +} + +void *vduse_queue_pop(VduseVirtq *vq, size_t sz) +{ + unsigned int head; + VduseVirtqElement *elem; + VduseDev *dev =3D vq->dev; + + if (unlikely(!vq->vring.avail)) { + return NULL; + } + + if (vduse_queue_empty(vq)) { + return NULL; + } + /* Needed after virtio_queue_empty() */ + smp_rmb(); + + if (vq->inuse >=3D vq->vring.num) { + fprintf(stderr, "Virtqueue size exceeded: %d\n", vq->inuse); + return NULL; + } + + if (!vduse_queue_get_head(vq, vq->last_avail_idx++, &head)) { + return NULL; + } + + if (vduse_dev_has_feature(dev, VIRTIO_RING_F_EVENT_IDX)) { + vring_set_avail_event(vq, vq->last_avail_idx); + } + + elem =3D vduse_queue_map_desc(vq, head, 
sz); + + if (!elem) { + return NULL; + } + + vq->inuse++; + + return elem; +} + +static inline void vring_used_write(VduseVirtq *vq, + struct vring_used_elem *uelem, int i) +{ + struct vring_used *used =3D vq->vring.used; + + used->ring[i] =3D *uelem; +} + +static void vduse_queue_fill(VduseVirtq *vq, const VduseVirtqElement *elem, + unsigned int len, unsigned int idx) +{ + struct vring_used_elem uelem; + + if (unlikely(!vq->vring.used)) { + return; + } + + idx =3D (idx + vq->used_idx) % vq->vring.num; + + uelem.id =3D htole32(elem->index); + uelem.len =3D htole32(len); + vring_used_write(vq, &uelem, idx); +} + +static inline void vring_used_idx_set(VduseVirtq *vq, uint16_t val) +{ + vq->vring.used->idx =3D htole16(val); + vq->used_idx =3D val; +} + +static void vduse_queue_flush(VduseVirtq *vq, unsigned int count) +{ + uint16_t old, new; + + if (unlikely(!vq->vring.used)) { + return; + } + + /* Make sure buffer is written before we update index. */ + smp_wmb(); + + old =3D vq->used_idx; + new =3D old + count; + vring_used_idx_set(vq, new); + vq->inuse -=3D count; + if (unlikely((int16_t)(new - vq->signalled_used) < (uint16_t)(new - ol= d))) { + vq->signalled_used_valid =3D false; + } +} + +void vduse_queue_push(VduseVirtq *vq, const VduseVirtqElement *elem, + unsigned int len) +{ + vduse_queue_fill(vq, elem, len, 0); + vduse_queue_flush(vq, 1); +} + +static int vduse_queue_update_vring(VduseVirtq *vq, uint64_t desc_addr, + uint64_t avail_addr, uint64_t used_add= r) +{ + struct VduseDev *dev =3D vq->dev; + uint64_t len; + + len =3D sizeof(struct vring_desc); + vq->vring.desc =3D iova_to_va(dev, &len, desc_addr); + assert(len =3D=3D sizeof(struct vring_desc)); + + len =3D sizeof(struct vring_avail); + vq->vring.avail =3D iova_to_va(dev, &len, avail_addr); + assert(len =3D=3D sizeof(struct vring_avail)); + + len =3D sizeof(struct vring_used); + vq->vring.used =3D iova_to_va(dev, &len, used_addr); + assert(len =3D=3D sizeof(struct vring_used)); + + if (!vq->vring.desc 
|| !vq->vring.avail || !vq->vring.used) { + fprintf(stderr, "Failed to get vq[%d] iova mapping\n", vq->index); + return -EINVAL; + } + + return 0; +} + +static void vduse_queue_enable(VduseVirtq *vq) +{ + struct VduseDev *dev =3D vq->dev; + struct vduse_vq_info vq_info; + struct vduse_vq_eventfd vq_eventfd; + int fd; + + vq_info.index =3D vq->index; + if (ioctl(dev->fd, VDUSE_VQ_GET_INFO, &vq_info)) { + fprintf(stderr, "Failed to get vq[%d] info: %s\n", + vq->index, strerror(errno)); + return; + } + + if (!vq_info.ready) { + return; + } + + vq->vring.num =3D vq_info.num; + vq->vring.desc_addr =3D vq_info.desc_addr; + vq->vring.avail_addr =3D vq_info.driver_addr; + vq->vring.used_addr =3D vq_info.device_addr; + + if (vduse_queue_update_vring(vq, vq_info.desc_addr, + vq_info.driver_addr, vq_info.device_addr)= ) { + fprintf(stderr, "Failed to update vring for vq[%d]\n", vq->index); + return; + } + + fd =3D eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); + if (fd < 0) { + fprintf(stderr, "Failed to init eventfd for vq[%d]\n", vq->index); + return; + } + + vq_eventfd.index =3D vq->index; + vq_eventfd.fd =3D fd; + if (ioctl(dev->fd, VDUSE_VQ_SETUP_KICKFD, &vq_eventfd)) { + fprintf(stderr, "Failed to setup kick fd for vq[%d]\n", vq->index); + close(fd); + return; + } + + vq->fd =3D fd; + vq->shadow_avail_idx =3D vq->last_avail_idx =3D vq_info.split.avail_in= dex; + vq->inuse =3D 0; + vq->used_idx =3D 0; + vq->signalled_used_valid =3D false; + vq->ready =3D true; + + dev->ops->enable_queue(dev, vq); +} + +static void vduse_queue_disable(VduseVirtq *vq) +{ + struct VduseDev *dev =3D vq->dev; + struct vduse_vq_eventfd eventfd; + + if (!vq->ready) { + return; + } + + dev->ops->disable_queue(dev, vq); + + eventfd.index =3D vq->index; + eventfd.fd =3D VDUSE_EVENTFD_DEASSIGN; + ioctl(dev->fd, VDUSE_VQ_SETUP_KICKFD, &eventfd); + close(vq->fd); + + assert(vq->inuse =3D=3D 0); + + vq->vring.num =3D 0; + vq->vring.desc_addr =3D 0; + vq->vring.avail_addr =3D 0; + vq->vring.used_addr =3D 0; + 
vq->vring.desc =3D 0; + vq->vring.avail =3D 0; + vq->vring.used =3D 0; + vq->ready =3D false; + vq->fd =3D -1; +} + +static void vduse_dev_start_dataplane(VduseDev *dev) +{ + int i; + + if (ioctl(dev->fd, VDUSE_DEV_GET_FEATURES, &dev->features)) { + fprintf(stderr, "Failed to get features: %s\n", strerror(errno)); + return; + } + + for (i =3D 0; i < dev->num_queues; i++) { + vduse_queue_enable(&dev->vqs[i]); + } +} + +static void vduse_dev_stop_dataplane(VduseDev *dev) +{ + int i; + + for (i =3D 0; i < dev->num_queues; i++) { + vduse_queue_disable(&dev->vqs[i]); + } + dev->features =3D 0; + vduse_iova_remove_region(dev, 0, ULONG_MAX); +} + +int vduse_dev_handler(VduseDev *dev) +{ + struct vduse_dev_request req; + struct vduse_dev_response resp =3D { 0 }; + VduseVirtq *vq; + int i, ret; + + ret =3D read(dev->fd, &req, sizeof(req)); + if (ret !=3D sizeof(req)) { + fprintf(stderr, "Read request error [%d]: %s\n", + ret, strerror(errno)); + return -errno; + } + resp.request_id =3D req.request_id; + + switch (req.type) { + case VDUSE_GET_VQ_STATE: + vq =3D &dev->vqs[req.vq_state.index]; + resp.vq_state.split.avail_index =3D vq->last_avail_idx; + resp.result =3D VDUSE_REQ_RESULT_OK; + break; + case VDUSE_SET_STATUS: + if (req.s.status & VIRTIO_CONFIG_S_DRIVER_OK) { + vduse_dev_start_dataplane(dev); + } else if (req.s.status =3D=3D 0) { + vduse_dev_stop_dataplane(dev); + } + resp.result =3D VDUSE_REQ_RESULT_OK; + break; + case VDUSE_UPDATE_IOTLB: + /* The iova will be updated by iova_to_va() later, so just remove = it */ + vduse_iova_remove_region(dev, req.iova.start, req.iova.last); + for (i =3D 0; i < dev->num_queues; i++) { + VduseVirtq *vq =3D &dev->vqs[i]; + if (vq->ready) { + if (vduse_queue_update_vring(vq, vq->vring.desc_addr, + vq->vring.avail_addr, + vq->vring.used_addr)) { + fprintf(stderr, "Failed to update vring for vq[%d]\n", + vq->index); + } + } + } + resp.result =3D VDUSE_REQ_RESULT_OK; + break; + default: + resp.result =3D VDUSE_REQ_RESULT_FAILED; + 
break; + } + + ret =3D write(dev->fd, &resp, sizeof(resp)); + if (ret !=3D sizeof(resp)) { + fprintf(stderr, "Write request %d error [%d]: %s\n", + req.type, ret, strerror(errno)); + return -errno; + } + return 0; +} + +int vduse_dev_update_config(VduseDev *dev, uint32_t size, + uint32_t offset, char *buffer) +{ + int ret; + struct vduse_config_data *data; + + data =3D malloc(offsetof(struct vduse_config_data, buffer) + size); + if (!data) { + return -ENOMEM; + } + + data->offset =3D offset; + data->length =3D size; + memcpy(data->buffer, buffer, size); + + ret =3D ioctl(dev->fd, VDUSE_DEV_SET_CONFIG, data); + free(data); + + if (ret) { + return -errno; + } + + if (ioctl(dev->fd, VDUSE_DEV_INJECT_CONFIG_IRQ)) { + return -errno; + } + + return 0; +} + +int vduse_dev_setup_queue(VduseDev *dev, int index, int max_size) +{ + VduseVirtq *vq =3D &dev->vqs[index]; + struct vduse_vq_config vq_config =3D { 0 }; + + vq_config.index =3D vq->index; + vq_config.max_size =3D max_size; + + if (ioctl(dev->fd, VDUSE_VQ_SETUP, &vq_config)) { + return -errno; + } + + return 0; +} + +VduseDev *vduse_dev_create(const char *name, uint32_t device_id, + uint32_t vendor_id, uint64_t features, + uint16_t num_queues, uint32_t config_size, + char *config, const VduseOps *ops, void *priv) +{ + VduseDev *dev; + int i, ret, ctrl_fd, fd =3D -1; + uint64_t version; + char dev_path[VDUSE_NAME_MAX + 16]; + VduseVirtq *vqs =3D NULL; + struct vduse_dev_config *dev_config =3D NULL; + size_t size =3D offsetof(struct vduse_dev_config, config); + + if (!name || strlen(name) > VDUSE_NAME_MAX || !config || + !config_size || !ops || !ops->enable_queue || !ops->disable_queue)= { + fprintf(stderr, "Invalid parameter for vduse\n"); + return NULL; + } + + dev =3D malloc(sizeof(VduseDev)); + if (!dev) { + fprintf(stderr, "Failed to allocate vduse device\n"); + return NULL; + } + memset(dev, 0, sizeof(VduseDev)); + + ctrl_fd =3D open("/dev/vduse/control", O_RDWR); + if (ctrl_fd < 0) { + fprintf(stderr, "Failed to 
open /dev/vduse/control: %s\n", + strerror(errno)); + goto err_ctrl; + } + + version =3D VDUSE_API_VERSION; + if (ioctl(ctrl_fd, VDUSE_SET_API_VERSION, &version)) { + fprintf(stderr, "Failed to set api version %lu: %s\n", + version, strerror(errno)); + goto err_dev; + } + + dev_config =3D malloc(size + config_size); + if (!dev_config) { + fprintf(stderr, "Failed to allocate config space\n"); + goto err_dev; + } + memset(dev_config, 0, size + config_size); + + strcpy(dev_config->name, name); + dev_config->device_id =3D device_id; + dev_config->vendor_id =3D vendor_id; + dev_config->features =3D features; + dev_config->vq_num =3D num_queues; + dev_config->vq_align =3D VDUSE_VQ_ALIGN; + dev_config->config_size =3D config_size; + memcpy(dev_config->config, config, config_size); + + ret =3D ioctl(ctrl_fd, VDUSE_CREATE_DEV, dev_config); + free(dev_config); + if (ret < 0) { + fprintf(stderr, "Failed to create vduse dev %s: %s\n", + name, strerror(errno)); + goto err_dev; + } + + sprintf(dev_path, "/dev/vduse/%s", name); + fd =3D open(dev_path, O_RDWR); + if (fd < 0) { + fprintf(stderr, "Failed to open vduse dev %s: %s\n", + name, strerror(errno)); + goto err; + } + + vqs =3D calloc(sizeof(VduseVirtq), num_queues); + if (!vqs) { + fprintf(stderr, "Failed to allocate virtqueues\n"); + goto err; + } + + for (i =3D 0; i < num_queues; i++) { + vqs[i].index =3D i; + vqs[i].dev =3D dev; + vqs[i].fd =3D -1; + } + + dev->vqs =3D vqs; + dev->name =3D strdup(name); + dev->num_queues =3D num_queues; + dev->ops =3D ops; + dev->ctrl_fd =3D ctrl_fd; + dev->fd =3D fd; + dev->priv =3D priv; + + return dev; +err: + if (fd > 0) { + close(fd); + } + ioctl(ctrl_fd, VDUSE_DESTROY_DEV, name); +err_dev: + close(ctrl_fd); +err_ctrl: + free(dev); + + return NULL; +} + +void vduse_dev_destroy(VduseDev *dev) +{ + free(dev->vqs); + close(dev->fd); + dev->fd =3D -1; + ioctl(dev->ctrl_fd, VDUSE_DESTROY_DEV, dev->name); + free(dev->name); + close(dev->ctrl_fd); + dev->ctrl_fd =3D -1; + free(dev); +} 
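A note on the notification logic in vduse_queue_should_notify() above: when VIRTIO_RING_F_EVENT_IDX is negotiated, the device signals the driver only if the used index has moved past the driver's used_event value since the last signal. The standalone sketch below illustrates that arithmetic, assuming the usual vring_need_event() definition from the virtio ring header. The names need_event(), struct notify_state and should_notify() are hypothetical, chosen only for illustration, and are not part of libvduse.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustration only: same arithmetic as vring_need_event() in the virtio
 * ring header (assumed here, not copied from this patch).  Returns true
 * when event_idx lies in [old_idx, new_idx) modulo 2^16, i.e. the used
 * index has just moved past the entry the driver asked to be told about.
 */
static inline bool need_event(uint16_t event_idx, uint16_t new_idx,
                              uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

/* Hypothetical stand-in for the two VduseVirtq fields used by the check. */
struct notify_state {
    uint16_t signalled_used;     /* used index at the time of the last check */
    bool signalled_used_valid;   /* false until the first check, or after reset */
};

/*
 * Mirrors the tail of vduse_queue_should_notify(): record the current used
 * index, then notify if no previous index is known or if the driver's
 * used_event value was crossed since the previous check.
 */
static inline bool should_notify(struct notify_state *s, uint16_t used_idx,
                                 uint16_t used_event)
{
    bool was_valid = s->signalled_used_valid;
    uint16_t old_idx = s->signalled_used;

    s->signalled_used_valid = true;
    s->signalled_used = used_idx;
    return !was_valid || need_event(used_event, used_idx, old_idx);
}
```

All indices are free-running 16-bit counters, so the subtract-and-cast form keeps working across the 65535 -> 0 wraparound: need_event(0, 1, 65535) is true, for example.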
diff --git a/subprojects/libvduse/libvduse.h b/subprojects/libvduse/libvduse.h new file mode 100644 index 0000000000..f6bcb51b5a --- /dev/null +++ b/subprojects/libvduse/libvduse.h @@ -0,0 +1,193 @@ +/* + * VDUSE (vDPA Device in Userspace) library + * + * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights reserved. + * + * Author: + * Xie Yongji + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. + */ + +#ifndef LIBVDUSE_H +#define LIBVDUSE_H + +#include +#include + +/* VDUSE device structure */ +typedef struct VduseDev VduseDev; + +/* Virtqueue structure */ +typedef struct VduseVirtq VduseVirtq; + +/* Operations implemented by the VDUSE backend */ +typedef struct VduseOps { + /* Called when virtqueue can be processed */ + void (*enable_queue)(VduseDev *dev, VduseVirtq *vq); + /* Called when virtqueue processing should be stopped */ + void (*disable_queue)(VduseDev *dev, VduseVirtq *vq); +} VduseOps; + +/* Describes an element of the I/O buffer */ +typedef struct VduseVirtqElement { + /* Descriptor table index */ + unsigned int index; + /* Number of physically-contiguous device-readable descriptors */ + unsigned int out_num; + /* Number of physically-contiguous device-writable descriptors */ + unsigned int in_num; + /* Array to store physically-contiguous device-writable descriptors */ + struct iovec *in_sg; + /* Array to store physically-contiguous device-readable descriptors */ + struct iovec *out_sg; +} VduseVirtqElement; + +/** + * vduse_queue_get_dev: + * @vq: specified virtqueue + * + * Get corresponding VDUSE device from the virtqueue. + * + * Returns: a pointer to VDUSE device on success, NULL on failure. + */ +VduseDev *vduse_queue_get_dev(VduseVirtq *vq); + +/** + * vduse_queue_get_fd: + * @vq: specified virtqueue + * + * Get the kick fd for the virtqueue. + * + * Returns: file descriptor on success, -1 on failure.
+ */ +int vduse_queue_get_fd(VduseVirtq *vq); + +/** + * vduse_queue_pop: + * @vq: specified virtqueue + * @sz: the size of struct to return (must be >= VduseVirtqElement) + * + * Pop an element from virtqueue available ring. + * + * Returns: a pointer to a structure containing VduseVirtqElement on success, + * NULL on failure. + */ +void *vduse_queue_pop(VduseVirtq *vq, size_t sz); + +/** + * vduse_queue_push: + * @vq: specified virtqueue + * @elem: pointer to VduseVirtqElement returned by vduse_queue_pop() + * @len: length in bytes to write + * + * Push an element to virtqueue used ring. + */ +void vduse_queue_push(VduseVirtq *vq, const VduseVirtqElement *elem, + unsigned int len); +/** + * vduse_queue_notify: + * @vq: specified virtqueue + * + * Request to notify the queue. + */ +void vduse_queue_notify(VduseVirtq *vq); + +/** + * vduse_dev_get_priv: + * @dev: VDUSE device + * + * Get the private pointer passed to vduse_dev_create(). + * + * Returns: private pointer on success, NULL on failure. + */ +void *vduse_dev_get_priv(VduseDev *dev); + +/** + * vduse_dev_get_queue: + * @dev: VDUSE device + * @index: virtqueue index + * + * Get the specified virtqueue. + * + * Returns: a pointer to the virtqueue on success, NULL on failure. + */ +VduseVirtq *vduse_dev_get_queue(VduseDev *dev, int index); + +/** + * vduse_dev_get_fd: + * @dev: VDUSE device + * + * Get the control message fd for the VDUSE device. + * + * Returns: file descriptor on success, -1 on failure. + */ +int vduse_dev_get_fd(VduseDev *dev); + +/** + * vduse_dev_handler: + * @dev: VDUSE device + * + * Used to process the control message. + * + * Returns: 0 on success, -errno on failure.
+ */ +int vduse_dev_handler(VduseDev *dev); + +/** + * vduse_dev_update_config: + * @dev: VDUSE device + * @size: the size to write to configuration space + * @offset: the offset from the beginning of configuration space + * @buffer: the buffer used to write from + * + * Update device configuration space and inject a config interrupt. + * + * Returns: 0 on success, -errno on failure. + */ +int vduse_dev_update_config(VduseDev *dev, uint32_t size, + uint32_t offset, char *buffer); + +/** + * vduse_dev_setup_queue: + * @dev: VDUSE device + * @index: virtqueue index + * @max_size: the max size of virtqueue + * + * Setup the specified virtqueue. + * + * Returns: 0 on success, -errno on failure. + */ +int vduse_dev_setup_queue(VduseDev *dev, int index, int max_size); + +/** + * vduse_dev_create: + * @name: VDUSE device name + * @device_id: virtio device id + * @vendor_id: virtio vendor id + * @features: virtio features + * @num_queues: the number of virtqueues + * @config_size: the size of the configuration space + * @config: the buffer of the configuration space + * @ops: the operation of VDUSE backend + * @priv: private pointer + * + * Create VDUSE device. + * + * Returns: pointer to VDUSE device on success, NULL on failure. + */ +VduseDev *vduse_dev_create(const char *name, uint32_t device_id, + uint32_t vendor_id, uint64_t features, + uint16_t num_queues, uint32_t config_size, + char *config, const VduseOps *ops, void *priv); + +/** + * vduse_dev_destroy: + * @dev: VDUSE device + * + * Destroy the VDUSE device. 
+ */ +void vduse_dev_destroy(VduseDev *dev); + +#endif diff --git a/subprojects/libvduse/meson.build b/subprojects/libvduse/meson.= build new file mode 100644 index 0000000000..ba08f5ee1a --- /dev/null +++ b/subprojects/libvduse/meson.build @@ -0,0 +1,10 @@ +project('libvduse', 'c', + license: 'GPL-2.0-or-later', + default_options: ['c_std=3Dgnu99']) + +libvduse =3D static_library('vduse', + files('libvduse.c'), + c_args: '-D_GNU_SOURCE') + +libvduse_dep =3D declare_dependency(link_with: libvduse, + include_directories: include_directories= ('.')) diff --git a/subprojects/libvduse/standard-headers/linux b/subprojects/libv= duse/standard-headers/linux new file mode 120000 index 0000000000..c416f068ac --- /dev/null +++ b/subprojects/libvduse/standard-headers/linux @@ -0,0 +1 @@ +../../../include/standard-headers/linux/ \ No newline at end of file --=20 2.20.1 From nobody Mon May 6 21:09:16 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=bytedance.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1643120702680721.7151496331965; Tue, 25 Jan 2022 06:25:02 -0800 (PST) Received: from localhost ([::1]:52456 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nCMkn-0005TP-Qe for importer@patchew.org; Tue, 25 Jan 2022 09:25:01 -0500 Received: from eggs.gnu.org ([209.51.188.92]:42356) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nCLie-0007SX-8q for qemu-devel@nongnu.org; Tue, 25 Jan 2022 08:18:44 -0500 Received: from [2607:f8b0:4864:20::1035] (port=51096 helo=mail-pj1-x1035.google.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 
1nCLia-0004yD-0r for qemu-devel@nongnu.org; Tue, 25 Jan 2022 08:18:42 -0500 Received: by mail-pj1-x1035.google.com with SMTP id o11so2752208pjf.0 for ; Tue, 25 Jan 2022 05:18:38 -0800 (PST) Received: from localhost ([139.177.225.253]) by smtp.gmail.com with ESMTPSA id x7sm14701622pgr.87.2022.01.25.05.18.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 25 Jan 2022 05:18:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=BmtGQA5DpO0+f6kPv+xipJRYB/H3tEHgZCOeDu1ZZok=; b=pAC5zWnwDMXv+wLQiLiuO5vR71AaVjhhtIEQZUCm85mQEV//V414U7MMnxHaNlnDS7 9elObsqjtBy3zZobkzwXv8Oi4MwI2uMei6TzjZ3nPyx0oQDK9NIv4beo/bv/Py2BrUQu /08pDNDb/AC8nEsXqGivGgqACiYricNAN39CejFlPQtBXeGJ3hITyqY00JAMtNRqOxzn NNDxoujhE2HWHXueJQqOjfIpKMjoB97EteWEQ+K4XhxuMzc20uYXklNb3vDM4yBQ1YdN Vynp3egLY4HF0sJN+vBVDcj1bCIzen77Qldp5pfEW2FoltNli1jSMTahPeLPR+a6so5Q LDhw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=BmtGQA5DpO0+f6kPv+xipJRYB/H3tEHgZCOeDu1ZZok=; b=SNE8JF6uqZVIh6fYsyXujd8A7Q8YGnEtOVyGZuoFXqo7Q3yPHMt1zvGKib0k8huBm/ 9W4rI+foIIBS/GWr+LMshlhKjNSBpHEL0YNq5TFQDQ6mpPbjaPmptSURdqQOtMBe9KUg xBPSvmFdPsiDCglVffOd1nUzXPYTALMLj+GSBfoXrZdbA5IL9zGSDDBqP2EFryF8OVPr YIA2vh5ehqqlfBTkrpPEMZ83cxHJsTMA/0yRz8JgtgADePrfUxrVVSoz3kZ3IRjGPkvO 8s6Bf7kVWzrxwL55r//kJBQQOO2+5UBiAQCbnogq9/dcYfGFO19vGdSKh08jnJoId5A4 lPKQ== X-Gm-Message-State: AOAM5308AUEA//t+9wu7iY5MXsemjUnTqL+Ef3aATrMn1Z2b+q6QuDkh ujMeUCmOWPKcD/mgvNzEqABV X-Google-Smtp-Source: ABdhPJxTvzKUVhGH4oidubWWySWvI4f0jtLAuesidDCBIdDigN7cRkmpkbgTeTcJbifvxi1kCvF/jA== X-Received: by 2002:a17:902:720b:b0:14b:81e8:e9ce with SMTP id ba11-20020a170902720b00b0014b81e8e9cemr787966plb.155.1643116717127; Tue, 25 Jan 2022 05:18:37 -0800 (PST) 
From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, kwolf@redhat.com, mreitz@redhat.com, mlureau@redhat.com Subject: [PATCH 3/5] vduse-blk: implements vduse-blk export Date: Tue, 25 Jan 2022 21:17:58 +0800 Message-Id: <20220125131800.91-4-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220125131800.91-1-xieyongji@bytedance.com> References: <20220125131800.91-1-xieyongji@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Host-Lookup-Failed: Reverse DNS lookup failed for 2607:f8b0:4864:20::1035 (failed) Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=2607:f8b0:4864:20::1035; envelope-from=xieyongji@bytedance.com; helo=mail-pj1-x1035.google.com X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, PDS_HP_HELO_NORDNS=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZM-MESSAGEID: 1643120706637100001 Content-Type: text/plain; charset="utf-8" This implements a VDUSE block backend based on the libvduse library. We can use it to export BDSs for both VM and container (host) usage.
The new command-line syntax is: $ qemu-storage-daemon \ --blockdev file,node-name=drive0,filename=test.img \ --export vduse-blk,node-name=drive0,id=vduse-export0,writable=on After qemu-storage-daemon has started, use the "vdpa" command to attach the device to the vDPA bus: $ vdpa dev add name vduse-export0 mgmtdev vduse The device must also be removed via the "vdpa" command before qemu-storage-daemon is stopped. Signed-off-by: Xie Yongji --- block/export/export.c | 6 + block/export/meson.build | 5 + block/export/vduse-blk.c | 427 ++++++++++++++++++++++++++++++++++ block/export/vduse-blk.h | 20 ++ meson.build | 13 ++ meson_options.txt | 2 + qapi/block-export.json | 24 +- scripts/meson-buildoptions.sh | 4 + 8 files changed, 499 insertions(+), 2 deletions(-) create mode 100644 block/export/vduse-blk.c create mode 100644 block/export/vduse-blk.h diff --git a/block/export/export.c b/block/export/export.c index 6d3b9964c8..00dd505540 100644 --- a/block/export/export.c +++ b/block/export/export.c @@ -26,6 +26,9 @@ #ifdef CONFIG_VHOST_USER_BLK_SERVER #include "vhost-user-blk-server.h" #endif +#ifdef CONFIG_VDUSE_BLK_EXPORT +#include "vduse-blk.h" +#endif static const BlockExportDriver *blk_exp_drivers[] = { &blk_exp_nbd, @@ -35,6 +38,9 @@ static const BlockExportDriver *blk_exp_drivers[] = { #ifdef CONFIG_FUSE &blk_exp_fuse, #endif +#ifdef CONFIG_VDUSE_BLK_EXPORT + &blk_exp_vduse_blk, +#endif }; /* Only accessed from the main thread */ diff --git a/block/export/meson.build b/block/export/meson.build index 0a08e384c7..cf311d2b1b 100644 --- a/block/export/meson.build +++ b/block/export/meson.build @@ -5,3 +5,8 @@ if have_vhost_user_blk_server endif blockdev_ss.add(when: fuse, if_true: files('fuse.c')) + +if have_vduse_blk_export + blockdev_ss.add(files('vduse-blk.c')) + blockdev_ss.add(libvduse) +endif diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c new file mode 100644 index 0000000000..5a8d289685 --- /dev/null +++
b/block/export/vduse-blk.c @@ -0,0 +1,427 @@ +/* + * Export QEMU block device via VDUSE + * + * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights res= erved. + * Portions of codes and concepts borrowed from vhost-user-blk-server.c,= so: + * Copyright (c) 2020 Red Hat, Inc. + * + * Author: + * Xie Yongji + * Coiby Xu + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. + */ + +#include + +#include "qemu/osdep.h" +#include "qapi/error.h" +#include "sysemu/block-backend.h" +#include "block/export.h" +#include "qemu/error-report.h" +#include "util/block-helpers.h" +#include "subprojects/libvduse/libvduse.h" + +#include "standard-headers/linux/virtio_ring.h" +#include "standard-headers/linux/virtio_blk.h" + +#define VIRTIO_BLK_SECTOR_BITS 9 +#define VIRTIO_BLK_SECTOR_SIZE (1ULL << VIRTIO_BLK_SECTOR_BITS) + +#define VDUSE_DEFAULT_NUM_QUEUE 1 +#define VDUSE_DEFAULT_QUEUE_SIZE 128 + +typedef struct VduseBlkExport { + BlockExport export; + VduseDev *dev; + uint16_t num_queues; + uint32_t blk_size; + bool writable; +} VduseBlkExport; + +struct virtio_blk_inhdr { + unsigned char status; +}; + +typedef struct VduseBlkReq { + VduseVirtqElement elem; + int64_t sector_num; + size_t in_len; + struct virtio_blk_inhdr *in; + struct virtio_blk_outhdr out; + VduseVirtq *vq; +} VduseBlkReq; + +static void vduse_blk_req_complete(VduseBlkReq *req) +{ + vduse_queue_push(req->vq, &req->elem, req->in_len); + vduse_queue_notify(req->vq); + + free(req); +} + +static bool vduse_blk_sect_range_ok(VduseBlkExport *vblk_exp, + uint64_t sector, size_t size) +{ + uint64_t nb_sectors; + uint64_t total_sectors; + + if (size % VIRTIO_BLK_SECTOR_SIZE) { + return false; + } + + nb_sectors =3D size >> VIRTIO_BLK_SECTOR_BITS; + + QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE !=3D VIRTIO_BLK_SECTOR_SIZE); + if (nb_sectors > BDRV_REQUEST_MAX_SECTORS) { + return false; + } + if ((sector << VIRTIO_BLK_SECTOR_BITS) % 
vblk_exp->blk_size) { + return false; + } + blk_get_geometry(vblk_exp->export.blk, &total_sectors); + if (sector > total_sectors || nb_sectors > total_sectors - sector) { + return false; + } + return true; +} + +static void coroutine_fn vduse_blk_virtio_process_req(void *opaque) +{ + VduseBlkReq *req =3D opaque; + VduseVirtq *vq =3D req->vq; + VduseDev *dev =3D vduse_queue_get_dev(vq); + VduseBlkExport *vblk_exp =3D vduse_dev_get_priv(dev); + BlockBackend *blk =3D vblk_exp->export.blk; + VduseVirtqElement *elem =3D &req->elem; + struct iovec *in_iov =3D elem->in_sg; + struct iovec *out_iov =3D elem->out_sg; + unsigned in_num =3D elem->in_num; + unsigned out_num =3D elem->out_num; + uint32_t type; + + if (elem->out_num < 1 || elem->in_num < 1) { + error_report("virtio-blk request missing headers"); + goto err; + } + + if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out, + sizeof(req->out)) !=3D sizeof(req->out))) { + error_report("virtio-blk request outhdr too short"); + goto err; + } + + iov_discard_front(&out_iov, &out_num, sizeof(req->out)); + + if (in_iov[in_num - 1].iov_len < sizeof(struct virtio_blk_inhdr)) { + error_report("virtio-blk request inhdr too short"); + goto err; + } + + /* We always touch the last byte, so just see how big in_iov is. 
*/ + req->in_len =3D iov_size(in_iov, in_num); + req->in =3D (void *)in_iov[in_num - 1].iov_base + + in_iov[in_num - 1].iov_len + - sizeof(struct virtio_blk_inhdr); + iov_discard_back(in_iov, &in_num, sizeof(struct virtio_blk_inhdr)); + + type =3D le32_to_cpu(req->out.type); + switch (type & ~VIRTIO_BLK_T_BARRIER) { + case VIRTIO_BLK_T_IN: + case VIRTIO_BLK_T_OUT: { + QEMUIOVector qiov; + int64_t offset; + ssize_t ret =3D 0; + bool is_write =3D type & VIRTIO_BLK_T_OUT; + req->sector_num =3D le64_to_cpu(req->out.sector); + + if (is_write && !vblk_exp->writable) { + req->in->status =3D VIRTIO_BLK_S_IOERR; + break; + } + + if (is_write) { + qemu_iovec_init_external(&qiov, out_iov, out_num); + } else { + qemu_iovec_init_external(&qiov, in_iov, in_num); + } + + if (unlikely(!vduse_blk_sect_range_ok(vblk_exp, + req->sector_num, + qiov.size))) { + req->in->status =3D VIRTIO_BLK_S_IOERR; + break; + } + + offset =3D req->sector_num << VIRTIO_BLK_SECTOR_BITS; + + if (is_write) { + ret =3D blk_co_pwritev(blk, offset, qiov.size, &qiov, 0); + } else { + ret =3D blk_co_preadv(blk, offset, qiov.size, &qiov, 0); + } + if (ret >=3D 0) { + req->in->status =3D VIRTIO_BLK_S_OK; + } else { + req->in->status =3D VIRTIO_BLK_S_IOERR; + } + break; + } + case VIRTIO_BLK_T_FLUSH: + if (blk_co_flush(blk) =3D=3D 0) { + req->in->status =3D VIRTIO_BLK_S_OK; + } else { + req->in->status =3D VIRTIO_BLK_S_IOERR; + } + break; + case VIRTIO_BLK_T_GET_ID: { + size_t size =3D MIN(iov_size(&elem->in_sg[0], in_num), + VIRTIO_BLK_ID_BYTES); + snprintf(elem->in_sg[0].iov_base, size, "%s", vblk_exp->export.id); + req->in->status =3D VIRTIO_BLK_S_OK; + break; + } + default: + req->in->status =3D VIRTIO_BLK_S_UNSUPP; + break; + } + + vduse_blk_req_complete(req); + return; + +err: + free(req); +} + +static void vduse_blk_vq_handler(VduseDev *dev, VduseVirtq *vq) +{ + while (1) { + VduseBlkReq *req; + + req =3D vduse_queue_pop(vq, sizeof(VduseBlkReq)); + if (!req) { + break; + } + req->vq =3D vq; + + Coroutine 
*co =3D + qemu_coroutine_create(vduse_blk_virtio_process_req, req); + qemu_coroutine_enter(co); + } +} + +static void on_vduse_vq_kick(void *opaque) +{ + VduseVirtq *vq =3D opaque; + VduseDev *dev =3D vduse_queue_get_dev(vq); + int fd =3D vduse_queue_get_fd(vq); + eventfd_t kick_data; + + if (eventfd_read(fd, &kick_data) =3D=3D -1) { + error_report("failed to read data from eventfd"); + return; + } + + vduse_blk_vq_handler(dev, vq); +} + +static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq) +{ + VduseBlkExport *vblk_exp =3D vduse_dev_get_priv(dev); + + aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq), + true, on_vduse_vq_kick, NULL, NULL, NULL, vq); +} + +static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq) +{ + VduseBlkExport *vblk_exp =3D vduse_dev_get_priv(dev); + + aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq), + true, NULL, NULL, NULL, NULL, NULL); +} + +static const VduseOps vduse_blk_ops =3D { + .enable_queue =3D vduse_blk_enable_queue, + .disable_queue =3D vduse_blk_disable_queue, +}; + +static void on_vduse_dev_kick(void *opaque) +{ + VduseDev *dev =3D opaque; + + vduse_dev_handler(dev); +} + +static void blk_aio_attached(AioContext *ctx, void *opaque) +{ + VduseBlkExport *vblk_exp =3D opaque; + int i; + + vblk_exp->export.ctx =3D ctx; + + aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->de= v), + true, on_vduse_dev_kick, NULL, NULL, NULL, + vblk_exp->dev); + + for (i =3D 0; i < vblk_exp->num_queues; i++) { + VduseVirtq *vq =3D vduse_dev_get_queue(vblk_exp->dev, i); + int fd =3D vduse_queue_get_fd(vq); + + if (fd < 0) { + continue; + } + aio_set_fd_handler(vblk_exp->export.ctx, fd, true, + on_vduse_vq_kick, NULL, NULL, NULL, vq); + } +} + +static void blk_aio_detach(void *opaque) +{ + VduseBlkExport *vblk_exp =3D opaque; + int i; + + for (i =3D 0; i < vblk_exp->num_queues; i++) { + VduseVirtq *vq =3D vduse_dev_get_queue(vblk_exp->dev, i); + int fd =3D vduse_queue_get_fd(vq); + + 
if (fd < 0) { + continue; + } + aio_set_fd_handler(vblk_exp->export.ctx, fd, + true, NULL, NULL, NULL, NULL, NULL); + } + aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->de= v), + true, NULL, NULL, NULL, NULL, NULL); + vblk_exp->export.ctx =3D NULL; +} + +static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, + Error **errp) +{ + VduseBlkExport *vblk_exp =3D container_of(exp, VduseBlkExport, export); + BlockExportOptionsVduseBlk *vblk_opts =3D &opts->u.vduse_blk; + uint64_t logical_block_size =3D VIRTIO_BLK_SECTOR_SIZE; + uint16_t num_queues =3D VDUSE_DEFAULT_NUM_QUEUE; + uint16_t queue_size =3D VDUSE_DEFAULT_QUEUE_SIZE; + Error *local_err =3D NULL; + struct virtio_blk_config config; + uint64_t features; + int i; + + if (vblk_opts->has_num_queues) { + num_queues =3D vblk_opts->num_queues; + if (num_queues =3D=3D 0) { + error_setg(errp, "num-queues must be greater than 0"); + return -EINVAL; + } + } + + if (vblk_opts->has_queue_size) { + queue_size =3D vblk_opts->queue_size; + if (queue_size =3D=3D 0) { + error_setg(errp, "queue-size must be greater than 0"); + return -EINVAL; + } + } + + if (vblk_opts->has_logical_block_size) { + logical_block_size =3D vblk_opts->logical_block_size; + check_block_size(exp->id, "logical-block-size", logical_block_size, + &local_err); + if (local_err) { + error_propagate(errp, local_err); + return -EINVAL; + } + } + blk_set_guest_block_size(exp->blk, logical_block_size); + + vblk_exp->blk_size =3D logical_block_size; + vblk_exp->writable =3D opts->writable; + vblk_exp->num_queues =3D num_queues; + + config.capacity =3D + cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS); + config.seg_max =3D cpu_to_le32(queue_size - 2); + config.size_max =3D cpu_to_le32(0); + config.min_io_size =3D cpu_to_le16(1); + config.opt_io_size =3D cpu_to_le32(1); + config.num_queues =3D cpu_to_le16(num_queues); + config.blk_size =3D cpu_to_le32(logical_block_size); + + features =3D (1ULL << 
VIRTIO_F_IOMMU_PLATFORM) | + (1ULL << VIRTIO_F_VERSION_1) | + (1ULL << VIRTIO_RING_F_EVENT_IDX) | + (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) | + (1ULL << VIRTIO_RING_F_INDIRECT_DESC) | + (1ULL << VIRTIO_BLK_F_SIZE_MAX) | + (1ULL << VIRTIO_BLK_F_SEG_MAX) | + (1ULL << VIRTIO_BLK_F_TOPOLOGY) | + (1ULL << VIRTIO_BLK_F_BLK_SIZE); + + if (num_queues > 1) { + features |=3D 1ULL << VIRTIO_BLK_F_MQ; + } + if (!vblk_exp->writable) { + features |=3D 1ULL << VIRTIO_BLK_F_RO; + } + + vblk_exp->dev =3D vduse_dev_create(exp->id, VIRTIO_ID_BLOCK, 0, + features, num_queues, + sizeof(struct virtio_blk_config), + (char *)&config, &vduse_blk_ops, + vblk_exp); + if (!vblk_exp->dev) { + error_setg(errp, "failed to create vduse device"); + return -ENOMEM; + } + + for (i =3D 0; i < num_queues; i++) { + vduse_dev_setup_queue(vblk_exp->dev, i, queue_size); + } + + aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), true, + on_vduse_dev_kick, NULL, NULL, NULL, vblk_exp->dev); + + blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detac= h, + vblk_exp); + + return 0; +} + +static void vduse_blk_exp_delete(BlockExport *exp) +{ + VduseBlkExport *vblk_exp =3D container_of(exp, VduseBlkExport, export); + + vduse_dev_destroy(vblk_exp->dev); +} + +static void vduse_blk_exp_request_shutdown(BlockExport *exp) +{ + VduseBlkExport *vblk_exp =3D container_of(exp, VduseBlkExport, export); + int i; + + blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_de= tach, + vblk_exp); + + for (i =3D 0; i < vblk_exp->num_queues; i++) { + VduseVirtq *vq =3D vduse_dev_get_queue(vblk_exp->dev, i); + int fd =3D vduse_queue_get_fd(vq); + + if (fd < 0) { + continue; + } + aio_set_fd_handler(exp->ctx, fd, true, NULL, NULL, NULL, NULL, NUL= L); + } + aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), + true, NULL, NULL, NULL, NULL, NULL); +} + +const BlockExportDriver blk_exp_vduse_blk =3D { + .type =3D BLOCK_EXPORT_TYPE_VDUSE_BLK, + .instance_size =3D sizeof(VduseBlkExport), + 
.create =3D vduse_blk_exp_create, + .delete =3D vduse_blk_exp_delete, + .request_shutdown =3D vduse_blk_exp_request_shutdown, +}; diff --git a/block/export/vduse-blk.h b/block/export/vduse-blk.h new file mode 100644 index 0000000000..c4eeb1b70e --- /dev/null +++ b/block/export/vduse-blk.h @@ -0,0 +1,20 @@ +/* + * Export QEMU block device via VDUSE + * + * Copyright (C) 2022 Bytedance Inc. and/or its affiliates. All rights res= erved. + * + * Author: + * Xie Yongji + * + * This work is licensed under the terms of the GNU GPL, version 2 or + * later. See the COPYING file in the top-level directory. + */ + +#ifndef VDUSE_BLK_H +#define VDUSE_BLK_H + +#include "block/export.h" + +extern const BlockExportDriver blk_exp_vduse_blk; + +#endif /* VDUSE_BLK_H */ diff --git a/meson.build b/meson.build index 864fb50ade..472e3947c6 100644 --- a/meson.build +++ b/meson.build @@ -1320,6 +1320,17 @@ if have_libvduse libvduse =3D libvduse_proj.get_variable('libvduse_dep') endif =20 +have_vduse_blk_export =3D (have_libvduse and targetos =3D=3D 'linux') +if get_option('vduse_blk_export').enabled() + if targetos !=3D 'linux' + error('vduse_blk_export requires linux') + elif not have_libvduse + error('vduse_blk_export requires libvduse support') + endif +elif get_option('vduse_blk_export').disabled() + have_vduse_blk_export =3D false +endif + # libbpf libbpf =3D dependency('libbpf', required: get_option('bpf'), method: 'pkg-= config') if libbpf.found() and not cc.links(''' @@ -1514,6 +1525,7 @@ config_host_data.set('CONFIG_SNAPPY', snappy.found()) config_host_data.set('CONFIG_USB_LIBUSB', libusb.found()) config_host_data.set('CONFIG_VDE', vde.found()) config_host_data.set('CONFIG_VHOST_USER_BLK_SERVER', have_vhost_user_blk_s= erver) +config_host_data.set('CONFIG_VDUSE_BLK_EXPORT', have_vduse_blk_export) config_host_data.set('CONFIG_VNC', vnc.found()) config_host_data.set('CONFIG_VNC_JPEG', jpeg.found()) config_host_data.set('CONFIG_VNC_PNG', png.found()) @@ -3407,6 +3419,7 @@ if have_block 
summary_info +=3D {'qed support': config_host.has_key('CONFIG_QED'= )} summary_info +=3D {'parallels support': config_host.has_key('CONFIG_PARA= LLELS')} summary_info +=3D {'FUSE exports': fuse} + summary_info +=3D {'VDUSE block exports': have_vduse_blk_export} endif summary(summary_info, bool_yn: true, section: 'Block layer support') =20 diff --git a/meson_options.txt b/meson_options.txt index 16790d1814..be1682c4d2 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -197,6 +197,8 @@ option('virtiofsd', type: 'feature', value: 'auto', description: 'build virtiofs daemon (virtiofsd)') option('libvduse', type: 'feature', value: 'auto', description: 'build VDUSE Library') +option('vduse_blk_export', type: 'feature', value: 'auto', + description: 'VDUSE block export support') =20 option('capstone', type: 'combo', value: 'auto', choices: ['disabled', 'enabled', 'auto', 'system', 'internal'], diff --git a/qapi/block-export.json b/qapi/block-export.json index f9ce79a974..f88e90baab 100644 --- a/qapi/block-export.json +++ b/qapi/block-export.json @@ -170,6 +170,22 @@ '*allow-other': 'FuseExportAllowOther' }, 'if': 'CONFIG_FUSE' } =20 +## +# @BlockExportOptionsVduseBlk: +# +# A vduse-blk block export. +# +# @num-queues: the number of virtqueues. Defaults to 1. +# @queue-size: the size of virtqueue. Defaults to 128. +# @logical-block-size: Logical block size in bytes. Defaults to 512 bytes. 
+# +# Since: 7.0 +## +{ 'struct': 'BlockExportOptionsVduseBlk', + 'data': { '*num-queues': 'uint16', + '*queue-size': 'uint16', + '*logical-block-size': 'size'} } + ## # @NbdServerAddOptions: # @@ -273,13 +289,15 @@ # @nbd: NBD export # @vhost-user-blk: vhost-user-blk export (since 5.2) # @fuse: FUSE export (since: 6.0) +# @vduse-blk: vduse-blk export (since 7.0) # # Since: 4.2 ## { 'enum': 'BlockExportType', 'data': [ 'nbd', { 'name': 'vhost-user-blk', 'if': 'CONFIG_VHOST_USER_BLK_SERVE= R' }, - { 'name': 'fuse', 'if': 'CONFIG_FUSE' } ] } + { 'name': 'fuse', 'if': 'CONFIG_FUSE' }, + { 'name': 'vduse-blk', 'if': 'CONFIG_VDUSE_BLK_EXPORT' } ] } =20 ## # @BlockExportOptions: @@ -323,7 +341,9 @@ 'vhost-user-blk': { 'type': 'BlockExportOptionsVhostUserBlk', 'if': 'CONFIG_VHOST_USER_BLK_SERVER' }, 'fuse': { 'type': 'BlockExportOptionsFuse', - 'if': 'CONFIG_FUSE' } + 'if': 'CONFIG_FUSE' }, + 'vduse-blk': { 'type': 'BlockExportOptionsVduseBlk', + 'if': 'CONFIG_VDUSE_BLK_EXPORT' } } } =20 ## diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh index af5c75d758..615fc17ec3 100644 --- a/scripts/meson-buildoptions.sh +++ b/scripts/meson-buildoptions.sh @@ -86,6 +86,8 @@ meson_options_help() { printf "%s\n" ' u2f U2F emulation support' printf "%s\n" ' usb-redir libusbredir support' printf "%s\n" ' vde vde network backend support' + printf "%s\n" ' vduse-blk-export' + printf "%s\n" ' VDUSE block export support' printf "%s\n" ' vhost-user-blk-server' printf "%s\n" ' build vhost-user-blk server' printf "%s\n" ' virglrenderer virgl rendering support' @@ -254,6 +256,8 @@ _meson_option_parse() { --disable-usb-redir) printf "%s" -Dusb_redir=3Ddisabled ;; --enable-vde) printf "%s" -Dvde=3Denabled ;; --disable-vde) printf "%s" -Dvde=3Ddisabled ;; + --enable-vduse-blk-export) printf "%s" -Dvduse_blk_export=3Denabled ;; + --disable-vduse-blk-export) printf "%s" -Dvduse_blk_export=3Ddisabled = ;; --enable-vhost-user-blk-server) printf "%s" -Dvhost_user_blk_server=3D= 
enabled ;; --disable-vhost-user-blk-server) printf "%s" -Dvhost_user_blk_server= =3Ddisabled ;; --enable-virglrenderer) printf "%s" -Dvirglrenderer=3Denabled ;; --=20 2.20.1 From nobody Mon May 6 21:09:16 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=bytedance.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 16431235678731002.650010471736; Tue, 25 Jan 2022 07:12:47 -0800 (PST) Received: from localhost ([::1]:43026 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nCNV0-0000BA-FG for importer@patchew.org; Tue, 25 Jan 2022 10:12:46 -0500 Received: from eggs.gnu.org ([209.51.188.92]:42386) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nCLig-0007UM-6M for qemu-devel@nongnu.org; Tue, 25 Jan 2022 08:18:46 -0500 Received: from [2607:f8b0:4864:20::42c] (port=43007 helo=mail-pf1-x42c.google.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1nCLic-0004yq-4I for qemu-devel@nongnu.org; Tue, 25 Jan 2022 08:18:44 -0500 Received: by mail-pf1-x42c.google.com with SMTP id i65so19761598pfc.9 for ; Tue, 25 Jan 2022 05:18:41 -0800 (PST) Received: from localhost ([139.177.225.253]) by smtp.gmail.com with ESMTPSA id t15sm449631pjy.17.2022.01.25.05.18.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 25 Jan 2022 05:18:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=ZhUP6tzgIUuEFUocSB3GImvp6pcvsV2mbRYLaepuFtU=; 
b=5FLH3dQfWEv+Z2mVjVhk9QXmMM6PSKT7Mt1A3FI2nYOFobv9NmqfYbKaxvn/2hrvl8 gt5gqcYpo/J2Hxi38xgUKPEx7dYHANClK7WMSe3GdpB1Rp55tRCwZsupje8STF8YDFCT YoKuAAkZRNORPSPQaufBpnwpvg82CU8FRLX0oiFvF7Av7JPf0qivjALQQqGZtKjDhExb rKO11O5xlUZrr4KxVTz3LNp6rvL6ebGhnXi8SbjQhkvnfM3CiCUwV1TuIW25o4Kz6XRW OKSmeEjcXkJJbnS7L0dsb5IwGAvNAk2asw8qyr/kgskl5V9u6dJ3TBE6Pfype/ioTdeF Q1Sg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ZhUP6tzgIUuEFUocSB3GImvp6pcvsV2mbRYLaepuFtU=; b=FF3ecOBcSwdfF9xFJd0TZ61P5jpKcURV0osOIxv2J1OLgaYnbBDTfhKz8xN0NS0y/f /l8XsuudsZv0LUuKAL7rcMRrhxVfJvIFSOqIYZpDB5R2lb3ceN1liFfhMW3HZ/9oE8tP GexbhK8EG3tCk7Xd4qA+bWnT88AB+WXxFPiMPzXrFc8e+wBIgfFStPgaMVKJ4FwkOMXl O9xePmYQmRyycRqzsZC7OH4P1eUvlMkWEfmb7EyBstuAyw5fylLVTJrTXUhVSdQ9B63U EOBBp1p3nPLhJtxgsCjegWvGM+Lntyne5+R/3bqnD4dPkTol7E6v0gIo37E0oRUD59s7 Cl1Q== X-Gm-Message-State: AOAM533oTfWH692YPd8zOjN4PnOAvT9slz1fpjv1Xw2xBmNE7wFlxfkM +ROEXyXbZ9xHl0cbhYa3Wx5y X-Google-Smtp-Source: ABdhPJwf3ZYtkgkd3CbCtSFOeyJcfnJ2T25Y50mTqai2wSv51thQxW1x8SbsEyDaPg+Sm1vCai9ZXA== X-Received: by 2002:a05:6a00:1a0c:b0:4cb:231:1981 with SMTP id g12-20020a056a001a0c00b004cb02311981mr1459472pfv.55.1643116720885; Tue, 25 Jan 2022 05:18:40 -0800 (PST) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, kwolf@redhat.com, mreitz@redhat.com, mlureau@redhat.com Subject: [PATCH 4/5] vduse-blk: Add vduse-blk resize support Date: Tue, 25 Jan 2022 21:17:59 +0800 Message-Id: <20220125131800.91-5-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220125131800.91-1-xieyongji@bytedance.com> References: <20220125131800.91-1-xieyongji@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Host-Lookup-Failed: Reverse DNS lookup failed for 2607:f8b0:4864:20::42c (failed) Received-SPF: pass (zohomail.com: domain of gnu.org 
designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=2607:f8b0:4864:20::42c; envelope-from=xieyongji@bytedance.com; helo=mail-pf1-x42c.google.com X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, PDS_HP_HELO_NORDNS=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZM-MESSAGEID: 1643123569006100001 Content-Type: text/plain; charset="utf-8" To support block resize, this patch uses vduse_dev_update_config() in the block resize callback to update the capacity field in the configuration space and inject a config interrupt. 
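As a rough illustration of the partial update described above, the following standalone C sketch patches only the capacity field of a byte buffer that stands in for the device configuration space. Every name in it (demo_blk_config, demo_store_le64, demo_update_config, demo_resize, DEMO_SECTOR_BITS) is hypothetical and exists only for this example; it does not call libvduse, and it leaves out the config interrupt that the real helper injects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the first fields of struct virtio_blk_config. */
struct demo_blk_config {
    uint64_t capacity;   /* device size in 512-byte sectors, little-endian */
    uint32_t size_max;
    uint32_t seg_max;
};

#define DEMO_SECTOR_BITS 9

/* Encode a 64-bit value as little-endian bytes regardless of host order. */
static void demo_store_le64(uint8_t out[8], uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

/*
 * Hypothetical analogue of a partial config update: copy `size` bytes to
 * `offset` inside the config space, rejecting out-of-range writes.
 * Returns 0 on success, -1 on a bad range.
 */
static int demo_update_config(uint8_t *space, size_t space_len,
                              size_t size, size_t offset,
                              const uint8_t *data)
{
    assert(space && data);

    if (offset > space_len || size > space_len - offset) {
        return -1;
    }
    memcpy(space + offset, data, size);
    return 0;
}

/* Recompute the capacity from a byte length and patch only that field. */
static int demo_resize(uint8_t *space, size_t space_len,
                       uint64_t length_bytes)
{
    uint8_t cap[8];

    demo_store_le64(cap, length_bytes >> DEMO_SECTOR_BITS);
    return demo_update_config(space, space_len, sizeof(cap),
                              offsetof(struct demo_blk_config, capacity),
                              cap);
}
```

Passing the size and offset of a single field, rather than rewriting the whole structure, keeps the other configuration fields untouched; the sketch assumes that is the reason the patch passes sizeof(config.capacity) and offsetof(..., capacity).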
Signed-off-by: Xie Yongji --- block/export/vduse-blk.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c index 5a8d289685..83845e9a9a 100644 --- a/block/export/vduse-blk.c +++ b/block/export/vduse-blk.c @@ -297,6 +297,23 @@ static void blk_aio_detach(void *opaque) vblk_exp->export.ctx =3D NULL; } =20 +static void vduse_blk_resize(void *opaque) +{ + BlockExport *exp =3D opaque; + VduseBlkExport *vblk_exp =3D container_of(exp, VduseBlkExport, export); + struct virtio_blk_config config; + + config.capacity =3D + cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS); + vduse_dev_update_config(vblk_exp->dev, sizeof(config.capacity), + offsetof(struct virtio_blk_config, capacity), + (char *)&config.capacity); +} + +static const BlockDevOps vduse_block_ops =3D { + .resize_cb =3D vduse_blk_resize, +}; + static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, Error **errp) { @@ -387,6 +404,8 @@ static int vduse_blk_exp_create(BlockExport *exp, Block= ExportOptions *opts, blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detac= h, vblk_exp); =20 + blk_set_dev_ops(exp->blk, &vduse_block_ops, exp); + return 0; } =20 --=20 2.20.1 From nobody Mon May 6 21:09:16 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=bytedance.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1643120853474445.22531929703393; Tue, 25 Jan 2022 06:27:33 -0800 (PST) Received: from localhost ([::1]:58584 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nCMnE-0001Bq-It for importer@patchew.org; Tue, 25 Jan 2022 09:27:32 -0500 Received: from eggs.gnu.org 
([209.51.188.92]:42446) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nCLij-0007di-N0 for qemu-devel@nongnu.org; Tue, 25 Jan 2022 08:18:49 -0500 Received: from [2607:f8b0:4864:20::633] (port=40484 helo=mail-pl1-x633.google.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1nCLif-0004zK-Vn for qemu-devel@nongnu.org; Tue, 25 Jan 2022 08:18:49 -0500 Received: by mail-pl1-x633.google.com with SMTP id y17so8903925plg.7 for ; Tue, 25 Jan 2022 05:18:45 -0800 (PST) Received: from localhost ([139.177.225.253]) by smtp.gmail.com with ESMTPSA id p42sm18916544pfw.71.2022.01.25.05.18.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 25 Jan 2022 05:18:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=zApYLsB1i3ZwQTH/fWStN7FoAHvHQ3HnkL85Dpu2yYE=; b=hbCLsaC/qyzsaqNVEMa6LCjbE9P1lbL1aI2rClsTB9qI1UAQHUPzfwCWwqU8d/qGlj uUXLsP6tJc98Phdf+s0Soz3grnVdp+6vtEROqdoknoaz/nl905F1ZOpDfCo1hk2YXL7Z WwxmbW3f1rlVHLunOlC+vWa00eB0ACmwPB+//inS9FOBYJUw9085bGsG8O6dqNkszU75 N9381OQmPTtoSZP+aPUU4fNuOPBm4fybgAT9FdHiHzgtkUYJmdlwST066chCx46lWmFs 1bGuYGR0ZjM3WWWB4t/8fp0wtKMlrqb4FgiITYo1TwoAsjq35bje2bBrd4GZ62Xq1mdZ 3cjQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=zApYLsB1i3ZwQTH/fWStN7FoAHvHQ3HnkL85Dpu2yYE=; b=A+yrwcoYBjBVHp7sqOXQ+rlM7ug8QNSQAcLEqzehzlsLhMu80+xsHqLJ/nLqhBLxwN jJa9eDA+PF9MGw3geHI1ZnKiH+dtMQOKZB/+1j8iBR+DLCdfNhEqwDa3DXZV31U+beU+ D8MBcksDnaX39Bj8rvxHYlIurQc9znD4oDBVmwMbSzXSSxH8p5u2DfXfEU+T51feFPwM jYmhGIawgKML3PlzU+HAMSQZ/Yl3uJFHJDSFKPdvE8wmkwkUowQCsh6F5yBEO5+WobDS 
/cbg4iI4sP2+Re37hLxbQArh808xkZ/fyZeHsM/lKGvXtkD+u5ARBqG3BNaAvQibJE2q /XWQ== X-Gm-Message-State: AOAM531Fnw6Mlaaey5ch9ho9mPQK1VMGbDedNFwGmjZfZtfePyju5y8M y9UYEJdyEQfWInALBPf9JECm X-Google-Smtp-Source: ABdhPJynu8e0f6XKbxb23K8Qp2VJAaxkBbJhL4a+3sy+fA5Cq7WXxD4vUO8yrSYXVI2ZBwscqMQFJw== X-Received: by 2002:a17:90b:1806:: with SMTP id lw6mr3502897pjb.82.1643116724762; Tue, 25 Jan 2022 05:18:44 -0800 (PST) From: Xie Yongji To: mst@redhat.com, jasowang@redhat.com, stefanha@redhat.com, sgarzare@redhat.com, kwolf@redhat.com, mreitz@redhat.com, mlureau@redhat.com Subject: [PATCH 5/5] libvduse: Add support for reconnecting Date: Tue, 25 Jan 2022 21:18:00 +0800 Message-Id: <20220125131800.91-6-xieyongji@bytedance.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220125131800.91-1-xieyongji@bytedance.com> References: <20220125131800.91-1-xieyongji@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Host-Lookup-Failed: Reverse DNS lookup failed for 2607:f8b0:4864:20::633 (failed) Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=2607:f8b0:4864:20::633; envelope-from=xieyongji@bytedance.com; helo=mail-pl1-x633.google.com X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, PDS_HP_HELO_NORDNS=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: fail (Header signature does not verify) 
X-ZM-MESSAGEID: 1643120855435100001 Content-Type: text/plain; charset="utf-8" To support reconnecting after a restart or crash, the VDUSE backend might need to resubmit inflight I/Os. This patch stores metadata, such as the indexes of inflight I/O descriptors, in a shm file so that the VDUSE backend can restore them when reconnecting. Signed-off-by: Xie Yongji --- block/export/vduse-blk.c | 4 +- subprojects/libvduse/libvduse.c | 254 +++++++++++++++++++++++++++++++- subprojects/libvduse/libvduse.h | 4 +- 3 files changed, 254 insertions(+), 8 deletions(-) diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c index 83845e9a9a..bc14fd798b 100644 --- a/block/export/vduse-blk.c +++ b/block/export/vduse-blk.c @@ -232,6 +232,8 @@ static void vduse_blk_enable_queue(VduseDev *dev, Vduse= Virtq *vq) =20 aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq), true, on_vduse_vq_kick, NULL, NULL, NULL, vq); + /* Make sure we don't miss any kick after reconnecting */ + eventfd_write(vduse_queue_get_fd(vq), 1); } =20 static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq) @@ -388,7 +390,7 @@ static int vduse_blk_exp_create(BlockExport *exp, Block= ExportOptions *opts, features, num_queues, sizeof(struct virtio_blk_config), (char *)&config, &vduse_blk_ops, - vblk_exp); + g_get_tmp_dir(), vblk_exp); if (!vblk_exp->dev) { error_setg(errp, "failed to create vduse device"); return -ENOMEM; diff --git a/subprojects/libvduse/libvduse.c b/subprojects/libvduse/libvdus= e.c index 7671864bca..ce2f6c7949 100644 --- a/subprojects/libvduse/libvduse.c +++ b/subprojects/libvduse/libvduse.c @@ -41,6 +41,8 @@ #define VDUSE_VQ_ALIGN 4096 #define MAX_IOVA_REGIONS 256 =20 +#define LOG_ALIGNMENT 64 + /* Round number down to multiple */ #define ALIGN_DOWN(n, m) ((n) / (m) * (m)) =20 @@ -51,6 +53,31 @@ #define unlikely(x) __builtin_expect(!!(x), 0) #endif =20 +typedef struct VduseDescStateSplit { + uint8_t inflight; + uint8_t padding[5]; + uint16_t next; + uint64_t counter; +} 
VduseDescStateSplit; + +typedef struct VduseVirtqLogInflight { + uint64_t features; + uint16_t version; + uint16_t desc_num; + uint16_t last_batch_head; + uint16_t used_idx; + VduseDescStateSplit desc[]; +} VduseVirtqLogInflight; + +typedef struct VduseVirtqLog { + VduseVirtqLogInflight inflight; +} VduseVirtqLog; + +typedef struct VduseVirtqInflightDesc { + uint16_t index; + uint64_t counter; +} VduseVirtqInflightDesc; + typedef struct VduseRing { unsigned int num; uint64_t desc_addr; @@ -73,6 +100,10 @@ struct VduseVirtq { bool ready; int fd; VduseDev *dev; + VduseVirtqInflightDesc *resubmit_list; + uint16_t resubmit_num; + uint64_t counter; + VduseVirtqLog *log; }; =20 typedef struct VduseIovaRegion { @@ -96,8 +127,67 @@ struct VduseDev { int fd; int ctrl_fd; void *priv; + char *shm_log_dir; + void *log; + bool reconnect; }; =20 +static inline size_t vduse_vq_log_size(uint16_t queue_size) +{ + return ALIGN_UP(sizeof(VduseDescStateSplit) * queue_size + + sizeof(VduseVirtqLogInflight), LOG_ALIGNMENT); +} + +static void *vduse_log_get(const char *dir, const char *name, size_t size) +{ + void *ptr =3D MAP_FAILED; + char *path; + int fd; + + path =3D (char *)malloc(strlen(dir) + strlen(name) + + strlen("/vduse-log-") + 1); + if (!path) { + return ptr; + } + sprintf(path, "%s/vduse-log-%s", dir, name); + + fd =3D open(path, O_RDWR | O_CREAT, 0600); + if (fd =3D=3D -1) { + goto out; + } + + if (ftruncate(fd, size) =3D=3D -1) { + goto out; + } + + ptr =3D mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if (ptr =3D=3D MAP_FAILED) { + goto out; + } +out: + if (fd > 0) { + close(fd); + } + free(path); + + return ptr; +} + +static void vduse_log_destroy(const char *dir, const char *name) +{ + char *path; + + path =3D (char *)malloc(strlen(dir) + strlen(name) + + strlen("/vduse-log-") + 1); + if (!path) { + return; + } + sprintf(path, "%s/vduse-log-%s", dir, name); + + unlink(path); + free(path); +} + static inline bool has_feature(uint64_t features, unsigned int 
fbit) { assert(fbit < 64); @@ -139,6 +229,98 @@ static int vduse_inject_irq(VduseDev *dev, int index) return ioctl(dev->fd, VDUSE_VQ_INJECT_IRQ, &index); } =20 +static int inflight_desc_compare(const void *a, const void *b) +{ + VduseVirtqInflightDesc *desc0 =3D (VduseVirtqInflightDesc *)a, + *desc1 =3D (VduseVirtqInflightDesc *)b; + + if (desc1->counter > desc0->counter && + (desc1->counter - desc0->counter) < VIRTQUEUE_MAX_SIZE * 2) { + return 1; + } + + return -1; +} + +static int vduse_queue_check_inflights(VduseVirtq *vq) +{ + int i =3D 0; + VduseDev *dev =3D vq->dev; + + vq->used_idx =3D vq->vring.used->idx; + vq->resubmit_num =3D 0; + vq->resubmit_list =3D NULL; + vq->counter =3D 0; + + if (unlikely(vq->log->inflight.used_idx !=3D vq->used_idx)) { + vq->log->inflight.desc[vq->log->inflight.last_batch_head].inflight= =3D 0; + + barrier(); + + vq->log->inflight.used_idx =3D vq->used_idx; + } + + for (i =3D 0; i < vq->log->inflight.desc_num; i++) { + if (vq->log->inflight.desc[i].inflight =3D=3D 1) { + vq->inuse++; + } + } + + vq->shadow_avail_idx =3D vq->last_avail_idx =3D vq->inuse + vq->used_i= dx; + + if (vq->inuse) { + vq->resubmit_list =3D calloc(vq->inuse, sizeof(VduseVirtqInflightD= esc)); + if (!vq->resubmit_list) { + return -1; + } + + for (i =3D 0; i < vq->log->inflight.desc_num; i++) { + if (vq->log->inflight.desc[i].inflight) { + vq->resubmit_list[vq->resubmit_num].index =3D i; + vq->resubmit_list[vq->resubmit_num].counter =3D + vq->log->inflight.desc[i].counter; + vq->resubmit_num++; + } + } + + if (vq->resubmit_num > 1) { + qsort(vq->resubmit_list, vq->resubmit_num, + sizeof(VduseVirtqInflightDesc), inflight_desc_compare); + } + vq->counter =3D vq->resubmit_list[0].counter + 1; + } + + vduse_inject_irq(dev, vq->index); + + return 0; +} + +static int vduse_queue_inflight_get(VduseVirtq *vq, int desc_idx) +{ + vq->log->inflight.desc[desc_idx].counter =3D vq->counter++; + vq->log->inflight.desc[desc_idx].inflight =3D 1; + + return 0; +} + +static 
int vduse_queue_inflight_pre_put(VduseVirtq *vq, int desc_idx) +{ + vq->log->inflight.last_batch_head =3D desc_idx; + + return 0; +} + +static int vduse_queue_inflight_post_put(VduseVirtq *vq, int desc_idx) +{ + vq->log->inflight.desc[desc_idx].inflight =3D 0; + + barrier(); + + vq->log->inflight.used_idx =3D vq->used_idx; + + return 0; +} + static void vduse_iova_remove_region(VduseDev *dev, uint64_t start, uint64_t last) { @@ -578,11 +760,24 @@ void *vduse_queue_pop(VduseVirtq *vq, size_t sz) unsigned int head; VduseVirtqElement *elem; VduseDev *dev =3D vq->dev; + int i; =20 if (unlikely(!vq->vring.avail)) { return NULL; } =20 + if (unlikely(vq->resubmit_list && vq->resubmit_num > 0)) { + i =3D (--vq->resubmit_num); + elem =3D vduse_queue_map_desc(vq, vq->resubmit_list[i].index, sz); + + if (!vq->resubmit_num) { + free(vq->resubmit_list); + vq->resubmit_list =3D NULL; + } + + return elem; + } + if (vduse_queue_empty(vq)) { return NULL; } @@ -610,6 +805,8 @@ void *vduse_queue_pop(VduseVirtq *vq, size_t sz) =20 vq->inuse++; =20 + vduse_queue_inflight_get(vq, head); + return elem; } =20 @@ -667,7 +864,9 @@ void vduse_queue_push(VduseVirtq *vq, const VduseVirtqE= lement *elem, unsigned int len) { vduse_queue_fill(vq, elem, len, 0); + vduse_queue_inflight_pre_put(vq, elem->index); vduse_queue_flush(vq, 1); + vduse_queue_inflight_post_put(vq, elem->index); } =20 static int vduse_queue_update_vring(VduseVirtq *vq, uint64_t desc_addr, @@ -740,12 +939,11 @@ static void vduse_queue_enable(VduseVirtq *vq) } =20 vq->fd =3D fd; - vq->shadow_avail_idx =3D vq->last_avail_idx =3D vq_info.split.avail_in= dex; - vq->inuse =3D 0; - vq->used_idx =3D 0; vq->signalled_used_valid =3D false; vq->ready =3D true; =20 + vduse_queue_check_inflights(vq); + dev->ops->enable_queue(dev, vq); } =20 @@ -903,13 +1101,18 @@ int vduse_dev_setup_queue(VduseDev *dev, int index, = int max_size) return -errno; } =20 + if (dev->reconnect) { + vduse_queue_enable(vq); + } + return 0; } =20 VduseDev 
*vduse_dev_create(const char *name, uint32_t device_id, uint32_t vendor_id, uint64_t features, uint16_t num_queues, uint32_t config_size, - char *config, const VduseOps *ops, void *priv) + char *config, const VduseOps *ops, + const char *shm_log_dir, void *priv) { VduseDev *dev; int i, ret, ctrl_fd, fd =3D -1; @@ -918,6 +1121,8 @@ VduseDev *vduse_dev_create(const char *name, uint32_t = device_id, VduseVirtq *vqs =3D NULL; struct vduse_dev_config *dev_config =3D NULL; size_t size =3D offsetof(struct vduse_dev_config, config); + size_t log_size =3D num_queues * vduse_vq_log_size(VIRTQUEUE_MAX_SIZE); + void *log =3D NULL; =20 if (!name || strlen(name) > VDUSE_NAME_MAX || !config || !config_size || !ops || !ops->enable_queue || !ops->disable_queue)= { @@ -932,6 +1137,15 @@ VduseDev *vduse_dev_create(const char *name, uint32_t= device_id, } memset(dev, 0, sizeof(VduseDev)); =20 + if (shm_log_dir) { + dev->log =3D log =3D vduse_log_get(shm_log_dir, name, log_size); + if (!log) { + fprintf(stderr, "Failed to get vduse log\n"); + goto err_ctrl; + } + dev->shm_log_dir =3D strdup(shm_log_dir); + } + ctrl_fd =3D open("/dev/vduse/control", O_RDWR); if (ctrl_fd < 0) { fprintf(stderr, "Failed to open /dev/vduse/control: %s\n", @@ -964,7 +1178,11 @@ VduseDev *vduse_dev_create(const char *name, uint32_t= device_id, =20 ret =3D ioctl(ctrl_fd, VDUSE_CREATE_DEV, dev_config); free(dev_config); - if (ret < 0) { + if (!ret && log) { + memset(log, 0, log_size); + } else if (errno =3D=3D EEXIST && log) { + dev->reconnect =3D true; + } else { fprintf(stderr, "Failed to create vduse dev %s: %s\n", name, strerror(errno)); goto err_dev; @@ -978,6 +1196,12 @@ VduseDev *vduse_dev_create(const char *name, uint32_t= device_id, goto err; } =20 + if (dev->reconnect && + ioctl(fd, VDUSE_DEV_GET_FEATURES, &dev->features)) { + fprintf(stderr, "Failed to get features: %s\n", strerror(errno)); + goto err; + } + vqs =3D calloc(sizeof(VduseVirtq), num_queues); if (!vqs) { fprintf(stderr, "Failed to 
allocate virtqueues\n"); @@ -988,6 +1212,12 @@ VduseDev *vduse_dev_create(const char *name, uint32_t= device_id, vqs[i].index =3D i; vqs[i].dev =3D dev; vqs[i].fd =3D -1; + if (log) { + vqs[i].log =3D log; + vqs[i].log->inflight.desc_num =3D VIRTQUEUE_MAX_SIZE; + log =3D (void *)((char *)log + + vduse_vq_log_size(VIRTQUEUE_MAX_SIZE)); + } } =20 dev->vqs =3D vqs; @@ -1008,16 +1238,28 @@ err_dev: close(ctrl_fd); err_ctrl: free(dev); + if (log) { + munmap(log, log_size); + } =20 return NULL; } =20 void vduse_dev_destroy(VduseDev *dev) { + size_t log_size =3D dev->num_queues * vduse_vq_log_size(VIRTQUEUE_MAX_= SIZE); + + if (dev->log) { + munmap(dev->log, log_size); + } free(dev->vqs); close(dev->fd); dev->fd =3D -1; - ioctl(dev->ctrl_fd, VDUSE_DESTROY_DEV, dev->name); + if (!ioctl(dev->ctrl_fd, VDUSE_DESTROY_DEV, dev->name) && + dev->shm_log_dir) { + vduse_log_destroy(dev->shm_log_dir, dev->name); + } + free(dev->shm_log_dir); free(dev->name); close(dev->ctrl_fd); dev->ctrl_fd =3D -1; diff --git a/subprojects/libvduse/libvduse.h b/subprojects/libvduse/libvdus= e.h index f6bcb51b5a..a46e71e0c2 100644 --- a/subprojects/libvduse/libvduse.h +++ b/subprojects/libvduse/libvduse.h @@ -171,6 +171,7 @@ int vduse_dev_setup_queue(VduseDev *dev, int index, int= max_size); * @config_size: the size of the configuration space * @config: the buffer of the configuration space * @ops: the operation of VDUSE backend + * @shm_log_dir: directory to store the metadata file for reconnect * @priv: private pointer * * Create VDUSE device. @@ -180,7 +181,8 @@ int vduse_dev_setup_queue(VduseDev *dev, int index, int= max_size); VduseDev *vduse_dev_create(const char *name, uint32_t device_id, uint32_t vendor_id, uint64_t features, uint16_t num_queues, uint32_t config_size, - char *config, const VduseOps *ops, void *priv); + char *config, const VduseOps *ops, + const char *shm_log_dir, void *priv); =20 /** * vduse_dev_destroy: --=20 2.20.1