From nobody Sun Apr 28 23:27:47 2024
From: Junji Wei <weijunji@bytedance.com>
To: dledford@redhat.com, jgg@ziepe.ca, mst@redhat.com, jasowang@redhat.com, yuval.shaia.ml@gmail.com, marcel.apfelbaum@gmail.com, cohuck@redhat.com, hare@suse.de
Cc: linux-rdma@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, xieyongji@bytedance.com, chaiwen.cc@bytedance.com, weijunji@bytedance.com
Subject: [RFC 1/5] RDMA/virtio-rdma Introduce a new core cap prot
Date: Thu, 2 Sep 2021 21:06:21 +0800
Message-Id: <20210902130625.25277-2-weijunji@bytedance.com>
In-Reply-To: <20210902130625.25277-1-weijunji@bytedance.com>
References: <20210902130625.25277-1-weijunji@bytedance.com>

Introduce a new core cap prot, RDMA_CORE_CAP_PROT_VIRTIO, to support
virtio-rdma.

Currently RDMA_CORE_CAP_PROT_VIRTIO is the same as
RDMA_CORE_CAP_PROT_ROCE_UDP_ENCAP except for rdma_query_gid, where we
need to get the gid from the host device.

Signed-off-by: Junji Wei <weijunji@bytedance.com>
---
 drivers/infiniband/core/cache.c         |  9 ++++++---
 drivers/infiniband/core/cm.c            |  4 ++--
 drivers/infiniband/core/cma.c           | 20 ++++++++++----------
 drivers/infiniband/core/device.c        |  4 ++--
 drivers/infiniband/core/multicast.c     |  2 +-
 drivers/infiniband/core/nldev.c         |  2 ++
 drivers/infiniband/core/roce_gid_mgmt.c |  3 ++-
 drivers/infiniband/core/ucma.c          |  2 +-
 drivers/infiniband/core/verbs.c         |  2 +-
 include/rdma/ib_verbs.h                 | 28 +++++++++++++++++++++++++---
 10 files changed, 52 insertions(+), 24 deletions(-)

diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
index c9e9fc81447e..3c0a0c9896b4 100644
--- a/drivers/infiniband/core/cache.c
+++ b/drivers/infiniband/core/cache.c
@@ -396,7 +396,7 @@ static void del_gid(struct ib_device *ib_dev, u32 port,
 	/*
 	 * For non RoCE protocol, GID entry slot is ready to use.
*/ - if (!rdma_protocol_roce(ib_dev, port)) + if (!rdma_protocol_virtio_or_roce(ib_dev, port)) table->data_vec[ix] =3D NULL; write_unlock_irq(&table->rwlock); =20 @@ -448,7 +448,7 @@ static int add_modify_gid(struct ib_gid_table *table, if (!entry) return -ENOMEM; =20 - if (rdma_protocol_roce(attr->device, attr->port_num)) { + if (rdma_protocol_virtio_or_roce(attr->device, attr->port_num)) { ret =3D add_roce_gid(entry); if (ret) goto done; @@ -960,6 +960,9 @@ int rdma_query_gid(struct ib_device *device, u32 port_n= um, if (!rdma_is_port_valid(device, port_num)) return -EINVAL; =20 + if (rdma_protocol_virtio(device, port_num)) + return device->ops.query_gid(device, port_num, index, gid); + table =3D rdma_gid_table(device, port_num); read_lock_irqsave(&table->rwlock, flags); =20 @@ -1482,7 +1485,7 @@ ib_cache_update(struct ib_device *device, u32 port, b= ool update_gids, goto err; } =20 - if (!rdma_protocol_roce(device, port) && update_gids) { + if (!rdma_protocol_virtio_or_roce(device, port) && update_gids) { ret =3D config_non_roce_gid_cache(device, port, tprops->gid_tbl_len); if (ret) diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index c903b74f46a4..a707f5de1c2e 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -3288,7 +3288,7 @@ static int cm_lap_handler(struct cm_work *work) /* Currently Alternate path messages are not supported for * RoCE link layer. */ - if (rdma_protocol_roce(work->port->cm_dev->ib_device, + if (rdma_protocol_virtio_or_roce(work->port->cm_dev->ib_device, work->port->port_num)) return -EINVAL; =20 @@ -3381,7 +3381,7 @@ static int cm_apr_handler(struct cm_work *work) /* Currently Alternate path messages are not supported for * RoCE link layer. */ - if (rdma_protocol_roce(work->port->cm_dev->ib_device, + if (rdma_protocol_virtio_or_roce(work->port->cm_dev->ib_device, work->port->port_num)) return -EINVAL; =20 diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 5d3b8b8d163d..5d29de352ed8 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -573,7 +573,7 @@ cma_validate_port(struct ib_device *device, u32 port, if ((dev_type !=3D ARPHRD_INFINIBAND) && rdma_protocol_ib(device, port)) return ERR_PTR(-ENODEV); =20 - if (dev_type =3D=3D ARPHRD_ETHER && rdma_protocol_roce(device, port)) { + if (dev_type =3D=3D ARPHRD_ETHER && rdma_protocol_virtio_or_roce(device, = port)) { ndev =3D dev_get_by_index(dev_addr->net, bound_if_index); if (!ndev) return ERR_PTR(-ENODEV); @@ -626,7 +626,7 @@ static int cma_acquire_dev_by_src_ip(struct rdma_id_pri= vate *id_priv) mutex_lock(&lock); list_for_each_entry(cma_dev, &dev_list, list) { rdma_for_each_port (cma_dev->device, port) { - gidp =3D rdma_protocol_roce(cma_dev->device, port) ? + gidp =3D rdma_protocol_virtio_or_roce(cma_dev->device, port) ? 
&iboe_gid : &gid; gid_type =3D cma_dev->default_gid_type[port - 1]; sgid_attr =3D cma_validate_port(cma_dev->device, port, @@ -669,7 +669,7 @@ static int cma_ib_acquire_dev(struct rdma_id_private *i= d_priv, id_priv->id.ps =3D=3D RDMA_PS_IPOIB) return -EINVAL; =20 - if (rdma_protocol_roce(req->device, req->port)) + if (rdma_protocol_virtio_or_roce(req->device, req->port)) rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr, &gid); else @@ -1525,7 +1525,7 @@ static struct net_device *cma_get_net_dev(const struc= t ib_cm_event *ib_event, if (err) return ERR_PTR(err); =20 - if (rdma_protocol_roce(req->device, req->port)) + if (rdma_protocol_virtio_or_roce(req->device, req->port)) net_dev =3D roce_get_net_dev_by_cm_event(ib_event); else net_dev =3D ib_get_net_dev_by_params(req->device, req->port, @@ -1583,7 +1583,7 @@ static bool cma_protocol_roce(const struct rdma_cm_id= *id) struct ib_device *device =3D id->device; const u32 port_num =3D id->port_num ?: rdma_start_port(device); =20 - return rdma_protocol_roce(device, port_num); + return rdma_protocol_virtio_or_roce(device, port_num); } =20 static bool cma_is_req_ipv6_ll(const struct cma_req_info *req) @@ -1813,7 +1813,7 @@ static void destroy_mc(struct rdma_id_private *id_pri= v, if (rdma_cap_ib_mcast(id_priv->id.device, id_priv->id.port_num)) ib_sa_free_multicast(mc->sa_mc); =20 - if (rdma_protocol_roce(id_priv->id.device, id_priv->id.port_num)) { + if (rdma_protocol_virtio_or_roce(id_priv->id.device, id_priv->id.port_num= )) { struct rdma_dev_addr *dev_addr =3D &id_priv->id.route.addr.dev_addr; struct net_device *ndev =3D NULL; @@ -2296,7 +2296,7 @@ void rdma_read_gids(struct rdma_cm_id *cm_id, union i= b_gid *sgid, return; } =20 - if (rdma_protocol_roce(cm_id->device, cm_id->port_num)) { + if (rdma_protocol_virtio_or_roce(cm_id->device, cm_id->port_num)) { if (sgid) rdma_ip2gid((struct sockaddr *)&addr->src_addr, sgid); if (dgid) @@ -2919,7 +2919,7 @@ int rdma_set_ib_path(struct rdma_cm_id *id, goto err; } =20 - if (rdma_protocol_roce(id->device, id->port_num)) { + if (rdma_protocol_virtio_or_roce(id->device, id->port_num)) { ndev =3D cma_iboe_set_path_rec_l2_fields(id_priv); if (!ndev) { ret =3D -ENODEV; @@ -3139,7 +3139,7 @@ int rdma_resolve_route(struct rdma_cm_id *id, unsigne= d long timeout_ms) cma_id_get(id_priv); if (rdma_cap_ib_sa(id->device, id->port_num)) ret =3D cma_resolve_ib_route(id_priv, timeout_ms); - else if (rdma_protocol_roce(id->device, id->port_num)) + else if (rdma_protocol_virtio_or_roce(id->device, id->port_num)) ret =3D cma_resolve_iboe_route(id_priv); else if (rdma_protocol_iwarp(id->device, id->port_num)) ret =3D cma_resolve_iw_route(id_priv); @@ -4766,7 +4766,7 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct= sockaddr *addr, mc->id_priv =3D id_priv; mc->join_state =3D join_state; =20 - if (rdma_protocol_roce(id->device, id->port_num)) { + if (rdma_protocol_virtio_or_roce(id->device, id->port_num)) { ret =3D cma_iboe_join_multicast(id_priv, mc); if (ret) goto out_err; diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/dev= ice.c index fa20b1824fb8..fadf17246574 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -2297,7 +2297,7 @@ void ib_enum_roce_netdev(struct ib_device *ib_dev, u32 port; =20 rdma_for_each_port (ib_dev, port) - if (rdma_protocol_roce(ib_dev, port)) { + if (rdma_protocol_virtio_or_roce(ib_dev, port)) { struct net_device *idev =3D ib_device_get_netdev(ib_dev, port); =20 @@ -2429,7 +2429,7 @@ int ib_modify_port(struct 
ib_device *device, rc =3D device->ops.modify_port(device, port_num, port_modify_mask, port_modify); - else if (rdma_protocol_roce(device, port_num) && + else if (rdma_protocol_virtio_or_roce(device, port_num) && ((port_modify->set_port_cap_mask & ~IB_PORT_CM_SUP) =3D=3D 0 || (port_modify->clr_port_cap_mask & ~IB_PORT_CM_SUP) =3D=3D 0)) rc =3D 0; diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/= multicast.c index a236532a9026..eaeea1002177 100644 --- a/drivers/infiniband/core/multicast.c +++ b/drivers/infiniband/core/multicast.c @@ -745,7 +745,7 @@ int ib_init_ah_from_mcmember(struct ib_device *device, = u32 port_num, */ if (rdma_protocol_ib(device, port_num)) ndev =3D NULL; - else if (!rdma_protocol_roce(device, port_num)) + else if (!rdma_protocol_virtio_or_roce(device, port_num)) return -EINVAL; =20 sgid_attr =3D rdma_find_gid_by_port(device, &rec->port_gid, diff --git a/drivers/infiniband/core/nldev.c b/drivers/infiniband/core/nlde= v.c index e9b4b2cccaa0..e41cbf6bef0b 100644 --- a/drivers/infiniband/core/nldev.c +++ b/drivers/infiniband/core/nldev.c @@ -296,6 +296,8 @@ static int fill_dev_info(struct sk_buff *msg, struct ib= _device *device) ret =3D nla_put_string(msg, RDMA_NLDEV_ATTR_DEV_PROTOCOL, "iw"); else if (rdma_protocol_roce(device, port)) ret =3D nla_put_string(msg, RDMA_NLDEV_ATTR_DEV_PROTOCOL, "roce"); + else if (rdma_protocol_virtio(device, port)) + ret =3D nla_put_string(msg, RDMA_NLDEV_ATTR_DEV_PROTOCOL, "virtio"); else if (rdma_protocol_usnic(device, port)) ret =3D nla_put_string(msg, RDMA_NLDEV_ATTR_DEV_PROTOCOL, "usnic"); diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/c= ore/roce_gid_mgmt.c index 68197e576433..5ea87b89dae6 100644 --- a/drivers/infiniband/core/roce_gid_mgmt.c +++ b/drivers/infiniband/core/roce_gid_mgmt.c @@ -75,6 +75,7 @@ static const struct { } PORT_CAP_TO_GID_TYPE[] =3D { {rdma_protocol_roce_eth_encap, IB_GID_TYPE_ROCE}, {rdma_protocol_roce_udp_encap, IB_GID_TYPE_ROCE_UDP_ENCAP}, + {rdma_protocol_virtio, IB_GID_TYPE_ROCE_UDP_ENCAP}, }; =20 #define CAP_TO_GID_TABLE_SIZE ARRAY_SIZE(PORT_CAP_TO_GID_TYPE) @@ -84,7 +85,7 @@ unsigned long roce_gid_type_mask_support(struct ib_device= *ib_dev, u32 port) int i; unsigned int ret_flags =3D 0; =20 - if (!rdma_protocol_roce(ib_dev, port)) + if (!rdma_protocol_virtio_or_roce(ib_dev, port)) return 1UL << IB_GID_TYPE_IB; =20 for (i =3D 0; i < CAP_TO_GID_TABLE_SIZE; i++) diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c index 2b72c4fa9550..f748db3f0414 100644 --- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -849,7 +849,7 @@ static ssize_t ucma_query_route(struct ucma_file *file, =20 if (rdma_cap_ib_sa(ctx->cm_id->device, ctx->cm_id->port_num)) ucma_copy_ib_route(&resp, &ctx->cm_id->route); - else if (rdma_protocol_roce(ctx->cm_id->device, ctx->cm_id->port_num)) + else if (rdma_protocol_virtio_or_roce(ctx->cm_id->device, ctx->cm_id->por= t_num)) ucma_copy_iboe_route(&resp, &ctx->cm_id->route); else if (rdma_protocol_iwarp(ctx->cm_id->device, ctx->cm_id->port_num)) ucma_copy_iw_route(&resp, &ctx->cm_id->route); diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verb= s.c index 7036967e4c0b..f5037ff0c2e5 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -822,7 +822,7 @@ int ib_init_ah_attr_from_wc(struct ib_device *device, u= 32 port_num, rdma_ah_set_sl(ah_attr, wc->sl); rdma_ah_set_port_num(ah_attr, port_num); =20 - if (rdma_protocol_roce(device, port_num)) 
{
+	if (rdma_protocol_virtio_or_roce(device, port_num)) {
 		u16 vlan_id = wc->wc_flags & IB_WC_WITH_VLAN ?
 				wc->vlan_id : 0xffff;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 371df1c80aeb..779d4d09aec1 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -623,6 +623,7 @@ static inline struct rdma_hw_stats *rdma_alloc_hw_stats_struct(
 #define RDMA_CORE_CAP_PROT_ROCE_UDP_ENCAP 0x00800000
 #define RDMA_CORE_CAP_PROT_RAW_PACKET   0x01000000
 #define RDMA_CORE_CAP_PROT_USNIC        0x02000000
+#define RDMA_CORE_CAP_PROT_VIRTIO       0x04000000
 
 #define RDMA_CORE_PORT_IB_GRH_REQUIRED (RDMA_CORE_CAP_IB_GRH_REQUIRED \
 					| RDMA_CORE_CAP_PROT_ROCE \
@@ -654,6 +655,14 @@ static inline struct rdma_hw_stats *rdma_alloc_hw_stats_struct(
 
 #define RDMA_CORE_PORT_USNIC	(RDMA_CORE_CAP_PROT_USNIC)
 
+/* in most cases, RDMA_CORE_PORT_VIRTIO is the same as RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP */
+#define RDMA_CORE_PORT_VIRTIO \
+	(RDMA_CORE_CAP_PROT_VIRTIO \
+	| RDMA_CORE_CAP_IB_MAD \
+	| RDMA_CORE_CAP_IB_CM \
+	| RDMA_CORE_CAP_AF_IB \
+	| RDMA_CORE_CAP_ETH_AH)
+
 struct ib_port_attr {
 	u64 subnet_prefix;
 	enum ib_port_state state;
@@ -3031,6 +3040,18 @@ static inline bool rdma_protocol_ib(const struct ib_device *device,
 	       RDMA_CORE_CAP_PROT_IB;
 }
 
+static inline bool rdma_protocol_virtio(const struct ib_device *device, u8 port_num)
+{
+	return device->port_data[port_num].immutable.core_cap_flags &
+	       RDMA_CORE_CAP_PROT_VIRTIO;
+}
+
+static inline bool rdma_protocol_virtio_or_roce(const struct ib_device *device, u8 port_num)
+{
+	return device->port_data[port_num].immutable.core_cap_flags &
+	       (RDMA_CORE_CAP_PROT_VIRTIO | RDMA_CORE_CAP_PROT_ROCE | RDMA_CORE_CAP_PROT_ROCE_UDP_ENCAP);
+}
+
 static inline bool rdma_protocol_roce(const struct ib_device *device,
 				      u32 port_num)
 {
@@ -3063,7 +3084,8 @@ static inline bool rdma_ib_or_roce(const struct ib_device *device,
 				   u32 port_num)
 {
 	return rdma_protocol_ib(device, port_num) ||
-	       rdma_protocol_roce(device, port_num);
+	       rdma_protocol_roce(device, port_num) ||
+	       rdma_protocol_virtio(device, port_num);
 }
 
 static inline bool rdma_protocol_raw_packet(const struct ib_device *device,
@@ -3322,7 +3344,7 @@ static inline size_t rdma_max_mad_size(const struct ib_device *device,
 static inline bool rdma_cap_roce_gid_table(const struct ib_device *device,
 					   u32 port_num)
 {
-	return rdma_protocol_roce(device, port_num) &&
+	return rdma_protocol_virtio_or_roce(device, port_num) &&
 		device->ops.add_gid && device->ops.del_gid;
 }
 
@@ -4502,7 +4524,7 @@ void rdma_move_ah_attr(struct rdma_ah_attr *dest, struct rdma_ah_attr *src);
 static inline enum rdma_ah_attr_type rdma_ah_find_type(struct ib_device *dev,
 						       u32 port_num)
 {
-	if (rdma_protocol_roce(dev, port_num))
+	if (rdma_protocol_virtio_or_roce(dev, port_num))
 		return RDMA_AH_ATTR_TYPE_ROCE;
 	if (rdma_protocol_ib(dev, port_num)) {
 		if (rdma_cap_opa_ah(dev, port_num))
-- 
2.11.0
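As a usage sketch of the new capability bit: a provider that fronts a virtio
device advertises RDMA_CORE_PORT_VIRTIO from its get_port_immutable callback
and supplies a query_gid op, after which the helpers introduced above
(rdma_protocol_virtio() and rdma_protocol_virtio_or_roce()) steer the port down
the RoCE-style code paths while rdma_query_gid() forwards GID lookups to the
host device. The callback below is only a condensed illustration of what the
virtio-rdma driver in the next patch does in its own get_port_immutable hook;
the function name and the u32 port type here are illustrative, not part of the
series.

static int example_get_port_immutable(struct ib_device *ibdev, u32 port_num,
				      struct ib_port_immutable *immutable)
{
	struct ib_port_attr attr;
	int rc;

	rc = ib_query_port(ibdev, port_num, &attr);
	if (rc)
		return rc;

	/* Mark the port as virtio-rdma; rdma_protocol_virtio() and
	 * rdma_protocol_virtio_or_roce() key off this flag, and
	 * rdma_query_gid() will call device->ops.query_gid() directly.
	 */
	immutable->core_cap_flags = RDMA_CORE_PORT_VIRTIO;
	immutable->pkey_tbl_len = attr.pkey_tbl_len;
	immutable->gid_tbl_len = attr.gid_tbl_len;
	immutable->max_mad_size = IB_MGMT_MAD_SIZE;

	return 0;
}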
From nobody Sun Apr 28 23:27:47 2024
From: Junji Wei <weijunji@bytedance.com>
To: dledford@redhat.com, jgg@ziepe.ca, mst@redhat.com, jasowang@redhat.com, yuval.shaia.ml@gmail.com, marcel.apfelbaum@gmail.com, cohuck@redhat.com, hare@suse.de
Cc: linux-rdma@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, xieyongji@bytedance.com, chaiwen.cc@bytedance.com, weijunji@bytedance.com
Subject: [RFC 2/5] RDMA/virtio-rdma: VirtIO RDMA driver
Date: Thu, 2 Sep 2021 21:06:22 +0800
Message-Id: <20210902130625.25277-3-weijunji@bytedance.com>
In-Reply-To: <20210902130625.25277-1-weijunji@bytedance.com>
References: <20210902130625.25277-1-weijunji@bytedance.com>

This is based on Yuval Shaia's [RFC 3/3].

[ Junji Wei: Implement simple data path and complete control path. ]

Signed-off-by: Yuval Shaia <yuval.shaia.ml@gmail.com>
Signed-off-by: Junji Wei <weijunji@bytedance.com>
---
 drivers/infiniband/Kconfig                         |    1 +
 drivers/infiniband/hw/Makefile                     |    1 +
 drivers/infiniband/hw/virtio/Kconfig               |    6 +
 drivers/infiniband/hw/virtio/Makefile              |    4 +
 drivers/infiniband/hw/virtio/virtio_rdma.h         |   67 +
 drivers/infiniband/hw/virtio/virtio_rdma_dev_api.h |  285 ++++
 drivers/infiniband/hw/virtio/virtio_rdma_device.c  |  144 ++
 drivers/infiniband/hw/virtio/virtio_rdma_device.h  |   32 +
 drivers/infiniband/hw/virtio/virtio_rdma_ib.c      | 1695 ++++++++++++++++++++
 drivers/infiniband/hw/virtio/virtio_rdma_ib.h      |  237 +++
 drivers/infiniband/hw/virtio/virtio_rdma_main.c    |  152 ++
 drivers/infiniband/hw/virtio/virtio_rdma_netdev.c  |   68 +
 drivers/infiniband/hw/virtio/virtio_rdma_netdev.h  |   29 +
 include/uapi/linux/virtio_ids.h                    |    1 +
 14 files changed, 2722 insertions(+)
 create mode 100644 drivers/infiniband/hw/virtio/Kconfig
 create mode 100644 drivers/infiniband/hw/virtio/Makefile
 create mode 100644 drivers/infiniband/hw/virtio/virtio_rdma.h
 create mode 100644 drivers/infiniband/hw/virtio/virtio_rdma_dev_api.h
 create mode 100644 drivers/infiniband/hw/virtio/virtio_rdma_device.c
 create mode 100644 drivers/infiniband/hw/virtio/virtio_rdma_device.h
 create mode 100644 drivers/infiniband/hw/virtio/virtio_rdma_ib.c
 create mode 100644 drivers/infiniband/hw/virtio/virtio_rdma_ib.h
 create mode 100644 drivers/infiniband/hw/virtio/virtio_rdma_main.c
 create mode 100644 drivers/infiniband/hw/virtio/virtio_rdma_netdev.c
 create mode 100644 drivers/infiniband/hw/virtio/virtio_rdma_netdev.h

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 33d3ce9c888e..ca201ed6a350 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -92,6 +92,7 @@ source "drivers/infiniband/hw/hns/Kconfig"
 source "drivers/infiniband/hw/bnxt_re/Kconfig"
 source "drivers/infiniband/hw/hfi1/Kconfig"
 source "drivers/infiniband/hw/qedr/Kconfig"
+source "drivers/infiniband/hw/virtio/Kconfig"
 source "drivers/infiniband/sw/rdmavt/Kconfig"
 source "drivers/infiniband/sw/rxe/Kconfig"
 source "drivers/infiniband/sw/siw/Kconfig"
diff --git a/drivers/infiniband/hw/Makefile b/drivers/infiniband/hw/Makefile
index fba0b3be903e..e2290bd9808c 100644
--- a/drivers/infiniband/hw/Makefile
+++ b/drivers/infiniband/hw/Makefile
@@ -13,3 +13,4 @@ obj-$(CONFIG_INFINIBAND_HFI1) += hfi1/
 obj-$(CONFIG_INFINIBAND_HNS) += hns/
 obj-$(CONFIG_INFINIBAND_QEDR) += qedr/
 obj-$(CONFIG_INFINIBAND_BNXT_RE) += bnxt_re/
+obj-$(CONFIG_INFINIBAND_VIRTIO_RDMA) += virtio/
diff --git a/drivers/infiniband/hw/virtio/Kconfig b/drivers/infiniband/hw/virtio/Kconfig
new file mode 100644
index 000000000000..116620d49851
--- /dev/null
+++ b/drivers/infiniband/hw/virtio/Kconfig
@@ -0,0 +1,6 @@
+config INFINIBAND_VIRTIO_RDMA
+ tristate "VirtIO Paravirtualized RDMA Driver" + depends on NETDEVICES && ETHERNET && PCI && INET && VIRTIO + help + This driver provides low-level support for VirtIO Paravirtual + RDMA adapter. diff --git a/drivers/infiniband/hw/virtio/Makefile b/drivers/infiniband/hw/= virtio/Makefile new file mode 100644 index 000000000000..fb637e467167 --- /dev/null +++ b/drivers/infiniband/hw/virtio/Makefile @@ -0,0 +1,4 @@ +obj-$(CONFIG_INFINIBAND_VIRTIO_RDMA) +=3D virtio_rdma.o + +virtio_rdma-y :=3D virtio_rdma_main.o virtio_rdma_device.o virtio_rdma_ib.= o \ + virtio_rdma_netdev.o diff --git a/drivers/infiniband/hw/virtio/virtio_rdma.h b/drivers/infiniban= d/hw/virtio/virtio_rdma.h new file mode 100644 index 000000000000..e637f879e069 --- /dev/null +++ b/drivers/infiniband/hw/virtio/virtio_rdma.h @@ -0,0 +1,67 @@ +/* + * Virtio RDMA device: Driver main data types + * + * Copyright (C) 2019 Yuval Shaia Oracle Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 U= SA + */ + +#ifndef __VIRTIO_RDMA__ +#define __VIRTIO_RDMA__ + +#include +#include +#include + +#include "virtio_rdma_ib.h" + +struct virtio_rdma_dev { + struct ib_device ib_dev; + struct virtio_device *vdev; + struct virtqueue *ctrl_vq; + + /* To protect the vq operations for the controlq */ + spinlock_t ctrl_lock; + + // wait_queue_head_t acked; /* arm on send to host, release on recv */ + struct net_device *netdev; + + struct virtio_rdma_vq* cq_vqs; + struct virtio_rdma_cq** cqs; + + struct virtio_rdma_vq* qp_vqs; + int *qp_vq_using; + spinlock_t qp_using_lock; + + atomic_t num_qp; + atomic_t num_cq; + atomic_t num_ah; + + // only for modify_port ? + struct mutex port_mutex; + u32 port_cap_mask; + // TODO: check ib_active before operations + bool ib_active; +}; + +static inline struct virtio_rdma_dev *to_vdev(struct ib_device *ibdev) +{ + return container_of(ibdev, struct virtio_rdma_dev, ib_dev); +} + +#define virtio_rdma_dbg(ibdev, fmt, ...) = \ + ibdev_dbg(ibdev, "%s: " fmt, __func__, ##__VA_ARGS__) + +#endif diff --git a/drivers/infiniband/hw/virtio/virtio_rdma_dev_api.h b/drivers/i= nfiniband/hw/virtio/virtio_rdma_dev_api.h new file mode 100644 index 000000000000..4a668ddfcd64 --- /dev/null +++ b/drivers/infiniband/hw/virtio/virtio_rdma_dev_api.h @@ -0,0 +1,285 @@ +/* + * Virtio RDMA device: Virtio communication message + * + * Copyright (C) 2019 Junji Wei Bytedance Inc. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 U= SA + */ +#ifndef __VIRTIO_RDMA_DEV_API__ +#define __VIRTIO_RDMA_DEV_API__ + +#include +#include +#include + +#define VIRTIO_RDMA_CTRL_OK 0 +#define VIRTIO_RDMA_CTRL_ERR 1 + +struct control_buf { + __u8 cmd; + __u8 status; +}; + +enum { + VIRTIO_CMD_QUERY_DEVICE =3D 10, + VIRTIO_CMD_QUERY_PORT, + VIRTIO_CMD_CREATE_CQ, + VIRTIO_CMD_DESTROY_CQ, + VIRTIO_CMD_CREATE_PD, + VIRTIO_CMD_DESTROY_PD, + VIRTIO_CMD_GET_DMA_MR, + VIRTIO_CMD_CREATE_MR, + VIRTIO_CMD_MAP_MR_SG, + VIRTIO_CMD_REG_USER_MR, + VIRTIO_CMD_DEREG_MR, + VIRTIO_CMD_CREATE_QP, + VIRTIO_CMD_MODIFY_QP, + VIRTIO_CMD_QUERY_QP, + VIRTIO_CMD_DESTROY_QP, + VIRTIO_CMD_QUERY_GID, + VIRTIO_CMD_CREATE_UC, + VIRTIO_CMD_DEALLOC_UC, + VIRTIO_CMD_QUERY_PKEY, +}; + +const char* cmd_name[] =3D { + [VIRTIO_CMD_QUERY_DEVICE] =3D "VIRTIO_CMD_QUERY_DEVICE", + [VIRTIO_CMD_QUERY_PORT] =3D "VIRTIO_CMD_QUERY_PORT", + [VIRTIO_CMD_CREATE_CQ] =3D "VIRTIO_CMD_CREATE_CQ", + [VIRTIO_CMD_DESTROY_CQ] =3D "VIRTIO_CMD_DESTROY_CQ", + [VIRTIO_CMD_CREATE_PD] =3D "VIRTIO_CMD_CREATE_PD", + [VIRTIO_CMD_DESTROY_PD] =3D "VIRTIO_CMD_DESTROY_PD", + [VIRTIO_CMD_GET_DMA_MR] =3D "VIRTIO_CMD_GET_DMA_MR", + [VIRTIO_CMD_CREATE_MR] =3D "VIRTIO_CMD_CREATE_MR", + [VIRTIO_CMD_MAP_MR_SG] =3D "VIRTIO_CMD_MAP_MR_SG", + [VIRTIO_CMD_REG_USER_MR] =3D "VIRTIO_CMD_REG_USER_MR", + [VIRTIO_CMD_DEREG_MR] =3D "VIRTIO_CMD_DEREG_MR", + [VIRTIO_CMD_CREATE_QP] =3D "VIRTIO_CMD_CREATE_QP", + [VIRTIO_CMD_MODIFY_QP] =3D "VIRTIO_CMD_MODIFY_QP", + [VIRTIO_CMD_DESTROY_QP] =3D "VIRTIO_CMD_DESTROY_QP", + [VIRTIO_CMD_QUERY_GID] =3D "VIRTIO_CMD_QUERY_GID", + [VIRTIO_CMD_CREATE_UC] =3D "VIRTIO_CMD_CREATE_UC", + [VIRTIO_CMD_DEALLOC_UC] =3D "VIRTIO_CMD_DEALLOC_UC", + [VIRTIO_CMD_QUERY_PKEY] =3D "VIRTIO_CMD_QUERY_PKEY", +}; + +struct cmd_query_port { + __u8 port; +}; + +struct cmd_create_cq { + __u32 cqe; +}; + +struct rsp_create_cq { + __u32 cqn; +}; + +struct cmd_destroy_cq { + __u32 cqn; +}; + +struct rsp_destroy_cq { + __u32 cqn; +}; + +struct cmd_create_pd { + __u32 ctx_handle; +}; + +struct rsp_create_pd { + __u32 pdn; +}; + +struct cmd_destroy_pd { + __u32 pdn; +}; + +struct rsp_destroy_pd { + __u32 pdn; +}; + +struct cmd_create_mr { + __u32 pdn; + __u32 access_flags; + + __u32 max_num_sg; +}; + +struct rsp_create_mr { + __u32 mrn; + __u32 lkey; + __u32 rkey; +}; + +struct cmd_map_mr_sg { + __u32 mrn; + __u64 start; + __u32 npages; + + __u64 pages; +}; + +struct rsp_map_mr_sg { + __u32 npages; +}; + +struct cmd_reg_user_mr { + __u32 pdn; + __u32 access_flags; + __u64 start; + __u64 length; + + __u64 pages; + __u32 npages; +}; + +struct rsp_reg_user_mr { + __u32 mrn; + __u32 lkey; + __u32 rkey; +}; + +struct cmd_dereg_mr { + __u32 mrn; + + __u8 is_user_mr; +}; + +struct rsp_dereg_mr { + __u32 mrn; +}; + +struct cmd_create_qp { + __u32 pdn; + __u8 qp_type; + __u32 max_send_wr; + __u32 max_send_sge; + __u32 send_cqn; + __u32 max_recv_wr; + __u32 max_recv_sge; + __u32 recv_cqn; + __u8 is_srq; + __u32 srq_handle; +}; + +struct rsp_create_qp { + __u32 qpn; +}; + +struct cmd_modify_qp { + __u32 qpn; + __u32 attr_mask; + struct virtio_rdma_qp_attr attrs; +}; + +struct rsp_modify_qp { + __u32 qpn; +}; + +struct cmd_destroy_qp { + __u32 qpn; +}; + +struct rsp_destroy_qp { + __u32 qpn; +}; + +struct cmd_query_qp { + __u32 qpn; + __u32 attr_mask; +}; + 
+struct rsp_query_qp { + struct virtio_rdma_qp_attr attr; +}; + +struct cmd_query_gid { + __u8 port; + __u32 index; +}; + +struct cmd_create_uc { + __u64 pfn; +}; + +struct rsp_create_uc { + __u32 ctx_handle; +}; + +struct cmd_dealloc_uc { + __u32 ctx_handle; +}; + +struct rsp_dealloc_uc { + __u32 ctx_handle; +}; + +struct cmd_query_pkey { + __u8 port; + __u16 index; +}; + +struct rsp_query_pkey { + __u16 pkey; +}; + +struct cmd_post_send { + __u32 qpn; + __u32 is_kernel; + __u32 num_sge; + + int send_flags; + enum ib_wr_opcode opcode; + __u64 wr_id; + + union { + __be32 imm_data; + __u32 invalidate_rkey; + } ex; +=09 + union { + struct { + __u64 remote_addr; + __u32 rkey; + } rdma; + struct { + __u64 remote_addr; + __u64 compare_add; + __u64 swap; + __u32 rkey; + } atomic; + struct { + __u32 remote_qpn; + __u32 remote_qkey; + __u32 ahn; + } ud; + struct { + __u32 mrn; + __u32 key; + int access; + } reg; + } wr; +}; + +struct cmd_post_recv { + __u32 qpn; + __u32 is_kernel; + + __u32 num_sge; + __u64 wr_id; +}; + +#endif diff --git a/drivers/infiniband/hw/virtio/virtio_rdma_device.c b/drivers/in= finiband/hw/virtio/virtio_rdma_device.c new file mode 100644 index 000000000000..89b636a32140 --- /dev/null +++ b/drivers/infiniband/hw/virtio/virtio_rdma_device.c @@ -0,0 +1,144 @@ +/* + * Virtio RDMA device: Device related functions and data + * + * Copyright (C) 2019 Yuval Shaia Oracle Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 U= SA + */ + +#include + +#include "virtio_rdma.h" +/* +static void rdma_ctrl_ack(struct virtqueue *vq) +{ + struct virtio_rdma_dev *dev =3D vq->vdev->priv; + + wake_up(&dev->acked); + + printk("%s\n", __func__); +} +*/ +struct virtio_rdma_config { + int32_t max_cq; +}; + +int init_device(struct virtio_rdma_dev *dev) +{ + int rc =3D -ENOMEM, i, cur_vq =3D 1, total_vqs =3D 1; // first for ctrl_vq + struct virtqueue **vqs; + vq_callback_t **cbs; + const char **names; + int max_cq, max_qp; + + // init cq virtqueue + virtio_cread(dev->vdev, struct virtio_rdma_config, max_cq, &max_cq); + max_cq =3D 64; // TODO: remove this, qemu only support 1024 virtqueue + dev->ib_dev.attrs.max_cq =3D max_cq; + dev->ib_dev.attrs.max_qp =3D 64; // TODO: read from host + dev->ib_dev.attrs.max_ah =3D 64; // TODO: read from host + dev->ib_dev.attrs.max_cqe =3D 64; // TODO: read from host, size of virtqu= eue + pr_info("Device max cq %d\n", dev->ib_dev.attrs.max_cq); + total_vqs +=3D max_cq; + + dev->cq_vqs =3D kcalloc(max_cq, sizeof(*dev->cq_vqs), GFP_ATOMIC); + dev->cqs =3D kcalloc(max_cq, sizeof(*dev->cqs), GFP_ATOMIC); + + // init qp virtqueue + max_qp =3D 64; // TODO: read max qp from device + dev->ib_dev.attrs.max_qp =3D max_qp; + total_vqs +=3D max_qp * 2; + + dev->qp_vqs =3D kcalloc(max_qp * 2, sizeof(*dev->qp_vqs), GFP_ATOMIC); + + dev->qp_vq_using =3D kzalloc(max_qp * sizeof(*dev->qp_vq_using), GFP_ATOM= IC); + for (i =3D 0; i < max_qp; i++) { + dev->qp_vq_using[i] =3D -1; + } + spin_lock_init(&dev->qp_using_lock); + + vqs =3D kmalloc_array(total_vqs, sizeof(*vqs), GFP_ATOMIC); + if (!vqs) + goto err_vq; + =09 + cbs =3D kmalloc_array(total_vqs, sizeof(*cbs), GFP_ATOMIC); + if (!cbs) + goto err_callback; + + names =3D kmalloc_array(total_vqs, sizeof(*names), GFP_ATOMIC); + if (!names) + goto err_names; + + names[0] =3D "ctrl"; + // cbs[0] =3D rdma_ctrl_ack; + cbs[0] =3D NULL; + + for (i =3D 0; i < max_cq; i++, cur_vq++) { + sprintf(dev->cq_vqs[i].name, "cq.%d", i); + names[cur_vq] =3D dev->cq_vqs[i].name; + cbs[cur_vq] =3D virtio_rdma_cq_ack; + } + + for (i =3D 0; i < max_qp * 2; i +=3D 2, cur_vq +=3D 2) { + sprintf(dev->cq_vqs[i].name, "wqp.%d", i); + sprintf(dev->cq_vqs[i+1].name, "rqp.%d", i); + names[cur_vq] =3D dev->cq_vqs[i].name; + names[cur_vq+1] =3D dev->cq_vqs[i+1].name; + cbs[cur_vq] =3D NULL; + cbs[cur_vq+1] =3D NULL; + } + + rc =3D virtio_find_vqs(dev->vdev, total_vqs, vqs, cbs, names, NULL); + if (rc) { + pr_info("error: %d\n", rc); + goto err; + } + + dev->ctrl_vq =3D vqs[0]; + cur_vq =3D 1; + for (i =3D 0; i < max_cq; i++, cur_vq++) { + dev->cq_vqs[i].vq =3D vqs[cur_vq]; + dev->cq_vqs[i].idx =3D i; + spin_lock_init(&dev->cq_vqs[i].lock); + } + + for (i =3D 0; i < max_qp * 2; i +=3D 2, cur_vq +=3D 2) { + dev->qp_vqs[i].vq =3D vqs[cur_vq]; + dev->qp_vqs[i+1].vq =3D vqs[cur_vq+1]; + dev->qp_vqs[i].idx =3D i / 2; + dev->qp_vqs[i+1].idx =3D i / 2; + spin_lock_init(&dev->qp_vqs[i].lock); + spin_lock_init(&dev->qp_vqs[i+1].lock); + } + pr_info("VIRTIO-RDMA INIT qp_vqs %d\n", dev->qp_vqs[max_qp * 2 - 1].vq->i= ndex); + + mutex_init(&dev->port_mutex); + dev->ib_active =3D true; + +err: + kfree(names); +err_names: + kfree(cbs); +err_callback: + kfree(vqs); +err_vq: + return rc; +} + +void fini_device(struct virtio_rdma_dev *dev) +{ + dev->vdev->config->reset(dev->vdev); + 
dev->vdev->config->del_vqs(dev->vdev); +} diff --git a/drivers/infiniband/hw/virtio/virtio_rdma_device.h b/drivers/in= finiband/hw/virtio/virtio_rdma_device.h new file mode 100644 index 000000000000..ca2be23128c7 --- /dev/null +++ b/drivers/infiniband/hw/virtio/virtio_rdma_device.h @@ -0,0 +1,32 @@ +/* + * Virtio RDMA device: Device related functions and data + * + * Copyright (C) 2019 Yuval Shaia Oracle Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 U= SA + */ + +#ifndef __VIRTIO_RDMA_DEVICE__ +#define __VIRTIO_RDMA_DEVICE__ + +#define VIRTIO_RDMA_BOARD_ID 1 +#define VIRTIO_RDMA_HW_NAME "virtio-rdma" +#define VIRTIO_RDMA_HW_REV 1 +#define VIRTIO_RDMA_DRIVER_VER "1.0" + +int init_device(struct virtio_rdma_dev *dev); +void fini_device(struct virtio_rdma_dev *dev); + +#endif diff --git a/drivers/infiniband/hw/virtio/virtio_rdma_ib.c b/drivers/infini= band/hw/virtio/virtio_rdma_ib.c new file mode 100644 index 000000000000..27ba8990baf9 --- /dev/null +++ b/drivers/infiniband/hw/virtio/virtio_rdma_ib.c @@ -0,0 +1,1695 @@ +/* + * Virtio RDMA device: IB related functions and data + * + * Copyright (C) 2019 Yuval Shaia Oracle Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 U= SA + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "virtio_rdma.h" +#include "virtio_rdma_device.h" +#include "virtio_rdma_ib.h" +#include "virtio_rdma_dev_api.h" + +#include "../../core/core_priv.h" + +static void ib_qp_cap_to_virtio_rdma(struct virtio_rdma_qp_cap *dst, const= struct ib_qp_cap *src) +{ + dst->max_send_wr =3D src->max_send_wr; + dst->max_recv_wr =3D src->max_recv_wr; + dst->max_send_sge =3D src->max_send_sge; + dst->max_recv_sge =3D src->max_recv_sge; + dst->max_inline_data =3D src->max_inline_data; +} + +static void virtio_rdma_to_ib_qp_cap(struct ib_qp_cap *dst, const struct v= irtio_rdma_qp_cap *src) +{ + dst->max_send_wr =3D src->max_send_wr; + dst->max_recv_wr =3D src->max_recv_wr; + dst->max_send_sge =3D src->max_send_sge; + dst->max_recv_sge =3D src->max_recv_sge; + dst->max_inline_data =3D src->max_inline_data; +} + +void ib_global_route_to_virtio_rdma(struct virtio_rdma_global_route *dst, + const struct ib_global_route *src) +{ + dst->dgid =3D src->dgid; + dst->flow_label =3D src->flow_label; + dst->sgid_index =3D src->sgid_index; + dst->hop_limit =3D src->hop_limit; + dst->traffic_class =3D src->traffic_class; +} + +void virtio_rdma_to_ib_global_route(struct ib_global_route *dst, + const struct virtio_rdma_global_route *src) +{ + dst->dgid =3D src->dgid; + dst->flow_label =3D src->flow_label; + dst->sgid_index =3D src->sgid_index; + dst->hop_limit =3D src->hop_limit; + dst->traffic_class =3D src->traffic_class; +} + +void rdma_ah_attr_to_virtio_rdma(struct virtio_rdma_ah_attr *dst, + const struct rdma_ah_attr *src) +{ + ib_global_route_to_virtio_rdma(&dst->grh, rdma_ah_read_grh(src)); + // FIXME: this should be roce->dmac + dst->dlid =3D rdma_ah_get_dlid(src); + dst->sl =3D rdma_ah_get_sl(src); + dst->src_path_bits =3D rdma_ah_get_path_bits(src); + dst->static_rate =3D rdma_ah_get_static_rate(src); + dst->port_num =3D rdma_ah_get_port_num(src); +} + +void virtio_rdma_to_rdma_ah_attr(struct rdma_ah_attr *dst, + const struct virtio_rdma_ah_attr *src) +{ + virtio_rdma_to_ib_global_route(rdma_ah_retrieve_grh(dst), &src->grh); + rdma_ah_set_dlid(dst, src->dlid); + rdma_ah_set_sl(dst, src->sl); + rdma_ah_set_path_bits(dst, src->src_path_bits); + rdma_ah_set_static_rate(dst, src->static_rate); + rdma_ah_set_port_num(dst, src->port_num); +} + +/* TODO: For the scope fof the RFC i'm utilizing ib*_*_attr structures */ + +static int virtio_rdma_exec_cmd(struct virtio_rdma_dev *di, int cmd, + struct scatterlist *in, struct scatterlist *out) +{ + struct scatterlist *sgs[4], hdr, status; + struct control_buf *ctrl; + unsigned tmp; + int rc; + unsigned long flags; + + pr_info("%s: cmd %d %s\n", __func__, cmd, cmd_name[cmd]); + spin_lock_irqsave(&di->ctrl_lock, flags); + + ctrl =3D kmalloc(sizeof(*ctrl), GFP_ATOMIC); + ctrl->cmd =3D cmd; + ctrl->status =3D ~0; + + sg_init_one(&hdr, &ctrl->cmd, sizeof(ctrl->cmd)); + sgs[0] =3D &hdr; + sgs[1] =3D in; + sgs[2] =3D out; + sg_init_one(&status, &ctrl->status, sizeof(ctrl->status)); + sgs[3] =3D &status; + + rc =3D virtqueue_add_sgs(di->ctrl_vq, sgs, 2, 2, di, GFP_ATOMIC); + if (rc) + goto out; + + if (unlikely(!virtqueue_kick(di->ctrl_vq))) { + goto out_with_status; + } + + while (!virtqueue_get_buf(di->ctrl_vq, &tmp) && + !virtqueue_is_broken(di->ctrl_vq)) + cpu_relax(); + 
+out_with_status: + pr_info("EXEC cmd %d %s, status %d\n", ctrl->cmd, cmd_name[ctrl->cmd], ct= rl->status); + rc =3D ctrl->status =3D=3D VIRTIO_RDMA_CTRL_OK ? 0 : 1; + +out: + spin_unlock_irqrestore(&di->ctrl_lock, flags); + kfree(ctrl); + return rc; +} + +static struct scatterlist* init_sg(void* buf, unsigned long nbytes) { + struct scatterlist* sg; + + if (is_vmalloc_addr(buf)) { + int num_page =3D 1; + int i, off; + unsigned int len =3D nbytes; + // pr_info("vmalloc address %px\n", buf); + + off =3D offset_in_page(buf); + if (off + nbytes > (int)PAGE_SIZE) { + num_page +=3D (nbytes + off - PAGE_SIZE) / PAGE_SIZE; + len =3D PAGE_SIZE - off; + } + + sg =3D kmalloc(sizeof(*sg) * num_page, GFP_ATOMIC); + if (!sg) + return NULL; + + sg_init_table(sg, num_page); + + for (i =3D 0; i < num_page; i++) { + sg_set_page(sg + i, vmalloc_to_page(buf), len, off); + // pr_info("sg_set_page: addr %px len %d off %d\n", vmalloc_to_page(buf= ), len, off); + + nbytes -=3D len; + buf +=3D len; + off =3D 0; + len =3D min(nbytes, PAGE_SIZE); + } + } else { + sg =3D kmalloc(sizeof(*sg), GFP_ATOMIC); + if (!sg) + return NULL; + sg_init_one(sg, buf, nbytes); + } + + return sg; +} + +static int virtio_rdma_port_immutable(struct ib_device *ibdev, u8 port_num, + struct ib_port_immutable *immutable) +{ + struct ib_port_attr attr; + int rc; + + rc =3D ib_query_port(ibdev, port_num, &attr); + if (rc) + return rc; + + immutable->core_cap_flags =3D RDMA_CORE_PORT_VIRTIO; + immutable->pkey_tbl_len =3D attr.pkey_tbl_len; + immutable->gid_tbl_len =3D attr.gid_tbl_len; + immutable->max_mad_size =3D IB_MGMT_MAD_SIZE; + + return 0; +} + +static int virtio_rdma_query_device(struct ib_device *ibdev, + struct ib_device_attr *props, + struct ib_udata *uhw) +{ + struct scatterlist* data; + int offs; + int rc; + + if (uhw->inlen || uhw->outlen) + return -EINVAL; + + /* We start with sys_image_guid because of inconsistency beween ib_ + * and ibv_ */ + offs =3D offsetof(struct ib_device_attr, sys_image_guid); + + data =3D init_sg((void *)props + offs, sizeof(*props) - offs); + if (!data) + return -ENOMEM; + + rc =3D virtio_rdma_exec_cmd(to_vdev(ibdev), VIRTIO_CMD_QUERY_DEVICE, NULL, + data); + + // TODO: more attrs + props->max_cq =3D ibdev->attrs.max_cq; + props->max_cqe =3D ibdev->attrs.max_cqe; + + kfree(data); + return rc; +} + +static int virtio_rdma_query_port(struct ib_device *ibdev, u8 port, + struct ib_port_attr *props) +{ + struct scatterlist in, *out; + struct cmd_query_port *cmd; + int offs; + int rc; + + cmd =3D kmalloc(sizeof(*cmd), GFP_ATOMIC); + if (!cmd) + return -ENOMEM; + + offs =3D offsetof(struct ib_port_attr, state); + + out =3D init_sg((void *)props + offs, sizeof(*props) - offs); + if (!out) { + kfree(cmd); + return -ENOMEM; + } + + cmd->port =3D port; + sg_init_one(&in, cmd, sizeof(*cmd)); + + rc =3D virtio_rdma_exec_cmd(to_vdev(ibdev), VIRTIO_CMD_QUERY_PORT, &in, + out); + + kfree(out); + kfree(cmd); + + return rc; +} + +static struct net_device *virtio_rdma_get_netdev(struct ib_device *ibdev, + u8 port_num) +{ + struct virtio_rdma_dev *ri =3D to_vdev(ibdev); + return ri->netdev; +} + +static bool virtio_rdma_cq_notify_now(struct virtio_rdma_cq *cq, uint32_t = flags) +{ + uint32_t cq_notify; + + if (!cq->ibcq.comp_handler) + return false; + + /* Read application shared notification state */ + cq_notify =3D READ_ONCE(cq->notify_flags); + + if ((cq_notify & VIRTIO_RDMA_NOTIFY_NEXT_COMPLETION) || + ((cq_notify & VIRTIO_RDMA_NOTIFY_SOLICITED) && + (flags & IB_SEND_SOLICITED))) { + /* + * CQ notification is 
one-shot: Since the + * current CQE causes user notification, + * the CQ gets dis-aremd and must be re-aremd + * by the user for a new notification. + */ + WRITE_ONCE(cq->notify_flags, VIRTIO_RDMA_NOTIFY_NOT); + + return true; + } + return false; +} + +void virtio_rdma_cq_ack(struct virtqueue *vq) +{ + unsigned tmp; + struct virtio_rdma_cq *vcq; + struct scatterlist sg; + bool notify; + + virtqueue_disable_cb(vq); + while ((vcq =3D virtqueue_get_buf(vq, &tmp))) { + atomic_inc(&vcq->cqe_cnt); + vcq->cqe_put++; + + notify =3D virtio_rdma_cq_notify_now(vcq, vcq->queue[vcq->cqe_put % vcq-= >num_cqe].wc_flags); + + sg_init_one(&sg, &vcq->queue[vcq->cqe_enqueue % vcq->num_cqe], sizeof(*v= cq->queue)); + virtqueue_add_inbuf(vcq->vq->vq, &sg, 1, vcq, GFP_KERNEL); + vcq->cqe_enqueue++; + + if (notify) { + vcq->ibcq.comp_handler(&vcq->ibcq, + vcq->ibcq.cq_context); + } + } + virtqueue_enable_cb(vq); +} + +static int virtio_rdma_create_cq(struct ib_cq *ibcq, + const struct ib_cq_init_attr *attr, + struct ib_udata *udata) +{ + struct scatterlist in, out; + struct virtio_rdma_cq *vcq =3D to_vcq(ibcq); + struct virtio_rdma_dev *vdev =3D to_vdev(ibcq->device); + struct cmd_create_cq *cmd; + struct rsp_create_cq *rsp; + struct scatterlist sg; + int rc, i, fill; + int entries =3D attr->cqe; + + if (!atomic_add_unless(&vdev->num_cq, 1, ibcq->device->attrs.max_cq)) + return -ENOMEM; + + // size should be power of 2, to avoid idx overflow cause an invalid idx + entries =3D roundup_pow_of_two(entries); + vcq->queue =3D kcalloc(entries, sizeof(*vcq->queue), GFP_KERNEL); + if (!vcq->queue) + return -ENOMEM; + + cmd =3D kmalloc(sizeof(*cmd), GFP_ATOMIC); + if (!cmd) { + kfree(vcq->queue); + return -ENOMEM; + } + + rsp =3D kmalloc(sizeof(*rsp), GFP_ATOMIC); + if (!rsp) { + kfree(vcq->queue); + kfree(cmd); + return -ENOMEM; + } + + cmd->cqe =3D attr->cqe; + sg_init_one(&in, cmd, sizeof(*cmd)); + sg_init_one(&out, rsp, sizeof(*rsp)); + + rc =3D virtio_rdma_exec_cmd(vdev, VIRTIO_CMD_CREATE_CQ, &in, + &out); + if (rc) { + kfree(vcq->queue); + goto out; + } + + vcq->cq_handle =3D rsp->cqn; + vcq->ibcq.cqe =3D entries; + vcq->vq =3D &vdev->cq_vqs[rsp->cqn]; + vcq->num_cqe =3D entries; + vcq->cqe_enqueue =3D 0; + vcq->cqe_put =3D 0; + vcq->cqe_get =3D 0; + atomic_set(&vcq->cqe_cnt, 0); + + vdev->cqs[rsp->cqn] =3D vcq; + + fill =3D min(entries, vdev->ib_dev.attrs.max_cqe); + for(i =3D 0; i < fill; i++) { + sg_init_one(&sg, vcq->queue + i, sizeof(*vcq->queue)); + virtqueue_add_inbuf(vcq->vq->vq, &sg, 1, vcq, GFP_KERNEL); + vcq->cqe_enqueue++; + } + + spin_lock_init(&vcq->lock); + +out: + kfree(rsp); + kfree(cmd); + return rc; +} + +void virtio_rdma_destroy_cq(struct ib_cq *cq, struct ib_udata *udata) +{ + struct virtio_rdma_cq *vcq; + struct scatterlist in, out; + struct cmd_destroy_cq *cmd; + struct rsp_destroy_cq *rsp; + unsigned tmp; + + cmd =3D kmalloc(sizeof(*cmd), GFP_ATOMIC); + if (!cmd) + return; + + rsp =3D kmalloc(sizeof(*rsp), GFP_ATOMIC); + if (!rsp) { + kfree(cmd); + return; + } + + vcq =3D to_vcq(cq); + + cmd->cqn =3D vcq->cq_handle; + sg_init_one(&in, cmd, sizeof(*cmd)); + sg_init_one(&out, rsp, sizeof(*rsp)); + + virtqueue_disable_cb(vcq->vq->vq); + + virtio_rdma_exec_cmd(to_vdev(cq->device), VIRTIO_CMD_DESTROY_CQ, + &in, &out); + + /* pop all from virtqueue, after host call virtqueue_drop_all, + * prepare for next use. 
+ */ + while(virtqueue_get_buf(vcq->vq->vq, &tmp)); + + atomic_dec(&to_vdev(cq->device)->num_cq); + virtqueue_enable_cb(vcq->vq->vq); + + pr_debug("cqp_cnt %d %u %u %u\n", atomic_read(&vcq->cqe_cnt), vcq->cqe_en= queue, vcq->cqe_get, vcq->cqe_put); + + kfree(cmd); + kfree(rsp); +} + +int virtio_rdma_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata) +{ + struct virtio_rdma_pd *pd =3D to_vpd(ibpd); + struct ib_device *ibdev =3D ibpd->device; + struct cmd_create_pd *cmd; + struct rsp_create_pd *rsp; + struct scatterlist out, in; + int rc; + struct virtio_rdma_ucontext *context =3D rdma_udata_to_drv_context( + udata, struct virtio_rdma_ucontext, ibucontext); + + // TODO: Check MAX_PD + cmd =3D kmalloc(sizeof(*cmd), GFP_ATOMIC); + if (!cmd) + return -ENOMEM; + + rsp =3D kmalloc(sizeof(*rsp), GFP_ATOMIC); + if (!rsp) { + kfree(cmd); + return -ENOMEM; + } + + cmd->ctx_handle =3D context ? context->ctx_handle : 0; + sg_init_one(&in, cmd, sizeof(*cmd)); + + sg_init_one(&out, rsp, sizeof(*rsp)); + + rc =3D virtio_rdma_exec_cmd(to_vdev(ibdev), VIRTIO_CMD_CREATE_PD, &in, + &out); + if (rc) + goto out; + + pd->pd_handle =3D rsp->pdn; + + printk("%s: pd_handle=3D%d\n", __func__, pd->pd_handle); + +out: + kfree(rsp); + kfree(cmd); + + printk("%s: rc=3D%d\n", __func__, rc); + return rc; +} + +void virtio_rdma_dealloc_pd(struct ib_pd *pd, struct ib_udata *udata) +{ + struct virtio_rdma_pd *vpd =3D to_vpd(pd); + struct ib_device *ibdev =3D pd->device; + struct cmd_destroy_pd *cmd; + struct rsp_destroy_pd *rsp; + struct scatterlist in, out; + + pr_debug("%s:\n", __func__); + + cmd =3D kmalloc(sizeof(*cmd), GFP_ATOMIC); + if (!cmd) + return; + rsp =3D kmalloc(sizeof(*rsp), GFP_ATOMIC); + if (!rsp) { + kfree(rsp); + return; + } + + cmd->pdn =3D vpd->pd_handle; + sg_init_one(&in, cmd, sizeof(*cmd)); + sg_init_one(&out, rsp, sizeof(*rsp)); + + virtio_rdma_exec_cmd(to_vdev(ibdev), VIRTIO_CMD_DESTROY_PD, &in, &out); + + kfree(cmd); + kfree(rsp); +} + +struct ib_mr *virtio_rdma_get_dma_mr(struct ib_pd *pd, int flags) +{ + struct virtio_rdma_mr *mr; + struct scatterlist in, out; + struct cmd_create_mr *cmd; + struct rsp_create_mr *rsp; + int rc; + + mr =3D kzalloc(sizeof(*mr), GFP_ATOMIC); + if (!mr) + return ERR_PTR(-ENOMEM); + + cmd =3D kmalloc(sizeof(*cmd), GFP_ATOMIC); + if (!cmd) { + kfree(mr); + return ERR_PTR(-ENOMEM); + } + + rsp =3D kmalloc(sizeof(*rsp), GFP_ATOMIC); + if (!cmd) { + kfree(mr); + kfree(cmd); + return ERR_PTR(-ENOMEM); + } + + cmd->pdn =3D to_vpd(pd)->pd_handle; + cmd->access_flags =3D flags; + sg_init_one(&in, cmd, sizeof(*cmd)); + sg_init_one(&out, rsp, sizeof(*rsp)); + + pr_warn("Not support DMA mr now\n"); + + rc =3D virtio_rdma_exec_cmd(to_vdev(pd->device), VIRTIO_CMD_GET_DMA_MR, + &in, &out); + pr_info("%s: mr_handle=3D0x%x\n", __func__, rsp->mrn); + if (rc) { + kfree(rsp); + kfree(mr); + kfree(cmd); + return ERR_PTR(rc); + } + + mr->mr_handle =3D rsp->mrn; + mr->ibmr.lkey =3D rsp->lkey; + mr->ibmr.rkey =3D rsp->rkey; + mr->type =3D VIRTIO_RDMA_TYPE_KERNEL; + to_vpd(pd)->type =3D VIRTIO_RDMA_TYPE_KERNEL; + + kfree(cmd); + kfree(rsp); + + return &mr->ibmr; +} + +struct ib_mr *virtio_rdma_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_ty= pe, + u32 max_num_sg, struct ib_udata *udata) +{ + struct virtio_rdma_dev *dev =3D to_vdev(pd->device); + struct virtio_rdma_pd *vpd =3D to_vpd(pd); + struct virtio_rdma_mr *mr; + struct scatterlist in, out; + struct cmd_create_mr *cmd; + struct rsp_create_mr *rsp; + struct ib_mr *ret =3D ERR_PTR(-ENOMEM); + int rc; + + pr_info("%s: mr_type %d, 
max_num_sg %d\n", __func__, mr_type, + max_num_sg); + + if (mr_type !=3D IB_MR_TYPE_MEM_REG) + return ERR_PTR(-EINVAL); + + mr =3D kzalloc(sizeof(*mr), GFP_ATOMIC); + if (!mr) + goto err; + + cmd =3D kmalloc(sizeof(*cmd), GFP_ATOMIC); + if (!cmd) + goto err_cmd; + + rsp =3D kmalloc(sizeof(*rsp), GFP_ATOMIC); + if (!cmd) + goto err_rsp; + + // FIXME: only support PAGE_SIZE/8 sg; + mr->pages =3D dma_alloc_coherent(dev->vdev->dev.parent, PAGE_SIZE, &mr->d= ma_pages, GFP_KERNEL); + if (!mr->pages) { + pr_err("dma alloc pages failed\n"); + goto err_pages; + } + mr->max_pages =3D max_num_sg; + mr->npages =3D 0; + + memset(cmd, 0, sizeof(*cmd)); + cmd->pdn =3D to_vpd(pd)->pd_handle; + cmd->max_num_sg =3D max_num_sg; + + sg_init_one(&in, cmd, sizeof(*cmd)); + sg_init_one(&out, rsp, sizeof(*rsp)); + + rc =3D virtio_rdma_exec_cmd(to_vdev(pd->device), VIRTIO_CMD_CREATE_MR, + &in, &out); + + if (rc) { + kfree(rsp); + kfree(mr); + kfree(cmd); + return ERR_PTR(rc); + } + + mr->mr_handle =3D rsp->mrn; + mr->ibmr.lkey =3D rsp->lkey; + mr->ibmr.rkey =3D rsp->rkey; + mr->type =3D VIRTIO_RDMA_TYPE_KERNEL; + vpd->type =3D VIRTIO_RDMA_TYPE_KERNEL; + + pr_info("%s: mr_handle=3D0x%x\n", __func__, mr->mr_handle); + + kfree(cmd); + kfree(rsp); + + return &mr->ibmr; + +err_pages: + kfree(rsp); +err_rsp: + kfree(cmd); +err_cmd: + kfree(mr); +err: + return ret; +} + +static int virtio_rdma_set_page(struct ib_mr *ibmr, u64 addr) +{ + struct virtio_rdma_mr *mr =3D to_vmr(ibmr); + + if (mr->npages =3D=3D mr->max_pages) + return -ENOMEM; + + if (is_vmalloc_addr((void*)addr)) { + pr_err("vmalloc addr is not support\n"); + return -EINVAL; + } + mr->pages[mr->npages++] =3D virt_to_phys((void*)addr); + return 0; +} + +int virtio_rdma_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, + int sg_nents, unsigned int *sg_offset) +{ + struct virtio_rdma_mr *mr =3D to_vmr(ibmr); + struct cmd_map_mr_sg *cmd; + struct rsp_map_mr_sg *rsp; + struct scatterlist in, out; + int rc; + + cmd =3D kmalloc(sizeof(*cmd), GFP_KERNEL); + if (!cmd) + return -ENOMEM; + rsp =3D kmalloc(sizeof(*rsp), GFP_KERNEL); + if (!rsp) { + rc =3D -ENOMEM; + goto out_rsp; + } + + mr->npages =3D 0; + + rc =3D ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, virtio_rdma_set_page= ); + if (rc < 0) { + pr_err("could not map sg to pages\n"); + rc =3D -EINVAL; + goto out; + } + + pr_info("%s: start %llx npages %d\n", __func__, sg[0].dma_address, mr->np= ages); + + cmd->mrn =3D mr->mr_handle; + cmd->start =3D (uint64_t)phys_to_virt(mr->pages[0]); + cmd->npages =3D mr->npages; + cmd->pages =3D mr->dma_pages; + + sg_init_one(&in, cmd, sizeof(*cmd)); + sg_init_one(&out, rsp, sizeof(*rsp)); + + rc =3D virtio_rdma_exec_cmd(to_vdev(ibmr->device), VIRTIO_CMD_MAP_MR_SG, + &in, &out); + + if (rc) + rc =3D -EIO; + +out: + kfree(rsp); +out_rsp: + kfree(cmd); + return rc; +} + +struct ib_mr *virtio_rdma_reg_user_mr(struct ib_pd *pd, u64 start, u64 len= gth, + u64 virt_addr, int access_flags, + struct ib_udata *udata) +{ + struct virtio_rdma_dev *dev =3D to_vdev(pd->device); + struct virtio_rdma_pd *vpd =3D to_vpd(pd); + struct virtio_rdma_mr *mr; + struct ib_umem *umem; + struct ib_mr *ret =3D ERR_PTR(-ENOMEM); + struct sg_dma_page_iter sg_iter; + struct scatterlist in, out; + struct cmd_reg_user_mr *cmd; + struct rsp_reg_user_mr *rsp; + int rc; + uint32_t npages; + + pr_info("%s: start %llu, len %llu, addr %llu\n", __func__, start, length,= virt_addr); + + cmd =3D kmalloc(sizeof(*cmd), GFP_ATOMIC); + if (!cmd)=20 + return ERR_PTR(-ENOMEM); + + rsp =3D kmalloc(sizeof(*rsp), 
GFP_ATOMIC); + if (!cmd) + goto err_rsp; + + umem =3D ib_umem_get(udata, start, length, access_flags, 0); + if (IS_ERR(umem)) { + pr_err("could not get umem for mem region\n"); + ret =3D ERR_CAST(umem); + goto err; + } + + npages =3D ib_umem_num_pages(umem); + if (npages < 0) { + pr_err("npages < 0"); + ret =3D ERR_PTR(-EINVAL); + goto err; + } + + mr =3D kzalloc(sizeof(*mr), GFP_ATOMIC); + if (!mr) { + ret =3D ERR_PTR(-ENOMEM); + goto err; + } + + // TODO: change page size to needed + mr->pages =3D dma_alloc_coherent(dev->vdev->dev.parent, PAGE_SIZE, &mr->d= ma_pages, GFP_KERNEL); + if (!mr->pages) { + pr_err("dma alloc pages failed\n"); + goto err; + } + + mr->max_pages =3D npages; + mr->iova =3D virt_addr; + mr->size =3D length; + mr->umem =3D umem; + + // TODO: test pages + mr->npages =3D 0; + for_each_sg_dma_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0) { + dma_addr_t addr =3D sg_page_iter_dma_address(&sg_iter); + mr->pages[mr->npages] =3D addr; + mr->npages++; + } + + cmd->pdn =3D to_vpd(pd)->pd_handle; + cmd->access_flags =3D access_flags; + cmd->start =3D start; + cmd->length =3D length; + cmd->pages =3D mr->dma_pages; + cmd->npages =3D npages; + + sg_init_one(&in, cmd, sizeof(*cmd)); + sg_init_one(&out, rsp, sizeof(*rsp)); + + rc =3D virtio_rdma_exec_cmd(to_vdev(pd->device), VIRTIO_CMD_REG_USER_MR, + &in, &out); + + if (rc) { + ib_umem_release(umem); + kfree(rsp); + kfree(mr); + kfree(cmd); + return ERR_PTR(rc); + } + + mr->mr_handle =3D rsp->mrn; + mr->ibmr.lkey =3D rsp->lkey; + mr->ibmr.rkey =3D rsp->rkey; + mr->type =3D VIRTIO_RDMA_TYPE_USER; + vpd->type =3D VIRTIO_RDMA_TYPE_USER; + + printk("%s: mr_handle=3D0x%x\n", __func__, mr->mr_handle); + + ret =3D &mr->ibmr; + +err: + kfree(cmd); +err_rsp: + kfree(rsp); + return ret; +} + +int virtio_rdma_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata) +{ + struct virtio_rdma_mr *mr =3D to_vmr(ibmr); + struct scatterlist in, out; + struct cmd_dereg_mr *cmd; + struct rsp_dereg_mr *rsp; + int rc =3D -ENOMEM; + + cmd =3D kmalloc(sizeof(*cmd), GFP_ATOMIC); + if (!cmd) + return -ENOMEM; + + rsp =3D kmalloc(sizeof(*rsp), GFP_ATOMIC); + if (!rsp) + goto out_rsp; + + cmd->mrn =3D mr->mr_handle; + cmd->is_user_mr =3D mr->type =3D=3D VIRTIO_RDMA_TYPE_USER; + + sg_init_one(&in, cmd, sizeof(*cmd)); + sg_init_one(&out, rsp, sizeof(*rsp)); + + rc =3D virtio_rdma_exec_cmd(to_vdev(ibmr->device), VIRTIO_CMD_DEREG_MR, + &in, &out); + if (rc) { + rc =3D -EIO; + goto out; + } + + dma_free_coherent(to_vdev(ibmr->device)->vdev->dev.parent, PAGE_SIZE, &mr= ->pages, GFP_KERNEL); + if (mr->type =3D=3D VIRTIO_RDMA_TYPE_USER) + ib_umem_release(mr->umem); +out: + kfree(rsp); +out_rsp: + kfree(cmd); + return rc; +} + +static int find_qp_vq(struct virtio_rdma_dev *dev, uint32_t qpn) { + int rc =3D -1, i; + unsigned long flags; + uint32_t max_qp =3D dev->ib_dev.attrs.max_qp; + + spin_lock_irqsave(&dev->qp_using_lock, flags); + for(i =3D 0; i < max_qp; i++) { + if (dev->qp_vq_using[i] =3D=3D -1) { + rc =3D i; + dev->qp_vq_using[i] =3D qpn; + goto found; + } + } +found: + spin_unlock_irqrestore(&dev->qp_using_lock, flags); + return rc; +} + +struct ib_qp *virtio_rdma_create_qp(struct ib_pd *ibpd, + struct ib_qp_init_attr *attr, + struct ib_udata *udata) +{ + struct scatterlist in, out; + struct virtio_rdma_dev *vdev =3D to_vdev(ibpd->device); + struct virtio_rdma_pd *vpd =3D to_vpd(ibpd); + struct cmd_create_qp *cmd; + struct rsp_create_qp *rsp; + struct virtio_rdma_qp *vqp; + int rc, vqn; + + if (!atomic_add_unless(&vdev->num_cq, 1, vdev->ib_dev.attrs.max_qp)) 
+ return ERR_PTR(-ENOMEM); + + cmd =3D kmalloc(sizeof(*cmd), GFP_ATOMIC); + if (!cmd) + return ERR_PTR(-ENOMEM); + + rsp =3D kmalloc(sizeof(*rsp), GFP_ATOMIC); + if (!rsp) { + kfree(cmd); + return ERR_PTR(-ENOMEM); + } + + vqp =3D kzalloc(sizeof(*vqp), GFP_ATOMIC); + if (!vqp) { + kfree(cmd); + kfree(rsp); + return ERR_PTR(-ENOMEM); + } + + cmd->pdn =3D to_vpd(ibpd)->pd_handle; + cmd->qp_type =3D attr->qp_type; + cmd->max_send_wr =3D attr->cap.max_send_wr; + cmd->max_send_sge =3D attr->cap.max_send_sge; + cmd->send_cqn =3D to_vcq(attr->send_cq)->cq_handle; + cmd->max_recv_wr =3D attr->cap.max_recv_wr; + cmd->max_recv_sge =3D attr->cap.max_recv_sge; + cmd->recv_cqn =3D to_vcq(attr->recv_cq)->cq_handle; + cmd->is_srq =3D !!attr->srq; + cmd->srq_handle =3D 0; // Not support srq now + + sg_init_one(&in, cmd, sizeof(*cmd)); + printk("%s: pdn %d\n", __func__, cmd->pdn); + + sg_init_one(&out, rsp, sizeof(*rsp)); + + rc =3D virtio_rdma_exec_cmd(vdev, VIRTIO_CMD_CREATE_QP, &in, + &out); + if (rc) { + kfree(vqp); + kfree(rsp); + kfree(cmd); + return ERR_PTR(-EIO); + } + + vqp->type =3D vpd->type; + vqp->port =3D attr->port_num; + vqp->qp_handle =3D rsp->qpn; + vqp->ibqp.qp_num =3D rsp->qpn; +=09 + vqn =3D find_qp_vq(vdev, vqp->qp_handle); + vqp->sq =3D &vdev->qp_vqs[vqn * 2]; + vqp->rq =3D &vdev->qp_vqs[vqn * 2 + 1]; + vqp->s_cmd =3D kmalloc(sizeof(*vqp->s_cmd), GFP_ATOMIC); + vqp->r_cmd =3D kmalloc(sizeof(*vqp->r_cmd), GFP_ATOMIC); + + pr_info("%s: qpn 0x%x wq %d rq %d\n", __func__, rsp->qpn, + vqp->sq->vq->index, vqp->rq->vq->index); +=09 + kfree(rsp); + kfree(cmd); + return &vqp->ibqp; +} + +int virtio_rdma_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata) +{ + struct virtio_rdma_dev *vdev =3D to_vdev(ibqp->device); + struct virtio_rdma_qp *vqp =3D to_vqp(ibqp); + struct scatterlist in, out; + struct cmd_destroy_qp *cmd; + struct rsp_destroy_qp *rsp; + int rc; + + pr_info("%s: qpn %d\n", __func__, vqp->qp_handle); + + cmd =3D kmalloc(sizeof(*cmd), GFP_ATOMIC); + if (!cmd) + return -ENOMEM; + + rsp =3D kmalloc(sizeof(*rsp), GFP_ATOMIC); + if (!rsp) { + kfree(cmd); + return -ENOMEM; + } + + cmd->qpn =3D vqp->qp_handle; + + sg_init_one(&in, cmd, sizeof(*cmd)); + sg_init_one(&out, rsp, sizeof(*rsp)); + + rc =3D virtio_rdma_exec_cmd(vdev, VIRTIO_CMD_DESTROY_QP, + &in, &out); +=09 + atomic_dec(&vdev->num_qp); + // FIXME: need lock ? 
+ smp_store_mb(vdev->qp_vq_using[vqp->sq->idx / 2], -1); + + kfree(vqp->s_cmd); + kfree(vqp->r_cmd); + + kfree(rsp); + kfree(cmd); + return rc; +} + +int virtio_rdma_query_gid(struct ib_device *ibdev, u8 port, int index, + union ib_gid *gid) +{ + struct scatterlist in, *data; + struct cmd_query_gid *cmd; + struct ib_gid_attr gid_attr; + int rc; + + printk("%s: port %d, index %d\n", __func__, port, index); + + cmd =3D kmalloc(sizeof(*cmd), GFP_ATOMIC); + if (!cmd) + return -ENOMEM; + + data =3D init_sg(gid, sizeof(*gid)); + if (!data) { + kfree(cmd); + return -ENOMEM; + } + + cmd->port =3D port; + cmd->index =3D index; + sg_init_one(&in, cmd, sizeof(*cmd)); + + rc =3D virtio_rdma_exec_cmd(to_vdev(ibdev), VIRTIO_CMD_QUERY_GID, &in, + data); + + if (!rc) { + gid_attr.ndev =3D to_vdev(ibdev)->netdev; + gid_attr.gid_type =3D IB_GID_TYPE_ROCE; + ib_cache_gid_add(ibdev, port, gid, &gid_attr); + } + + kfree(data); + kfree(cmd); + return rc; +} + +static int virtio_rdma_add_gid(const struct ib_gid_attr *attr, void **cont= ext) +{ + printk("%s: gid index %d\n", __func__, attr->index); + + return 0; +} + +static int virtio_rdma_del_gid(const struct ib_gid_attr *attr, void **cont= ext) +{ + printk("%s:\n", __func__); + + return 0; +} + +int virtio_rdma_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *= udata) +{ + struct scatterlist in, out; + struct cmd_create_uc *cmd; + struct rsp_create_uc *rsp; + struct virtio_rdma_ucontext *vuc =3D to_vucontext(uctx); + int rc; +=09 + cmd =3D kmalloc(sizeof(*cmd), GFP_ATOMIC); + if (!cmd) + return -ENOMEM; + + rsp =3D kmalloc(sizeof(*rsp), GFP_ATOMIC); + if (!rsp) { + kfree(cmd); + return -ENOMEM; + } + + // TODO: init uar & set cmd->pfn + sg_init_one(&in, cmd, sizeof(*cmd)); + sg_init_one(&out, rsp, sizeof(*rsp)); + + rc =3D virtio_rdma_exec_cmd(to_vdev(uctx->device), VIRTIO_CMD_CREATE_UC, = &in, + &out); + + if (rc) { + rc =3D -EIO; + goto out; + } + + vuc->ctx_handle =3D rsp->ctx_handle; + +out: + kfree(rsp); + kfree(cmd); + return rc; +} + +void virtio_rdma_dealloc_ucontext(struct ib_ucontext *ibcontext) +{ + struct scatterlist in, out; + struct cmd_dealloc_uc *cmd; + struct rsp_dealloc_uc *rsp; + struct virtio_rdma_ucontext *vuc =3D to_vucontext(ibcontext); +=09 + cmd =3D kmalloc(sizeof(*cmd), GFP_ATOMIC); + if (!cmd) + return; + + rsp =3D kmalloc(sizeof(*rsp), GFP_ATOMIC); + if (!rsp) { + kfree(cmd); + return; + } + + cmd->ctx_handle =3D vuc->ctx_handle; + sg_init_one(&in, cmd, sizeof(*cmd)); + sg_init_one(&out, rsp, sizeof(*rsp)); + + virtio_rdma_exec_cmd(to_vdev(ibcontext->device), VIRTIO_CMD_DEALLOC_UC, &= in, + &out); + + kfree(rsp); + kfree(cmd); +} + +int virtio_rdma_create_ah(struct ib_ah *ibah, + struct rdma_ah_attr *ah_attr, u32 flags, + struct ib_udata *udata) +{ + struct virtio_rdma_dev *vdev =3D to_vdev(ibah->device); + struct virtio_rdma_ah *ah =3D to_vah(ibah); + const struct ib_global_route *grh; + u8 port_num =3D rdma_ah_get_port_num(ah_attr); + + if (!(rdma_ah_get_ah_flags(ah_attr) & IB_AH_GRH)) + return -EINVAL; + + grh =3D rdma_ah_read_grh(ah_attr); + if ((ah_attr->type !=3D RDMA_AH_ATTR_TYPE_ROCE) || + rdma_is_multicast_addr((struct in6_addr *)grh->dgid.raw)) + return -EINVAL; + + if (!atomic_add_unless(&vdev->num_ah, 1, vdev->ib_dev.attrs.max_ah)) + return -ENOMEM; + + ah->av.port_pd =3D to_vpd(ibah->pd)->pd_handle | (port_num << 24); + ah->av.src_path_bits =3D rdma_ah_get_path_bits(ah_attr); + ah->av.src_path_bits |=3D 0x80; + ah->av.gid_index =3D grh->sgid_index; + ah->av.hop_limit =3D grh->hop_limit; + 
ah->av.sl_tclass_flowlabel =3D (grh->traffic_class << 20) | + grh->flow_label; + memcpy(ah->av.dgid, grh->dgid.raw, 16); + memcpy(ah->av.dmac, ah_attr->roce.dmac, ETH_ALEN); + + return 0; +} + +void virtio_rdma_destroy_ah(struct ib_ah *ah, u32 flags) +{ + struct virtio_rdma_dev *vdev =3D to_vdev(ah->device); + + printk("%s:\n", __func__); + atomic_dec(&vdev->num_ah); +} + +static void virtio_rdma_get_fw_ver_str(struct ib_device *device, char *str) +{ + snprintf(str, IB_FW_VERSION_NAME_MAX, "%d.%d.%d\n", 1, 0, 0); +} + +enum rdma_link_layer virtio_rdma_port_link_layer(struct ib_device *ibdev, + u8 port) +{ + return IB_LINK_LAYER_ETHERNET; +} + +int virtio_rdma_mmap(struct ib_ucontext *ibcontext, struct vm_area_struct = *vma) +{ + printk("%s:\n", __func__); + + return 0; +} + +int virtio_rdma_modify_port(struct ib_device *ibdev, u8 port, int mask, + struct ib_port_modify *props) +{ + struct ib_port_attr attr; + struct virtio_rdma_dev *vdev =3D to_vdev(ibdev); + int ret; + + if (mask & ~IB_PORT_SHUTDOWN) { + pr_warn("unsupported port modify mask %#x\n", mask); + return -EOPNOTSUPP; + } + + mutex_lock(&vdev->port_mutex); + ret =3D ib_query_port(ibdev, port, &attr); + if (ret) + goto out; + + vdev->port_cap_mask |=3D props->set_port_cap_mask; + vdev->port_cap_mask &=3D ~props->clr_port_cap_mask; + + if (mask & IB_PORT_SHUTDOWN) + vdev->ib_active =3D false; + +out: + mutex_unlock(&vdev->port_mutex); + return ret; +} + +int virtio_rdma_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, + int attr_mask, struct ib_udata *udata) +{ + struct scatterlist in, out; + struct cmd_modify_qp *cmd; + struct rsp_modify_qp *rsp; + int rc; + + pr_info("%s: qpn %d\n", __func__, to_vqp(ibqp)->qp_handle); + + cmd =3D kzalloc(sizeof(*cmd), GFP_ATOMIC); + if (!cmd) + return -ENOMEM; + + rsp =3D kmalloc(sizeof(*rsp), GFP_ATOMIC); + if (!rsp) { + kfree(cmd); + return -ENOMEM; + } + + cmd->qpn =3D to_vqp(ibqp)->qp_handle; + cmd->attr_mask =3D attr_mask & ((1 << 21) - 1); + + // TODO: copy based on attr_mask + cmd->attrs.qp_state =3D attr->qp_state; + cmd->attrs.cur_qp_state =3D attr->cur_qp_state; + cmd->attrs.path_mtu =3D attr->path_mtu; + cmd->attrs.path_mig_state =3D attr->path_mig_state; + cmd->attrs.qkey =3D attr->qkey; + cmd->attrs.rq_psn =3D attr->rq_psn; + cmd->attrs.sq_psn =3D attr->sq_psn; + cmd->attrs.dest_qp_num =3D attr->dest_qp_num; + cmd->attrs.qp_access_flags =3D attr->qp_access_flags; + cmd->attrs.pkey_index =3D attr->pkey_index; + cmd->attrs.alt_pkey_index =3D attr->alt_pkey_index; + cmd->attrs.en_sqd_async_notify =3D attr->en_sqd_async_notify; + cmd->attrs.sq_draining =3D attr->sq_draining; + cmd->attrs.max_rd_atomic =3D attr->max_rd_atomic; + cmd->attrs.max_dest_rd_atomic =3D attr->max_dest_rd_atomic; + cmd->attrs.min_rnr_timer =3D attr->min_rnr_timer; + cmd->attrs.port_num =3D attr->port_num; + cmd->attrs.timeout =3D attr->timeout; + cmd->attrs.retry_cnt =3D attr->retry_cnt; + cmd->attrs.rnr_retry =3D attr->rnr_retry; + cmd->attrs.alt_port_num =3D attr->alt_port_num; + cmd->attrs.alt_timeout =3D attr->alt_timeout; + cmd->attrs.rate_limit =3D attr->rate_limit; + ib_qp_cap_to_virtio_rdma(&cmd->attrs.cap, &attr->cap); + rdma_ah_attr_to_virtio_rdma(&cmd->attrs.ah_attr, &attr->ah_attr); + rdma_ah_attr_to_virtio_rdma(&cmd->attrs.alt_ah_attr, &attr->alt_ah_attr); + + sg_init_one(&in, cmd, sizeof(*cmd)); + sg_init_one(&out, rsp, sizeof(*rsp)); + + rc =3D virtio_rdma_exec_cmd(to_vdev(ibqp->device), VIRTIO_CMD_MODIFY_QP, + &in, &out); + + kfree(rsp); + kfree(cmd); + return rc; +} + +int 
virtio_rdma_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, + int attr_mask, struct ib_qp_init_attr *init_attr) +{ + struct scatterlist in, out; + struct virtio_rdma_qp *vqp =3D to_vqp(ibqp); + struct cmd_query_qp *cmd; + struct rsp_query_qp *rsp; + int rc; + + cmd =3D kzalloc(sizeof(*cmd), GFP_ATOMIC); + if (!cmd) + return -ENOMEM; + + rsp =3D kmalloc(sizeof(*rsp), GFP_ATOMIC); + if (!rsp) { + kfree(cmd); + return -ENOMEM; + } + + cmd->qpn =3D vqp->qp_handle; + cmd->attr_mask =3D attr_mask; + + sg_init_one(&in, cmd, sizeof(*cmd)); + sg_init_one(&out, rsp, sizeof(*rsp)); + rc =3D virtio_rdma_exec_cmd(to_vdev(ibqp->device), VIRTIO_CMD_QUERY_QP, + &in, &out); + + if (rc) + goto out; + + attr->qp_state =3D rsp->attr.qp_state; + attr->cur_qp_state =3D rsp->attr.cur_qp_state; + attr->path_mtu =3D rsp->attr.path_mtu; + attr->path_mig_state =3D rsp->attr.path_mig_state; + attr->qkey =3D rsp->attr.qkey; + attr->rq_psn =3D rsp->attr.rq_psn; + attr->sq_psn =3D rsp->attr.sq_psn; + attr->dest_qp_num =3D rsp->attr.dest_qp_num; + attr->qp_access_flags =3D rsp->attr.qp_access_flags; + attr->pkey_index =3D rsp->attr.pkey_index; + attr->alt_pkey_index =3D rsp->attr.alt_pkey_index; + attr->en_sqd_async_notify =3D rsp->attr.en_sqd_async_notify; + attr->sq_draining =3D rsp->attr.sq_draining; + attr->max_rd_atomic =3D rsp->attr.max_rd_atomic; + attr->max_dest_rd_atomic =3D rsp->attr.max_dest_rd_atomic; + attr->min_rnr_timer =3D rsp->attr.min_rnr_timer; + attr->port_num =3D rsp->attr.port_num; + attr->timeout =3D rsp->attr.timeout; + attr->retry_cnt =3D rsp->attr.retry_cnt; + attr->rnr_retry =3D rsp->attr.rnr_retry; + attr->alt_port_num =3D rsp->attr.alt_port_num; + attr->alt_timeout =3D rsp->attr.alt_timeout; + attr->rate_limit =3D rsp->attr.rate_limit; + virtio_rdma_to_ib_qp_cap(&attr->cap, &rsp->attr.cap); + virtio_rdma_to_rdma_ah_attr(&attr->ah_attr, &rsp->attr.ah_attr); + virtio_rdma_to_rdma_ah_attr(&attr->alt_ah_attr, &rsp->attr.alt_ah_attr); + +out: + init_attr->event_handler =3D vqp->ibqp.event_handler; + init_attr->qp_context =3D vqp->ibqp.qp_context; + init_attr->send_cq =3D vqp->ibqp.send_cq; + init_attr->recv_cq =3D vqp->ibqp.recv_cq; + init_attr->srq =3D vqp->ibqp.srq; + init_attr->xrcd =3D NULL; + init_attr->cap =3D attr->cap; + init_attr->sq_sig_type =3D 0; + init_attr->qp_type =3D vqp->ibqp.qp_type; + init_attr->create_flags =3D 0; + init_attr->port_num =3D vqp->port; + + kfree(cmd); + kfree(rsp); + return rc; +} + +/* This verb is relevant only for InfiniBand */ +int virtio_rdma_query_pkey(struct ib_device *ibdev, u8 port, u16 index, + u16 *pkey) +{ + struct scatterlist in, out; + struct cmd_query_pkey *cmd; + struct rsp_query_pkey *rsp; + int rc; +=09 + cmd =3D kmalloc(sizeof(*cmd), GFP_ATOMIC); + if (!cmd) + return -ENOMEM; + + rsp =3D kmalloc(sizeof(*rsp), GFP_ATOMIC); + if (!rsp) { + kfree(cmd); + return -ENOMEM; + } + + cmd->port =3D port; + cmd->index =3D index; + + sg_init_one(&in, cmd, sizeof(*cmd)); + sg_init_one(&out, rsp, sizeof(*rsp)); + + rc =3D virtio_rdma_exec_cmd(to_vdev(ibdev), VIRTIO_CMD_QUERY_PKEY, + &in, &out); + + *pkey =3D rsp->pkey; +=09 + kfree(cmd); + kfree(rsp); + return rc; +} + +int virtio_rdma_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc = *wc) +{ + struct virtio_rdma_cq *vcq =3D to_vcq(ibcq); + struct virtio_rdma_cqe *cqe; + int i =3D 0; + unsigned long flags; + + spin_lock_irqsave(&vcq->lock, flags); + while (i < num_entries && vcq->cqe_get < vcq->cqe_put) { + cqe =3D &vcq->queue[vcq->cqe_get]; + + wc[i].wr_id =3D cqe->wr_id; + wc[i].status =3D 
cqe->status; + wc[i].opcode =3D cqe->opcode; + wc[i].vendor_err =3D cqe->vendor_err; + wc[i].byte_len =3D cqe->byte_len; + // TODO: wc[i].qp + wc[i].ex.imm_data =3D cqe->imm_data; + wc[i].src_qp =3D cqe->src_qp; + wc[i].slid =3D cqe->slid; + wc[i].wc_flags =3D cqe->wc_flags; + wc[i].pkey_index =3D cqe->pkey_index; + wc[i].sl =3D cqe->sl; + wc[i].dlid_path_bits =3D cqe->dlid_path_bits; + + vcq->cqe_get++; + i++; + } + spin_unlock_irqrestore(&vcq->lock, flags); + return i; +} + +int virtio_rdma_post_recv(struct ib_qp *ibqp, const struct ib_recv_wr *wr, + const struct ib_recv_wr **bad_wr) +{ + struct scatterlist *sgs[3], hdr, status_sg, *sge_sg; + struct virtio_rdma_qp *vqp =3D to_vqp(ibqp); + struct cmd_post_recv *cmd; + int *status, rc =3D 0; + unsigned tmp; + + // TODO: mad support + if (vqp->ibqp.qp_type =3D=3D IB_QPT_GSI || vqp->ibqp.qp_type =3D=3D IB_QP= T_SMI) + return 0; + + // TODO: more than one wr + // TODO: check bad wr + spin_lock(&vqp->rq->lock); + status =3D &vqp->r_status; + cmd =3D vqp->r_cmd; + + cmd->qpn =3D to_vqp(ibqp)->qp_handle; + cmd->is_kernel =3D vqp->type =3D=3D VIRTIO_RDMA_TYPE_KERNEL; + cmd->num_sge =3D wr->num_sge; + cmd->wr_id =3D wr->wr_id; + + sg_init_one(&hdr, cmd, sizeof(*cmd)); + sgs[0] =3D &hdr; + // TODO: num_sge is zero + sge_sg =3D init_sg(wr->sg_list, sizeof(*wr->sg_list) * wr->num_sge); + sgs[1] =3D sge_sg; + sg_init_one(&status_sg, status, sizeof(*status)); + sgs[2] =3D &status_sg; + + rc =3D virtqueue_add_sgs(vqp->rq->vq, sgs, 2, 1, vqp, GFP_ATOMIC); + if (rc) + goto out; + + if (unlikely(!virtqueue_kick(vqp->rq->vq))) { + goto out; + } + + while (!virtqueue_get_buf(vqp->rq->vq, &tmp) && + !virtqueue_is_broken(vqp->rq->vq)) + cpu_relax(); + +out: + spin_unlock(&vqp->rq->lock); + kfree(sge_sg); + return rc; +} + +int virtio_rdma_post_send(struct ib_qp *ibqp, const struct ib_send_wr *wr, + const struct ib_send_wr **bad_wr) +{ + struct scatterlist *sgs[3], hdr, status_sg, *sge_sg; + struct virtio_rdma_qp *vqp =3D to_vqp(ibqp); + struct cmd_post_send *cmd; + struct ib_sge dummy_sge; + int *status, rc =3D 0; + unsigned tmp; + + // TODO: support more than one wr + // TODO: check bad wr + if (vqp->type =3D=3D VIRTIO_RDMA_TYPE_KERNEL && + wr->opcode !=3D IB_WR_SEND && wr->opcode !=3D IB_WR_SEND_WITH_IMM && + wr->opcode !=3D IB_WR_REG_MR && + wr->opcode !=3D IB_WR_LOCAL_INV && wr->opcode !=3D IB_WR_SEND_WITH_INV) { + pr_warn("Only support op send in kernel\n"); + return -EINVAL; + } + + spin_lock(&vqp->sq->lock); + cmd =3D vqp->s_cmd; + status =3D &vqp->s_status; + + cmd->qpn =3D vqp->qp_handle; + cmd->is_kernel =3D vqp->type =3D=3D VIRTIO_RDMA_TYPE_KERNEL; + cmd->num_sge =3D wr->num_sge; + cmd->send_flags =3D wr->send_flags; + cmd->opcode =3D wr->opcode; + cmd->wr_id =3D wr->wr_id; + cmd->ex.imm_data =3D wr->ex.imm_data; + cmd->ex.invalidate_rkey =3D wr->ex.invalidate_rkey; + + switch (ibqp->qp_type) { + case IB_QPT_GSI: + case IB_QPT_UD: + pr_err("Not support UD now\n"); + rc =3D -EINVAL; + goto out; + break; + case IB_QPT_RC: + switch (wr->opcode) { + case IB_WR_RDMA_READ: + case IB_WR_RDMA_WRITE: + case IB_WR_RDMA_WRITE_WITH_IMM: + cmd->wr.rdma.remote_addr =3D + rdma_wr(wr)->remote_addr; + cmd->wr.rdma.rkey =3D rdma_wr(wr)->rkey; + break; + case IB_WR_LOCAL_INV: + case IB_WR_SEND_WITH_INV: + cmd->ex.invalidate_rkey =3D + wr->ex.invalidate_rkey; + break; + case IB_WR_ATOMIC_CMP_AND_SWP: + case IB_WR_ATOMIC_FETCH_AND_ADD: + cmd->wr.atomic.remote_addr =3D + atomic_wr(wr)->remote_addr; + cmd->wr.atomic.rkey =3D atomic_wr(wr)->rkey; + 
cmd->wr.atomic.compare_add =3D + atomic_wr(wr)->compare_add; + if (wr->opcode =3D=3D IB_WR_ATOMIC_CMP_AND_SWP) + cmd->wr.atomic.swap =3D + atomic_wr(wr)->swap; + break; + case IB_WR_REG_MR: + cmd->wr.reg.mrn =3D to_vmr(reg_wr(wr)->mr)->mr_handle; + cmd->wr.reg.key =3D reg_wr(wr)->key; + cmd->wr.reg.access =3D reg_wr(wr)->access; + break; + default: + break; + } + break; + default: + pr_err("Bad qp type\n"); + rc =3D -EINVAL; + *bad_wr =3D wr; + goto out; + } + + sg_init_one(&hdr, cmd, sizeof(*cmd)); + sgs[0] =3D &hdr; + /* while sg_list is null, use a dummy sge to avoid=20 + * "zero sized buffers are not allowed" + */ + if (wr->sg_list) + sge_sg =3D init_sg(wr->sg_list, sizeof(*wr->sg_list) * wr->num_sge); + else + sge_sg =3D init_sg(&dummy_sge, sizeof(dummy_sge)); + sgs[1] =3D sge_sg; + sg_init_one(&status_sg, status, sizeof(*status)); + sgs[2] =3D &status_sg; + + rc =3D virtqueue_add_sgs(vqp->sq->vq, sgs, 2, 1, vqp, GFP_ATOMIC); + if (rc) + goto out; + + if (unlikely(!virtqueue_kick(vqp->sq->vq))) { + goto out; + } + + while (!virtqueue_get_buf(vqp->sq->vq, &tmp) && + !virtqueue_is_broken(vqp->sq->vq)) + cpu_relax(); + +out: + spin_unlock(&vqp->sq->lock); + kfree(sge_sg); + return rc; +} + +int virtio_rdma_req_notify_cq(struct ib_cq *ibcq, + enum ib_cq_notify_flags flags) +{ + struct virtio_rdma_cq *vcq =3D to_vcq(ibcq); + + if ((flags & IB_CQ_SOLICITED_MASK) =3D=3D IB_CQ_SOLICITED) + /* + * Enable CQ event for next solicited completion. + * and make it visible to all associated producers. + */ + smp_store_mb(vcq->notify_flags, VIRTIO_RDMA_NOTIFY_SOLICITED); + else + /* + * Enable CQ event for any signalled completion. + * and make it visible to all associated producers. + */ + smp_store_mb(vcq->notify_flags, VIRTIO_RDMA_NOTIFY_ALL); + + if (flags & IB_CQ_REPORT_MISSED_EVENTS) + return vcq->cqe_put - vcq->cqe_get; + + return 0; +} + +static const struct ib_device_ops virtio_rdma_dev_ops =3D { + .owner =3D THIS_MODULE, + .driver_id =3D RDMA_DRIVER_VIRTIO, + + .get_port_immutable =3D virtio_rdma_port_immutable, + .query_device =3D virtio_rdma_query_device, + .query_port =3D virtio_rdma_query_port, + .get_netdev =3D virtio_rdma_get_netdev, + .create_cq =3D virtio_rdma_create_cq, + .destroy_cq =3D virtio_rdma_destroy_cq, + .alloc_pd =3D virtio_rdma_alloc_pd, + .dealloc_pd =3D virtio_rdma_dealloc_pd, + .get_dma_mr =3D virtio_rdma_get_dma_mr, + .create_qp =3D virtio_rdma_create_qp, + .query_gid =3D virtio_rdma_query_gid, + .add_gid =3D virtio_rdma_add_gid, + .alloc_mr =3D virtio_rdma_alloc_mr, + .alloc_ucontext =3D virtio_rdma_alloc_ucontext, + .create_ah =3D virtio_rdma_create_ah, + .dealloc_ucontext =3D virtio_rdma_dealloc_ucontext, + .del_gid =3D virtio_rdma_del_gid, + .dereg_mr =3D virtio_rdma_dereg_mr, + .destroy_ah =3D virtio_rdma_destroy_ah, + .destroy_qp =3D virtio_rdma_destroy_qp, + .get_dev_fw_str =3D virtio_rdma_get_fw_ver_str, + .get_link_layer =3D virtio_rdma_port_link_layer, + .map_mr_sg =3D virtio_rdma_map_mr_sg, + .mmap =3D virtio_rdma_mmap, + .modify_port =3D virtio_rdma_modify_port, + .modify_qp =3D virtio_rdma_modify_qp, + .poll_cq =3D virtio_rdma_poll_cq, + .post_recv =3D virtio_rdma_post_recv, + .post_send =3D virtio_rdma_post_send, + .query_device =3D virtio_rdma_query_device, + .query_pkey =3D virtio_rdma_query_pkey, + .query_qp =3D virtio_rdma_query_qp, + .reg_user_mr =3D virtio_rdma_reg_user_mr, + .req_notify_cq =3D virtio_rdma_req_notify_cq, + + INIT_RDMA_OBJ_SIZE(ib_ah, virtio_rdma_ah, ibah), + INIT_RDMA_OBJ_SIZE(ib_cq, virtio_rdma_cq, ibcq), + 
INIT_RDMA_OBJ_SIZE(ib_pd, virtio_rdma_pd, ibpd), + // INIT_RDMA_OBJ_SIZE(ib_srq, virtio_rdma_srq, base_srq), + INIT_RDMA_OBJ_SIZE(ib_ucontext, virtio_rdma_ucontext, ibucontext), +}; + +static ssize_t hca_type_show(struct device *device, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "VRDMA-%s\n", VIRTIO_RDMA_DRIVER_VER); +} +static DEVICE_ATTR_RO(hca_type); + +static ssize_t hw_rev_show(struct device *device, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "%d\n", VIRTIO_RDMA_HW_REV); +} +static DEVICE_ATTR_RO(hw_rev); + +static ssize_t board_id_show(struct device *device, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "%d\n", VIRTIO_RDMA_BOARD_ID); +} +static DEVICE_ATTR_RO(board_id); + +static struct attribute *virtio_rdma_class_attributes[] =3D { + &dev_attr_hw_rev.attr, + &dev_attr_hca_type.attr, + &dev_attr_board_id.attr, + NULL, +}; + +static const struct attribute_group virtio_rdma_attr_group =3D { + .attrs =3D virtio_rdma_class_attributes, +}; + +int virtio_rdma_register_ib_device(struct virtio_rdma_dev *ri) +{ + int rc; + struct ib_device *dev =3D &ri->ib_dev; + + strlcpy(dev->node_desc, "VirtIO RDMA", sizeof(dev->node_desc)); + + dev->dev.dma_ops =3D &dma_virt_ops; + + dev->num_comp_vectors =3D 1; + dev->dev.parent =3D ri->vdev->dev.parent; + dev->node_type =3D RDMA_NODE_IB_CA; + dev->phys_port_cnt =3D 1; + dev->uverbs_cmd_mask =3D + (1ull << IB_USER_VERBS_CMD_QUERY_DEVICE) | + (1ull << IB_USER_VERBS_CMD_QUERY_PORT) | + (1ull << IB_USER_VERBS_CMD_CREATE_CQ) | + (1ull << IB_USER_VERBS_CMD_DESTROY_CQ) | + (1ull << IB_USER_VERBS_CMD_ALLOC_PD) | + (1ull << IB_USER_VERBS_CMD_DEALLOC_PD); + + ib_set_device_ops(dev, &virtio_rdma_dev_ops); + ib_device_set_netdev(dev, ri->netdev, 1); + rdma_set_device_sysfs_group(dev, &virtio_rdma_attr_group); + + rc =3D ib_register_device(dev, "virtio_rdma%d"); + + memcpy(&dev->node_guid, dev->name, 6); + return rc; +} + +void fini_ib(struct virtio_rdma_dev *ri) +{ + ib_unregister_device(&ri->ib_dev); +} diff --git a/drivers/infiniband/hw/virtio/virtio_rdma_ib.h b/drivers/infini= band/hw/virtio/virtio_rdma_ib.h new file mode 100644 index 000000000000..ff5d6a41db4d --- /dev/null +++ b/drivers/infiniband/hw/virtio/virtio_rdma_ib.h @@ -0,0 +1,237 @@ +/* + * Virtio RDMA device: IB related functions and data + * + * Copyright (C) 2019 Yuval Shaia Oracle Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 U= SA + */ + +#ifndef __VIRTIO_RDMA_IB__ +#define __VIRTIO_RDMA_IB__ + +#include + +#include + +enum virtio_rdma_type { + VIRTIO_RDMA_TYPE_USER, + VIRTIO_RDMA_TYPE_KERNEL +}; + +struct virtio_rdma_pd { + struct ib_pd ibpd; + u32 pd_handle; + enum virtio_rdma_type type; +}; + +struct virtio_rdma_mr { + struct ib_mr ibmr; + struct ib_umem *umem; + + u32 mr_handle; + enum virtio_rdma_type type; + u64 iova; + u64 size; + + u64 *pages; + dma_addr_t dma_pages; + u32 npages; + u32 max_pages; +}; + +struct virtio_rdma_vq { + struct virtqueue* vq; + spinlock_t lock; + char name[16]; + int idx; +}; + +struct virtio_rdma_cqe { + uint64_t wr_id; + enum ib_wc_status status; + enum ib_wc_opcode opcode; + uint32_t vendor_err; + uint32_t byte_len; + uint32_t imm_data; + uint32_t qp_num; + uint32_t src_qp; + int wc_flags; + uint16_t pkey_index; + uint16_t slid; + uint8_t sl; + uint8_t dlid_path_bits; +}; + +enum { + VIRTIO_RDMA_NOTIFY_NOT =3D (0), + VIRTIO_RDMA_NOTIFY_SOLICITED =3D (1 << 0), + VIRTIO_RDMA_NOTIFY_NEXT_COMPLETION =3D (1 << 1), + VIRTIO_RDMA_NOTIFY_MISSED_EVENTS =3D (1 << 2), + VIRTIO_RDMA_NOTIFY_ALL =3D VIRTIO_RDMA_NOTIFY_SOLICITED | VIRTIO_RDMA_NOT= IFY_NEXT_COMPLETION | + VIRTIO_RDMA_NOTIFY_MISSED_EVENTS +}; + +struct virtio_rdma_cq { + struct ib_cq ibcq; + u32 cq_handle; + + struct virtio_rdma_vq *vq; + + spinlock_t lock; + struct virtio_rdma_cqe *queue; + u32 cqe_enqueue; + u32 cqe_put; + u32 cqe_get; + u32 num_cqe; + + u32 notify_flags; + atomic_t cqe_cnt; +}; + +struct virtio_rdma_qp { + struct ib_qp ibqp; + u32 qp_handle; + enum virtio_rdma_type type; + u8 port; + + struct virtio_rdma_vq *sq; + int s_status; + struct cmd_post_send *s_cmd; + + struct virtio_rdma_vq *rq; + int r_status; + struct cmd_post_recv *r_cmd; +}; + +struct virtio_rdma_global_route { + union ib_gid dgid; + uint32_t flow_label; + uint8_t sgid_index; + uint8_t hop_limit; + uint8_t traffic_class; +}; + +struct virtio_rdma_ah_attr { + struct virtio_rdma_global_route grh; + uint16_t dlid; + uint8_t sl; + uint8_t src_path_bits; + uint8_t static_rate; + uint8_t port_num; +}; + +struct virtio_rdma_qp_cap { + uint32_t max_send_wr; + uint32_t max_recv_wr; + uint32_t max_send_sge; + uint32_t max_recv_sge; + uint32_t max_inline_data; +}; + +struct virtio_rdma_qp_attr { + enum ib_qp_state qp_state; + enum ib_qp_state cur_qp_state; + enum ib_mtu path_mtu; + enum ib_mig_state path_mig_state; + uint32_t qkey; + uint32_t rq_psn; + uint32_t sq_psn; + uint32_t dest_qp_num; + uint32_t qp_access_flags; + uint16_t pkey_index; + uint16_t alt_pkey_index; + uint8_t en_sqd_async_notify; + uint8_t sq_draining; + uint8_t max_rd_atomic; + uint8_t max_dest_rd_atomic; + uint8_t min_rnr_timer; + uint8_t port_num; + uint8_t timeout; + uint8_t retry_cnt; + uint8_t rnr_retry; + uint8_t alt_port_num; + uint8_t alt_timeout; + uint32_t rate_limit; + struct virtio_rdma_qp_cap cap; + struct virtio_rdma_ah_attr ah_attr; + struct virtio_rdma_ah_attr alt_ah_attr; +}; + +struct virtio_rdma_uar_map { + unsigned long pfn; + void __iomem *map; + int index; +}; + +struct virtio_rdma_ucontext { + struct ib_ucontext ibucontext; + struct virtio_rdma_dev *dev; + struct virtio_rdma_uar_map uar; + __u64 ctx_handle; +}; + +struct virtio_rdma_av { + __u32 port_pd; + __u32 sl_tclass_flowlabel; + __u8 dgid[16]; + __u8 src_path_bits; + __u8 gid_index; + __u8 
stat_rate; + __u8 hop_limit; + __u8 dmac[6]; + __u8 reserved[6]; +}; + +struct virtio_rdma_ah { + struct ib_ah ibah; + struct virtio_rdma_av av; +}; + +void virtio_rdma_cq_ack(struct virtqueue *vq); + +static inline struct virtio_rdma_ah *to_vah(struct ib_ah *ibah) +{ + return container_of(ibah, struct virtio_rdma_ah, ibah); +} + +static inline struct virtio_rdma_pd *to_vpd(struct ib_pd *ibpd) +{ + return container_of(ibpd, struct virtio_rdma_pd, ibpd); +} + +static inline struct virtio_rdma_cq *to_vcq(struct ib_cq *ibcq) +{ + return container_of(ibcq, struct virtio_rdma_cq, ibcq); +} + +static inline struct virtio_rdma_qp *to_vqp(struct ib_qp *ibqp) +{ + return container_of(ibqp, struct virtio_rdma_qp, ibqp); +} + +static inline struct virtio_rdma_mr *to_vmr(struct ib_mr *ibmr) +{ + return container_of(ibmr, struct virtio_rdma_mr, ibmr); +} + +static inline struct virtio_rdma_ucontext *to_vucontext(struct ib_ucontext= *ibucontext) +{ + return container_of(ibucontext, struct virtio_rdma_ucontext, ibucontext); +} + +int virtio_rdma_register_ib_device(struct virtio_rdma_dev *ri); +void fini_ib(struct virtio_rdma_dev *ri); + +#endif diff --git a/drivers/infiniband/hw/virtio/virtio_rdma_main.c b/drivers/infi= niband/hw/virtio/virtio_rdma_main.c new file mode 100644 index 000000000000..8f467ee62cf2 --- /dev/null +++ b/drivers/infiniband/hw/virtio/virtio_rdma_main.c @@ -0,0 +1,152 @@ +/* + * Virtio RDMA device + * + * Copyright (C) 2019 Yuval Shaia Oracle Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 U= SA + */ + +#include +#include +#include +#include +#include +#include +#include + +#include "virtio_rdma.h" +#include "virtio_rdma_device.h" +#include "virtio_rdma_ib.h" +#include "virtio_rdma_netdev.h" + +/* TODO: + * - How to hook to unload driver, we need to undo all the stuff with did + * for all the devices that probed + * - + */ + +static int virtio_rdma_probe(struct virtio_device *vdev) +{ + struct virtio_rdma_dev *ri; + int rc =3D -EIO; + + ri =3D ib_alloc_device(virtio_rdma_dev, ib_dev); + if (!ri) { + pr_err("Fail to allocate IB device\n"); + rc =3D -ENOMEM; + goto out; + } + vdev->priv =3D ri; + + ri->vdev =3D vdev; + + spin_lock_init(&ri->ctrl_lock); + + rc =3D init_device(ri); + if (rc) { + pr_err("Fail to connect to device\n"); + goto out_dealloc_ib_device; + } + + rc =3D init_netdev(ri); + if (rc) { + pr_err("Fail to connect to NetDev layer\n"); + goto out_fini_device; + } + + rc =3D virtio_rdma_register_ib_device(ri); + if (rc) { + pr_err("Fail to connect to IB layer\n"); + goto out_fini_netdev; + } + + pr_info("VirtIO RDMA device %d probed\n", vdev->index); + + goto out; + +out_fini_netdev: + fini_netdev(ri); + +out_fini_device: + fini_device(ri); + +out_dealloc_ib_device: + ib_dealloc_device(&ri->ib_dev); + + vdev->priv =3D NULL; + +out: + return rc; +} + +static void virtio_rdma_remove(struct virtio_device *vdev) +{ + struct virtio_rdma_dev *ri =3D vdev->priv; + + if (!ri) + return; + + vdev->priv =3D NULL; + + fini_ib(ri); + + fini_netdev(ri); + + fini_device(ri); + + ib_dealloc_device(&ri->ib_dev); + + pr_info("VirtIO RDMA device %d removed\n", vdev->index); +} + +static struct virtio_device_id id_table[] =3D { + { VIRTIO_ID_RDMA, VIRTIO_DEV_ANY_ID }, + { 0 }, +}; + +static struct virtio_driver virtio_rdma_driver =3D { + .driver.name =3D KBUILD_MODNAME, + .driver.owner =3D THIS_MODULE, + .id_table =3D id_table, + .probe =3D virtio_rdma_probe, + .remove =3D virtio_rdma_remove, +}; + +static int __init virtio_rdma_init(void) +{ + int rc; + + rc =3D register_virtio_driver(&virtio_rdma_driver); + if (rc) { + pr_err("%s: Fail to register virtio driver (%d)\n", __func__, + rc); + return rc; + } + + return 0; +} + +static void __exit virtio_rdma_fini(void) +{ + unregister_virtio_driver(&virtio_rdma_driver); +} + +module_init(virtio_rdma_init); +module_exit(virtio_rdma_fini); + +MODULE_DEVICE_TABLE(virtio, id_table); +MODULE_AUTHOR("Yuval Shaia, Junji Wei"); +MODULE_DESCRIPTION("Virtio RDMA driver"); +MODULE_LICENSE("Dual BSD/GPL"); diff --git a/drivers/infiniband/hw/virtio/virtio_rdma_netdev.c b/drivers/in= finiband/hw/virtio/virtio_rdma_netdev.c new file mode 100644 index 000000000000..641a07b630bd --- /dev/null +++ b/drivers/infiniband/hw/virtio/virtio_rdma_netdev.c @@ -0,0 +1,68 @@ +/* + * Virtio RDMA device + * + * Copyright (C) 2019 Yuval Shaia Oracle Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 U= SA + */ + +#include +#include +#include + +#include "../../../virtio/virtio_pci_common.h" +#include "virtio_rdma_netdev.h" + +int init_netdev(struct virtio_rdma_dev *ri) +{ + struct pci_dev* pdev_net; + struct virtio_pci_device *vp_dev =3D to_vp_device(ri->vdev); + struct virtio_pci_device *vnet_pdev; + void* priv; + + pdev_net =3D pci_get_slot(vp_dev->pci_dev->bus, PCI_DEVFN(PCI_SLOT(vp_dev= ->pci_dev->devfn), 0)); + if (!pdev_net) { + pr_err("failed to find paired net device\n"); + return -ENODEV; + } + + if (pdev_net->vendor !=3D PCI_VENDOR_ID_REDHAT_QUMRANET || + pdev_net->subsystem_device !=3D VIRTIO_ID_NET) { + pr_err("failed to find paired virtio-net device\n"); + pci_dev_put(pdev_net); + return -ENODEV; + } + + vnet_pdev =3D pci_get_drvdata(pdev_net); + pci_dev_put(pdev_net); + + priv =3D vnet_pdev->vdev.priv; + /* get netdev from virtnet_info, which is netdev->priv */ + ri->netdev =3D priv - ALIGN(sizeof(struct net_device), NETDEV_ALIGN); + if (!ri->netdev) { + pr_err("failed to get backend net device\n"); + return -ENODEV; + } + dev_hold(ri->netdev); + return 0; +} + +void fini_netdev(struct virtio_rdma_dev *ri) +{ + if (ri->netdev) { + dev_put(ri->netdev); + ri->netdev =3D NULL; + } +} diff --git a/drivers/infiniband/hw/virtio/virtio_rdma_netdev.h b/drivers/in= finiband/hw/virtio/virtio_rdma_netdev.h new file mode 100644 index 000000000000..d9ca263f8bff --- /dev/null +++ b/drivers/infiniband/hw/virtio/virtio_rdma_netdev.h @@ -0,0 +1,29 @@ +/* + * Virtio RDMA device: Netdev related functions and data + * + * Copyright (C) 2019 Yuval Shaia Oracle Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 U= SA + */ + +#ifndef __VIRTIO_RDMA_NETDEV__ +#define __VIRTIO_RDMA_NETDEV__ + +#include "virtio_rdma.h" + +int init_netdev(struct virtio_rdma_dev *ri); +void fini_netdev(struct virtio_rdma_dev *ri); + +#endif diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_id= s.h index 70a8057ad4bb..7dba3cd48e72 100644 --- a/include/uapi/linux/virtio_ids.h +++ b/include/uapi/linux/virtio_ids.h @@ -55,6 +55,7 @@ #define VIRTIO_ID_FS 26 /* virtio filesystem */ #define VIRTIO_ID_PMEM 27 /* virtio pmem */ #define VIRTIO_ID_MAC80211_HWSIM 29 /* virtio mac80211-hwsim */ +#define VIRTIO_ID_RDMA 30 /* virtio rdma */ #define VIRTIO_ID_BT 40 /* virtio bluetooth */ =20 /* --=20 2.11.0 From nobody Sun Apr 28 23:27:47 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=bytedance.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1630591800823897.732076824175; Thu, 2 Sep 2021 07:10:00 -0700 (PDT) Received: from localhost ([::1]:48026 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mLnPj-0007fo-MW for importer@patchew.org; Thu, 02 Sep 2021 10:09:59 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:41878) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mLmRv-0002Pc-U3 for qemu-devel@nongnu.org; Thu, 02 Sep 2021 09:08:11 -0400 Received: from mail-pl1-x62a.google.com ([2607:f8b0:4864:20::62a]:46794) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1mLmRt-0001Bt-K7 for qemu-devel@nongnu.org; Thu, 02 Sep 2021 09:08:11 -0400 Received: by mail-pl1-x62a.google.com with SMTP id bg1so1133567plb.13 for ; Thu, 02 Sep 2021 06:08:08 -0700 (PDT) Received: from C02FR1DUMD6V.bytedance.net ([139.177.225.225]) by smtp.gmail.com with ESMTPSA id d6sm2307415pfa.135.2021.09.02.06.08.02 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 02 Sep 2021 06:08:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=XcVnJQEhgtcOxINMlj+i4YpyBZo/hQgIS6EKROvLd1Q=; b=JTNEABJwi7PZEx0z3/poMSdlNLxtNbh+iR5h20eqUYOzqueYfnt4PG1NWoRArNjWwQ R2o+n43Fl0fn5PqMCJyTu26mBKGLipO72tl7uAbOH0vDm3CEg9iJ5lFKS9m8WdWM4CCM kJXbk+OJpe+8h0i6hvUj28hE2dZ9+k0MjpVZqnPx2ParuGB7T5YgYKSFCJ0cOeFyyQ5e fW0tCSJpR3H4R7IG60bBME7uXGiPGqD8qMRg1AY3jihY/9JGQr4WIviQyDoZuzHeoqpb fVbRwmhkSzsAdcADx5FV/8N6mbuA9kPZajJi5Nkxxjz3z5ET0la7JI2v03vYYmtG6dHS i2yw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=XcVnJQEhgtcOxINMlj+i4YpyBZo/hQgIS6EKROvLd1Q=; b=ldpQmgvN0rnXplsoitARcS76Y1fcqj2kSaOt0nmAqjgw4p3JVudgZK+mjPFJ5Sob5I +9KBC9uuC1SAy0LkRKp4QtGzSw4ir7XrGUfIQW0aIbrU4HBEkuGvhOry7BpowyRJC4RB lbZ0ZznAYc4cPj5kQyVqyU9hXNP9r7KeKPYQc8+durjBycGRsqlhAi7DtttejD3sdZPh 
TrwYuSBrsy7bW8V2c+oFYOCky3PPUKHrD+GZl+YVz4v4KcWPjeRyuYEXSBrJUZWcRa3x EFLhceDglT3hVRfsXUoEa7Bam4Fo3kBvpt55lQE8RWF7RWIVgbLlK9mpM5zC62gz/o4o brXQ== X-Gm-Message-State: AOAM530RqmoGZeKTOnNy+xG/vGBixp2BHzqV0aeyf7mv9g2nBYZPujDP /QE1i6Qfc78w30onNgQBP55W5g== X-Google-Smtp-Source: ABdhPJyOa8c/fisZvhGLKr6cJ+Z9Gq1L8p8QU4yJI1LbZ3QlsbzL7rLS2qxGOLhnro5Yb4pTJowjyA== X-Received: by 2002:a17:90a:3ec4:: with SMTP id k62mr3901890pjc.32.1630588087548; Thu, 02 Sep 2021 06:08:07 -0700 (PDT) From: Junji Wei To: dledford@redhat.com, jgg@ziepe.ca, mst@redhat.com, jasowang@redhat.com, yuval.shaia.ml@gmail.com, marcel.apfelbaum@gmail.com, cohuck@redhat.com, hare@suse.de Subject: [RFC 3/5] RDMA/virtio-rdma: VirtIO RDMA test module Date: Thu, 2 Sep 2021 21:06:23 +0800 Message-Id: <20210902130625.25277-4-weijunji@bytedance.com> X-Mailer: git-send-email 2.30.1 (Apple Git-130) In-Reply-To: <20210902130625.25277-1-weijunji@bytedance.com> References: <20210902130625.25277-1-weijunji@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=2607:f8b0:4864:20::62a; envelope-from=weijunji@bytedance.com; helo=mail-pl1-x62a.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Mailman-Approved-At: Thu, 02 Sep 2021 10:08:37 -0400 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-rdma@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, xieyongji@bytedance.com, chaiwen.cc@bytedance.com, weijunji@bytedance.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZM-MESSAGEID: 1630591803263100003 Content-Type: text/plain; charset="utf-8" This is a test module for virtio-rdma, it can work with rc_pingpong server included in rdma-core. Signed-off-by: Junji Wei --- drivers/infiniband/hw/virtio/Makefile | 1 + .../hw/virtio/virtio_rdma_rc_pingpong_client.c | 477 +++++++++++++++++= ++++ 2 files changed, 478 insertions(+) create mode 100644 drivers/infiniband/hw/virtio/virtio_rdma_rc_pingpong_cl= ient.c diff --git a/drivers/infiniband/hw/virtio/Makefile b/drivers/infiniband/hw/= virtio/Makefile index fb637e467167..eb72a0aa48f3 100644 --- a/drivers/infiniband/hw/virtio/Makefile +++ b/drivers/infiniband/hw/virtio/Makefile @@ -1,4 +1,5 @@ obj-$(CONFIG_INFINIBAND_VIRTIO_RDMA) +=3D virtio_rdma.o +obj-m :=3D virtio_rdma_rc_pingpong_client.o =20 virtio_rdma-y :=3D virtio_rdma_main.o virtio_rdma_device.o virtio_rdma_ib.= o \ virtio_rdma_netdev.o diff --git a/drivers/infiniband/hw/virtio/virtio_rdma_rc_pingpong_client.c = b/drivers/infiniband/hw/virtio/virtio_rdma_rc_pingpong_client.c new file mode 100644 index 000000000000..d1a38fe8f8cd --- /dev/null +++ b/drivers/infiniband/hw/virtio/virtio_rdma_rc_pingpong_client.c @@ -0,0 +1,477 @@ +/* + * Virtio RDMA device: Test client + * + * Copyright (C) 2021 Junji Wei Bytedance Inc. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 U= SA + */ + +#include +#include +#include + +#include +#include +#include +#include + +#include + +#include +#include +#include "../../core/uverbs.h" + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Junji Wei"); +MODULE_DESCRIPTION("Virtio rdma test module"); +MODULE_VERSION("0.01"); + +#define SERVER_ADDR "10.131.251.125" +#define SERVER_PORT 18515 + +#define RX_DEPTH 500 +#define ITER 500 +#define PAGES 5 + +struct pingpong_dest { + int lid; + int out_reads; + int qpn; + int psn; + unsigned rkey; + unsigned long long vaddr; + union ib_gid gid; + unsigned srqn; + int gid_index; +}; + +static struct ib_device* open_dev(char* path) +{ + struct ib_device *ib_dev; + struct ib_uverbs_file *file; + struct file* filp; + struct ib_port_attr port_attr; + int rc; + + filp =3D filp_open(path, O_RDWR | O_CLOEXEC, 0); + if (!filp) + pr_err("Open failed\n"); + + file =3D filp->private_data; + ib_dev =3D file->device->ib_dev; + if (!ib_dev) + pr_err("Get ib_dev failed\n"); + + pr_info("Open ib_device %s\n", ib_dev->node_desc); + + /* test query_port */ + rc =3D ib_query_port(ib_dev, 1, &port_attr); + if (rc) + pr_err("Query port failed\n"); + pr_info("Port gid_tbl_len %d\n", port_attr.gid_tbl_len); + + return ib_dev; +} + +static struct socket* ethernet_client_connect(void) +{ + struct socket *sock; + struct sockaddr_in s_addr; + int ret; + + memset(&s_addr,0,sizeof(s_addr)); + s_addr.sin_family=3DAF_INET; + s_addr.sin_port=3Dhtons(SERVER_PORT); + =20 + s_addr.sin_addr.s_addr =3D in_aton(SERVER_ADDR); + sock =3D (struct socket *)kmalloc(sizeof(struct socket), GFP_KERNEL); + + /*create a socket*/ + ret =3D sock_create_kern(&init_net, AF_INET, SOCK_STREAM, 0, &sock); + if (ret < 0) { + pr_err("client: socket create error\n"); + } + pr_info("client: socket create ok\n"); + + /*connect server*/ + ret =3D sock->ops->connect(sock, (struct sockaddr *)&s_addr, sizeof(s_= addr), 0); + if (ret) { + pr_err("client: connect error\n"); + return NULL; + } + pr_info("client: connect ok\n"); + + return sock; +} + +static int ethernet_read_data(struct socket *sock, char* buf, int size) { + struct kvec vec; + struct msghdr msg; + int ret; + + memset(&vec,0,sizeof(vec)); + memset(&msg,0,sizeof(msg)); + vec.iov_base =3D buf; + vec.iov_len =3D size; + + ret =3D kernel_recvmsg(sock, &msg, &vec, 1, size, 0); + if (ret < 0) { + pr_err("read failed\n"); + return ret; + } + return ret; +} + +static int ethernet_write_data(struct socket *sock, char* buf, int size) {= =20 + struct kvec vec; + struct msghdr msg; + int ret; + + vec.iov_base =3D buf; + vec.iov_len =3D size; + + memset(&msg,0,sizeof(msg)); + + ret =3D kernel_sendmsg(sock, &msg, &vec, 1, size); + if (ret < 0) { + pr_err("kernel_sendmsg error\n"); + return ret; + }else if(ret !=3D size){ + pr_info("write ret !=3D size"); + } + + pr_info("send success\n"); + return 
ret; +} + +static void gid_to_wire_gid(const union ib_gid *gid, char wgid[]) +{ + uint32_t tmp_gid[4]; + int i; + + memcpy(tmp_gid, gid, sizeof(tmp_gid)); + for (i =3D 0; i < 4; ++i) + sprintf(&wgid[i * 8], "%08x", cpu_to_be32(tmp_gid[i])); +} + +void wire_gid_to_gid(const char *wgid, union ib_gid *gid) +{ + char tmp[9]; + __be32 v32; + int i; + uint32_t tmp_gid[4]; + + for (tmp[8] =3D 0, i =3D 0; i < 4; ++i) { + memcpy(tmp, wgid + i * 8, 8); + sscanf(tmp, "%x", &v32); + tmp_gid[i] =3D be32_to_cpu(v32); + } + memcpy(gid, tmp_gid, sizeof(*gid)); +} + +static struct pingpong_dest *pp_client_exch_dest(const struct pingpong_des= t *my_dest) +{ + struct socket* sock; + char msg[sizeof "0000:000000:000000:00000000000000000000000000000000"]; + struct pingpong_dest *rem_dest =3D NULL; + char gid[33]; + + sock =3D ethernet_client_connect(); + if (!sock) { + return NULL; + } + + gid_to_wire_gid(&my_dest->gid, gid); + sprintf(msg, "%04x:%06x:%06x:%s", my_dest->lid, my_dest->qpn, + my_dest->psn, gid); + pr_info("Local %s\n", msg); + if (ethernet_write_data(sock, msg, sizeof msg) !=3D sizeof msg) { + pr_err("Couldn't send local address\n"); + goto out; + } + + if (ethernet_read_data(sock, msg, sizeof msg) !=3D sizeof msg || + ethernet_write_data(sock, "done", sizeof "done") !=3D sizeof "done") { + pr_err("Couldn't read/write remote address\n"); + goto out; + } + + rem_dest =3D kmalloc(sizeof *rem_dest, GFP_KERNEL); + if (!rem_dest) + goto out; + + pr_info("Remote %s\n", msg); + sscanf(msg, "%x:%x:%x:%s", &rem_dest->lid, &rem_dest->qpn, + &rem_dest->psn, gid); + wire_gid_to_gid(gid, &rem_dest->gid); + +out: + return rem_dest; +} + +static int __init rdma_test_init(void) { + struct ib_device* ib_dev; + struct ib_pd* pd; + struct ib_mr *mr, *mr_recv; + uint64_t dma_addr, dma_addr_recv; + struct scatterlist sg; + struct scatterlist sgr; + const struct ib_cq_init_attr cq_attr =3D { 64, 0, 0 }; + struct ib_cq *cq; + struct ib_qp *qp; + struct ib_qp_init_attr qp_init_attr =3D { + .event_handler =3D NULL, + .qp_context =3D NULL, + .srq =3D NULL, + .xrcd =3D NULL, + .cap =3D { + RX_DEPTH, RX_DEPTH, 1, 1, -1, 0 + }, + .sq_sig_type =3D IB_SIGNAL_ALL_WR, + .qp_type =3D IB_QPT_RC, + .create_flags =3D 0, + .port_num =3D 0, + .rwq_ind_tbl =3D NULL, + .source_qpn =3D 0 + }; + struct ib_qp_attr qp_attr =3D {}; + struct ib_port_attr port_attr; + struct pingpong_dest my_dest; + struct pingpong_dest *rem_dest; + int mask, rand_num, iter; + struct ib_rdma_wr swr; + const struct ib_send_wr *bad_swr; + struct ib_recv_wr rwr; + const struct ib_recv_wr *bad_rwr; + struct ib_sge wsge[1], rsge[1]; + uint64_t *addr_send, *addr_recv; + int i, wc_got; + struct ib_wc wc[2]; + struct ib_reg_wr reg_wr; + + ktime_t t0; + uint64_t rt; + int wc_total =3D 0; + + pr_info("Start rdma test\n"); + pr_info("Normal address: 0x%lu -- 0x%px\n", MAX_DMA_ADDRESS, high_memo= ry); + =20 + ib_dev =3D open_dev("/dev/infiniband/uverbs0"); + + pd =3D ib_alloc_pd(ib_dev, 0); + if (!pd) { + pr_err("alloc_pd failed\n"); + return -ENOMEM; + } + + mr =3D ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, PAGES); + mr_recv =3D ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, PAGES); + if (!mr || !mr_recv) { + pr_err("alloc_mr failed\n"); + return -EIO; + } + + addr_send =3D ib_dma_alloc_coherent(ib_dev, PAGE_SIZE * PAGES, &dma_ad= dr, GFP_KERNEL); + memset((char*)addr_send, '?', 4096 * PAGES); + sg_dma_address(&sg) =3D dma_addr; + sg_dma_len(&sg) =3D PAGE_SIZE * PAGES; + ib_map_mr_sg(mr, &sg, 1, NULL, PAGE_SIZE); + + addr_recv =3D ib_dma_alloc_coherent(ib_dev, PAGE_SIZE * PAGES, &dma_ad= 
dr_recv, GFP_KERNEL); + sg_dma_address(&sgr) =3D dma_addr_recv; + sg_dma_len(&sgr) =3D PAGE_SIZE * PAGES; + ib_map_mr_sg(mr_recv, &sgr, 1, NULL, PAGE_SIZE); + + memset((char*)addr_recv, 'x', 4096 * PAGES); + strcpy((char*)addr_recv, "hello world"); + pr_info("Before %s\n", (char*)addr_send); + pr_info("Before %s\n", (char*)addr_recv); + + cq =3D ib_create_cq(ib_dev, NULL, NULL, NULL, &cq_attr); + if (!cq) { + pr_err("create_cq failed\n"); + } + + qp_init_attr.send_cq =3D cq; + qp_init_attr.recv_cq =3D cq; + pr_info("qp type: %d\n", qp_init_attr.qp_type); + qp =3D ib_create_qp(pd, &qp_init_attr); + if (!qp) { + pr_err("create_qp failed\n"); + } + + // modify to init + memset(&qp_attr, 0, sizeof(qp_attr)); + mask =3D IB_QP_STATE | IB_QP_ACCESS_FLAGS | IB_QP_PKEY_INDEX | IB_QP_P= ORT; + qp_attr.qp_state =3D IB_QPS_INIT; + qp_attr.port_num =3D 1; + qp_attr.pkey_index =3D 0; + qp_attr.qp_access_flags =3D 0; + ib_modify_qp(qp, &qp_attr, mask); + + memset(®_wr, 0, sizeof(reg_wr)); + reg_wr.wr.opcode =3D IB_WR_REG_MR; + reg_wr.wr.num_sge =3D 0; + reg_wr.mr =3D mr; + reg_wr.key =3D mr->lkey; + reg_wr.access =3D IB_ACCESS_LOCAL_WRITE; + ib_post_send(qp, ®_wr.wr, &bad_swr); + + memset(®_wr, 0, sizeof(reg_wr)); + reg_wr.wr.opcode =3D IB_WR_REG_MR; + reg_wr.wr.num_sge =3D 0; + reg_wr.mr =3D mr_recv; + reg_wr.key =3D mr_recv->lkey; + reg_wr.access =3D IB_ACCESS_LOCAL_WRITE; + ib_post_send(qp, ®_wr.wr, &bad_swr); + + // post recv + rsge[0].addr =3D dma_addr_recv; + rsge[0].length =3D 4096 * PAGES; + rsge[0].lkey =3D mr_recv->lkey; + + rwr.next =3D NULL; + rwr.wr_id =3D 1; + rwr.sg_list =3D rsge; + rwr.num_sge =3D 1; + for (i =3D 0; i < ITER; i++) { + if (ib_post_recv(qp, &rwr, &bad_rwr)) { + pr_err("post recv failed\n"); + return -EIO; + } + } + + // exchange info + if (ib_query_port(ib_dev, 1, &port_attr)) + pr_err("query port failed"); + my_dest.lid =3D port_attr.lid; + + // TODO: fix rdma_query_gid + if (rdma_query_gid(ib_dev, 1, 1, &my_dest.gid)) + pr_err("query gid failed"); + + get_random_bytes(&rand_num, sizeof(rand_num)); + my_dest.gid_index =3D 1; + my_dest.qpn =3D qp->qp_num; + my_dest.psn =3D rand_num & 0xffffff; + + pr_info(" local address: LID 0x%04x, QPN 0x%06x, PSN 0x%06x, GID %pI= 6\n", + my_dest.lid, my_dest.qpn, my_dest.psn, &my_dest.gid); + + rem_dest =3D pp_client_exch_dest(&my_dest); + if (!rem_dest) { + return -EIO; + } + + pr_info(" remote address: LID 0x%04x, QPN 0x%06x, PSN 0x%06x, GID %pI= 6\n", + rem_dest->lid, rem_dest->qpn, rem_dest->psn, &rem_dest->gid); + + my_dest.rkey =3D mr->rkey; + my_dest.out_reads =3D 1; + my_dest.vaddr =3D dma_addr; + my_dest.srqn =3D 0; + + // modify to rtr + memset(&qp_attr, 0, sizeof(qp_attr)); + mask =3D IB_QP_STATE | IB_QP_AV | IB_QP_PATH_MTU | IB_QP_DEST_QPN | IB= _QP_RQ_PSN | IB_QP_MIN_RNR_TIMER | IB_QP_MAX_DEST_RD_ATOMIC; + qp_attr.qp_state =3D IB_QPS_RTR; + qp_attr.path_mtu =3D IB_MTU_1024; + qp_attr.dest_qp_num =3D rem_dest->qpn; + qp_attr.rq_psn =3D rem_dest->psn; + qp_attr.max_dest_rd_atomic =3D 1; + qp_attr.min_rnr_timer =3D 12; + qp_attr.ah_attr.ah_flags =3D IB_AH_GRH; + qp_attr.ah_attr.ib.dlid =3D rem_dest->lid; // is_global lid + qp_attr.ah_attr.ib.src_path_bits =3D 0; + qp_attr.ah_attr.sl =3D 0; + qp_attr.ah_attr.port_num =3D 1; + + if (rem_dest->gid.global.interface_id) { + qp_attr.ah_attr.grh.hop_limit =3D 1; + qp_attr.ah_attr.grh.dgid =3D rem_dest->gid; + qp_attr.ah_attr.grh.sgid_index =3D my_dest.gid_index; + } + + if (ib_modify_qp(qp, &qp_attr, mask)) { + pr_info("Failed to modify to RTR\n"); + return -EIO; + } + + // modify to 
rts + memset(&qp_attr, 0, sizeof(qp_attr)); + mask =3D IB_QP_STATE | IB_QP_SQ_PSN | IB_QP_TIMEOUT | IB_QP_RETRY_CNT = | IB_QP_RNR_RETRY | IB_QP_MAX_QP_RD_ATOMIC; + qp_attr.qp_state =3D IB_QPS_RTS; + qp_attr.sq_psn =3D my_dest.psn; + qp_attr.timeout =3D 14; + qp_attr.retry_cnt =3D 7; + qp_attr.rnr_retry =3D 7; + qp_attr.max_rd_atomic =3D 1; + if (ib_modify_qp(qp, &qp_attr, mask)) { + pr_info("Failed to modify to RTS\n"); + } + + wsge[0].addr =3D dma_addr; + wsge[0].length =3D 4096 * PAGES; + wsge[0].lkey =3D mr->lkey; + + swr.wr.next =3D NULL; + swr.wr.wr_id =3D 2; + swr.wr.sg_list =3D wsge; + swr.wr.num_sge =3D 1; + swr.wr.opcode =3D IB_WR_SEND; + swr.wr.send_flags =3D IB_SEND_SIGNALED; + swr.remote_addr =3D rem_dest->vaddr; + swr.rkey =3D rem_dest->rkey; + + t0 =3D ktime_get(); + + for (iter =3D 0; iter < ITER; iter++) { + if (ib_post_send(qp, &swr.wr, &bad_swr)) { + pr_err("post send failed\n"); + return -EIO; + } + + do { + wc_got =3D ib_poll_cq(cq, 2, wc); + } while(wc_got < 1); + wc_total +=3D wc_got; + } + + pr_info("Total wc %d\n", wc_total); + do { + wc_total +=3D ib_poll_cq(cq, 2, wc); + }while(wc_total < ITER * 2); + + rt =3D ktime_to_us(ktime_sub(ktime_get(), t0)); + pr_info("%d iters in %lld us =3D %lld usec/iter\n", ITER, rt, rt / ITE= R); + pr_info("%d bytes in %lld us =3D %lld Mbit/sec\n", ITER * 4096 * 2, rt= , (uint64_t)ITER * 62500 / rt); + + pr_info("After %s\n", (char*)addr_send); + pr_info("After %s\n", (char*)addr_recv); + + ib_destroy_qp(qp); + ib_destroy_cq(cq); + ib_dereg_mr(mr); + ib_dereg_mr(mr_recv); + ib_dealloc_pd(pd); + return 0; +} + +static void __exit rdma_test_exit(void) { + pr_info("Exit rdma test\n"); +} + +module_init(rdma_test_init); +module_exit(rdma_test_exit); --=20 2.11.0 From nobody Sun Apr 28 23:27:47 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=bytedance.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1630591799325342.63796893859785; Thu, 2 Sep 2021 07:09:59 -0700 (PDT) Received: from localhost ([::1]:47742 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mLnPi-0007TV-8q for importer@patchew.org; Thu, 02 Sep 2021 10:09:58 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:41914) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mLmS5-0002vQ-UZ for qemu-devel@nongnu.org; Thu, 02 Sep 2021 09:08:21 -0400 Received: from mail-pj1-x1029.google.com ([2607:f8b0:4864:20::1029]:50837) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1mLmS4-0001Ce-9c for qemu-devel@nongnu.org; Thu, 02 Sep 2021 09:08:21 -0400 Received: by mail-pj1-x1029.google.com with SMTP id fz10so1341314pjb.0 for ; Thu, 02 Sep 2021 06:08:19 -0700 (PDT) Received: from C02FR1DUMD6V.bytedance.net ([139.177.225.225]) by smtp.gmail.com with ESMTPSA id d6sm2307415pfa.135.2021.09.02.06.08.13 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 02 Sep 2021 06:08:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=L+G9kPhwsuho0YbQ3uM8CI129eTwrDx36Qp4gKZ7PCA=; 
b=XEzu6SxGtCjCDZ5ZILrNAjU5WcoIoZgBfHSLRPy7EoHcdr0ee+eGu5jLuqpbKtZMiq hd+qdiyXzgT4SDSEZtQUh0pSH/pOwcbsQWNMfYXjd/z2p9Neb2GQ17PmO13ApLxqd7JB u+8dsnbasdGBBXubbiU7CpPrOWwbTUAOWhvytAshDGMnsi8yYWU136hG03559H+D15C4 FmbA1FqBB8FZCNrnWjCSWme8rSnMjaOh7Dp8Mu2OlS5+TAPLQySWvUOHBpRMJXdKxbUN szaVr9hfpYqhzDy5pHOh3ZORkUFtPPzOYxx0H1Z86AY1e1TVFhUuTw9U8GgaSqA8L/6J GOMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=L+G9kPhwsuho0YbQ3uM8CI129eTwrDx36Qp4gKZ7PCA=; b=m1pq2qkrXnooNW9mDPLRbRdNbjIVxZW45f1CX5FL7Y9OxSzLOtP/rIuLVRxckVYFaW UQAsMk+BWcxGTCefHkgstxOs20UcQJTCa9JLQI0V5nAr+kVYxeKqFY+u0mQypTrZ7OOm O9+w1MMY3p/TzLjySIgR127z1+LOVGhVZlUNyaj7S4K3NhlUaAl7pbp5ZEnPxH6oqjZK g0P1N7P19S/rsS81s9yTUAUfxf9XHP/UfjZE1Brx/JEXv5vCuBXvwvU9KW1fQCZw5YtA +BMsfhCaQYngnpVBtTw9ayF0VYuO9Ys0wfXgfRbkP1fa/IbmAVgaoFABj73ZW+vohJMN HsSQ== X-Gm-Message-State: AOAM530MuVfTnOg7g7W4MhleQBZLBtqhWvUyNc5+GbBmGUkH9IfDeRO8 4aXOlXqr9m3nFqkLaZ9CG5A+eg== X-Google-Smtp-Source: ABdhPJyCxnuTLqOl7CCROnnXqyJ/zDL9uG3QdE9kp2zXMQh6j8vqadez6ThslEN21V0vilvkCkk6VQ== X-Received: by 2002:a17:90a:8005:: with SMTP id b5mr3875331pjn.190.1630588098890; Thu, 02 Sep 2021 06:08:18 -0700 (PDT) From: Junji Wei To: dledford@redhat.com, jgg@ziepe.ca, mst@redhat.com, jasowang@redhat.com, yuval.shaia.ml@gmail.com, marcel.apfelbaum@gmail.com, cohuck@redhat.com, hare@suse.de Subject: [RFC 4/5] virtio-net: Move some virtio-net-pci decl to include/hw/virtio Date: Thu, 2 Sep 2021 21:06:24 +0800 Message-Id: <20210902130625.25277-5-weijunji@bytedance.com> X-Mailer: git-send-email 2.30.1 (Apple Git-130) In-Reply-To: <20210902130625.25277-1-weijunji@bytedance.com> References: <20210902130625.25277-1-weijunji@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=2607:f8b0:4864:20::1029; envelope-from=weijunji@bytedance.com; helo=mail-pj1-x1029.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Mailman-Approved-At: Thu, 02 Sep 2021 10:08:37 -0400 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-rdma@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, xieyongji@bytedance.com, chaiwen.cc@bytedance.com, weijunji@bytedance.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZM-MESSAGEID: 1630591802191100001 Content-Type: text/plain; charset="utf-8" From: Yuval Shaia This patch is from Yuval Shaia's [RFC 1/3] Signed-off-by: Yuval Shaia --- hw/virtio/virtio-net-pci.c | 18 ++---------------- include/hw/virtio/virtio-net-pci.h | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 37 insertions(+), 16 deletions(-) create mode 100644 include/hw/virtio/virtio-net-pci.h diff --git a/hw/virtio/virtio-net-pci.c b/hw/virtio/virtio-net-pci.c index 292d13d278..6cea7e0441 100644 --- a/hw/virtio/virtio-net-pci.c +++ 
b/hw/virtio/virtio-net-pci.c @@ -18,26 +18,12 @@ #include "qemu/osdep.h" =20 #include "hw/qdev-properties.h" -#include "hw/virtio/virtio-net.h" +#include "hw/virtio/virtio-net-pci.h" #include "virtio-pci.h" #include "qapi/error.h" #include "qemu/module.h" #include "qom/object.h" =20 -typedef struct VirtIONetPCI VirtIONetPCI; - -/* - * virtio-net-pci: This extends VirtioPCIProxy. - */ -#define TYPE_VIRTIO_NET_PCI "virtio-net-pci-base" -DECLARE_INSTANCE_CHECKER(VirtIONetPCI, VIRTIO_NET_PCI, - TYPE_VIRTIO_NET_PCI) - -struct VirtIONetPCI { - VirtIOPCIProxy parent_obj; - VirtIONet vdev; -}; - static Property virtio_net_properties[] =3D { DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true), @@ -84,7 +70,7 @@ static void virtio_net_pci_instance_init(Object *obj) =20 static const VirtioPCIDeviceTypeInfo virtio_net_pci_info =3D { .base_name =3D TYPE_VIRTIO_NET_PCI, - .generic_name =3D "virtio-net-pci", + .generic_name =3D TYPE_VIRTIO_NET_PCI_GENERIC, .transitional_name =3D "virtio-net-pci-transitional", .non_transitional_name =3D "virtio-net-pci-non-transitional", .instance_size =3D sizeof(VirtIONetPCI), diff --git a/include/hw/virtio/virtio-net-pci.h b/include/hw/virtio/virtio-= net-pci.h new file mode 100644 index 0000000000..c1915cd54f --- /dev/null +++ b/include/hw/virtio/virtio-net-pci.h @@ -0,0 +1,35 @@ +/* + * PCI Virtio Network Device + * + * Copyright IBM, Corp. 2007 + * + * Authors: + * Anthony Liguori + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ + +#ifndef QEMU_VIRTIO_NET_PCI_H +#define QEMU_VIRTIO_NET_PCI_H + +#include "hw/virtio/virtio-net.h" +#include "hw/virtio/virtio-pci.h" + +typedef struct VirtIONetPCI VirtIONetPCI; + +/* + * virtio-net-pci: This extends VirtioPCIProxy. 
+ */ +#define TYPE_VIRTIO_NET_PCI_GENERIC "virtio-net-pci" +#define TYPE_VIRTIO_NET_PCI "virtio-net-pci-base" +#define VIRTIO_NET_PCI(obj) \ + OBJECT_CHECK(VirtIONetPCI, (obj), TYPE_VIRTIO_NET_PCI) + +struct VirtIONetPCI { + VirtIOPCIProxy parent_obj; + VirtIONet vdev; +}; + +#endif --=20 2.11.0 From nobody Sun Apr 28 23:27:47 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=bytedance.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1630592004510379.9199839338005; Thu, 2 Sep 2021 07:13:24 -0700 (PDT) Received: from localhost ([::1]:56610 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mLnT1-0005P9-2i for importer@patchew.org; Thu, 02 Sep 2021 10:13:23 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:41966) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mLmSK-0003uQ-6e for qemu-devel@nongnu.org; Thu, 02 Sep 2021 09:08:36 -0400 Received: from mail-pg1-x52b.google.com ([2607:f8b0:4864:20::52b]:33399) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1mLmSF-0001De-Mz for qemu-devel@nongnu.org; Thu, 02 Sep 2021 09:08:35 -0400 Received: by mail-pg1-x52b.google.com with SMTP id c17so1915257pgc.0 for ; Thu, 02 Sep 2021 06:08:31 -0700 (PDT) Received: from C02FR1DUMD6V.bytedance.net ([139.177.225.225]) by smtp.gmail.com with ESMTPSA id d6sm2307415pfa.135.2021.09.02.06.08.24 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 02 Sep 2021 06:08:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=JdKx1CchErer0L4Gf3BkL6es4TGk0q8NPWAfgcNqaLg=; b=H95BuP3iy0w2R2nZvHcuTTKBo7w14Pnmc0+iO+a94GaamGWCsLjOi4cumQ73d6w8fv 6tfO7fg9iGBp7Rxia98lH8x7Pkc4e76+ji87IT5yr/jsR5HvZig4ZAjG04wxQl4YBIot 47BrSMpfkqDDso600vcsriCTOwA4OEtF6yIXDNvK+A+i7Dj5VvoSkO2ZXJ1M+pe+4Gtn 5zQENlbRLSkDd9uUavygA+V/FdGv7I7hGpwdo5MsDA/Yc+FCGHTpsp/I3GdyFd31i5Ol FY94jhdlgasgqeZk+FMl5i1iWDSns5VoljZ8MWhXOSfcKKXUY2/DLUaT1gaiRJ7fOcc+ bHtg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=JdKx1CchErer0L4Gf3BkL6es4TGk0q8NPWAfgcNqaLg=; b=aM7DkHvkemC4KxynFDXJ+zG4toby9nE+pHJ1Ck+s6ofPAWU66qWpV2xjBdKHdb1P/3 HJnYg8O8nGQByaGUbRqmdZxZVuMMtt+SWtvQ3VufJvhOEKr+InwM9EK10BJifllspG41 bzgJdixIvT60ghizzfodGTfK9cFiKhZWLKsndi0qqyGnMe8H9h6h43FjjPiWNHeel6xd 8jhCuL+L0zZBqjunH+G5P0PR/8xw8KYBLJJVzwZIv2Bz3J8EnLeLHQplAMYH4y1cB81i RN6V9clw8qWzDJ97nKhKUM77rIbvg0KD2sOWYvLHiEZm7R1wy58LjqR8tEvH6GIusq1V 1zGg== X-Gm-Message-State: AOAM533pa8A2mNvy3OnHjz5s/eQDrnql+LrTDIw5SFLGA9pJJplZMklV VDke2Bo35EXuy2W6JJgwtZ3nag== X-Google-Smtp-Source: ABdhPJyU+U98LGYjuuoG2RpNVXFbZNMBasd0En9CmUJeS8X21CurptkmyJlQ0zTkZVnDy2IsXygXrQ== X-Received: by 2002:aa7:8e81:0:b0:3fe:f212:f9dd with SMTP id a1-20020aa78e81000000b003fef212f9ddmr3374201pfr.46.1630588109969; Thu, 02 Sep 2021 06:08:29 -0700 (PDT) From: Junji Wei To: dledford@redhat.com, jgg@ziepe.ca, mst@redhat.com, jasowang@redhat.com, 
yuval.shaia.ml@gmail.com, marcel.apfelbaum@gmail.com, cohuck@redhat.com, hare@suse.de Subject: [RFC 5/5] hw/virtio-rdma: VirtIO rdma device Date: Thu, 2 Sep 2021 21:06:25 +0800 Message-Id: <20210902130625.25277-6-weijunji@bytedance.com> X-Mailer: git-send-email 2.30.1 (Apple Git-130) In-Reply-To: <20210902130625.25277-1-weijunji@bytedance.com> References: <20210902130625.25277-1-weijunji@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=2607:f8b0:4864:20::52b; envelope-from=weijunji@bytedance.com; helo=mail-pg1-x52b.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Mailman-Approved-At: Thu, 02 Sep 2021 10:08:37 -0400 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-rdma@vger.kernel.org, qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, xieyongji@bytedance.com, chaiwen.cc@bytedance.com, weijunji@bytedance.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZM-MESSAGEID: 1630592005706100001 Content-Type: text/plain; charset="utf-8" This based on Yuval Shaia's [RFC 2/3] [ Junji Wei: Implement simple date path and complete control path. ] Signed-off-by: Yuval Shaia Signed-off-by: Junji Wei --- hw/rdma/Kconfig | 5 + hw/rdma/meson.build | 10 + hw/rdma/virtio/virtio-rdma-dev-api.h | 269 ++++++++++ hw/rdma/virtio/virtio-rdma-ib.c | 764 ++++++++++++++++++++++++= ++++ hw/rdma/virtio/virtio-rdma-ib.h | 176 +++++++ hw/rdma/virtio/virtio-rdma-main.c | 231 +++++++++ hw/rdma/virtio/virtio-rdma-qp.c | 241 +++++++++ hw/rdma/virtio/virtio-rdma-qp.h | 29 ++ hw/virtio/meson.build | 1 + hw/virtio/virtio-rdma-pci.c | 110 ++++ include/hw/pci/pci.h | 1 + include/hw/virtio/virtio-rdma.h | 58 +++ include/standard-headers/linux/virtio_ids.h | 1 + 13 files changed, 1896 insertions(+) create mode 100644 hw/rdma/virtio/virtio-rdma-dev-api.h create mode 100644 hw/rdma/virtio/virtio-rdma-ib.c create mode 100644 hw/rdma/virtio/virtio-rdma-ib.h create mode 100644 hw/rdma/virtio/virtio-rdma-main.c create mode 100644 hw/rdma/virtio/virtio-rdma-qp.c create mode 100644 hw/rdma/virtio/virtio-rdma-qp.h create mode 100644 hw/virtio/virtio-rdma-pci.c create mode 100644 include/hw/virtio/virtio-rdma.h diff --git a/hw/rdma/Kconfig b/hw/rdma/Kconfig index 8e2211288f..245b5b4d11 100644 --- a/hw/rdma/Kconfig +++ b/hw/rdma/Kconfig @@ -1,3 +1,8 @@ config VMW_PVRDMA default y if PCI_DEVICES depends on PVRDMA && PCI && MSI_NONBROKEN + +config VIRTIO_RDMA + bool + default y + depends on VIRTIO diff --git a/hw/rdma/meson.build b/hw/rdma/meson.build index 7325f40c32..da9c3aaaf4 100644 --- a/hw/rdma/meson.build +++ b/hw/rdma/meson.build @@ -8,3 +8,13 @@ specific_ss.add(when: 'CONFIG_VMW_PVRDMA', if_true: files( 'vmw/pvrdma_main.c', 'vmw/pvrdma_qp_ops.c', )) + +specific_ss.add(when: 'CONFIG_VIRTIO_RDMA', if_true: files( + 'rdma.c', + 'rdma_backend.c', + 'rdma_rm.c', + 'rdma_utils.c', + 'virtio/virtio-rdma-main.c', + 
'virtio/virtio-rdma-ib.c', + 'virtio/virtio-rdma-qp.c', +)) diff --git a/hw/rdma/virtio/virtio-rdma-dev-api.h b/hw/rdma/virtio/virtio-r= dma-dev-api.h new file mode 100644 index 0000000000..d4d8f2acc2 --- /dev/null +++ b/hw/rdma/virtio/virtio-rdma-dev-api.h @@ -0,0 +1,269 @@ +/* + * Virtio RDMA Device - QP ops + * + * Copyright (C) 2021 Bytedance Inc. + * + * Authors: + * Junji Wei + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ + +#ifndef VIRTIO_RDMA_DEV_API_H +#define VIRTIO_RDMA_DEV_API_H + +#include "virtio-rdma-ib.h" + +#define VIRTIO_RDMA_CTRL_OK 0 +#define VIRTIO_RDMA_CTRL_ERR 1 + +enum { + VIRTIO_CMD_QUERY_DEVICE =3D 10, + VIRTIO_CMD_QUERY_PORT, + VIRTIO_CMD_CREATE_CQ, + VIRTIO_CMD_DESTROY_CQ, + VIRTIO_CMD_CREATE_PD, + VIRTIO_CMD_DESTROY_PD, + VIRTIO_CMD_GET_DMA_MR, + VIRTIO_CMD_CREATE_MR, + VIRTIO_CMD_MAP_MR_SG, + VIRTIO_CMD_REG_USER_MR, + VIRTIO_CMD_DEREG_MR, + VIRTIO_CMD_CREATE_QP, + VIRTIO_CMD_MODIFY_QP, + VIRTIO_CMD_QUERY_QP, + VIRTIO_CMD_DESTROY_QP, + VIRTIO_CMD_QUERY_GID, + VIRTIO_CMD_CREATE_UC, + VIRTIO_CMD_DEALLOC_UC, + VIRTIO_CMD_QUERY_PKEY, + VIRTIO_MAX_CMD_NUM, +}; + +struct control_buf { + uint8_t cmd; + uint8_t status; +}; + +struct cmd_query_port { + uint8_t port; +}; + +struct virtio_rdma_port_attr { + enum ibv_port_state state; + enum ibv_mtu max_mtu; + enum ibv_mtu active_mtu; + int gid_tbl_len; + unsigned int ip_gids:1; + uint32_t port_cap_flags; + uint32_t max_msg_sz; + uint32_t bad_pkey_cntr; + uint32_t qkey_viol_cntr; + uint16_t pkey_tbl_len; + uint32_t sm_lid; + uint32_t lid; + uint8_t lmc; + uint8_t max_vl_num; + uint8_t sm_sl; + uint8_t subnet_timeout; + uint8_t init_type_reply; + uint8_t active_width; + uint8_t active_speed; + uint8_t phys_state; + uint16_t port_cap_flags2; +}; + +struct cmd_create_cq { + uint32_t cqe; +}; + +struct rsp_create_cq { + uint32_t cqn; +}; + +struct cmd_destroy_cq { + uint32_t cqn; +}; + +struct cmd_create_pd { + uint32_t ctx_handle; +}; + +struct rsp_create_pd { + uint32_t pdn; +}; + +struct cmd_destroy_pd { + uint32_t pdn; +}; + +struct cmd_create_mr { + uint32_t pdn; + uint32_t access_flags; + + uint32_t max_num_sg; +}; + +struct rsp_create_mr { + uint32_t mrn; + uint32_t lkey; + uint32_t rkey; +}; + +struct cmd_map_mr_sg { + uint32_t mrn; + uint64_t start; + uint32_t npages; + + uint64_t pages; +}; + +struct rsp_map_mr_sg { + uint32_t npages; +}; + +struct cmd_reg_user_mr { + uint32_t pdn; + uint32_t access_flags; + uint64_t start; + uint64_t length; + + uint64_t pages; + uint32_t npages; +}; + +struct rsp_reg_user_mr { + uint32_t mrn; + uint32_t lkey; + uint32_t rkey; +}; + +struct cmd_dereg_mr { + uint32_t mrn; + + uint8_t is_user_mr; +}; + +struct rsp_dereg_mr { + uint32_t mrn; +}; + +struct cmd_create_qp { + uint32_t pdn; + uint8_t qp_type; + uint32_t max_send_wr; + uint32_t max_send_sge; + uint32_t send_cqn; + uint32_t max_recv_wr; + uint32_t max_recv_sge; + uint32_t recv_cqn; + uint8_t is_srq; + uint32_t srq_handle; +}; + +struct rsp_create_qp { + uint32_t qpn; +}; + +struct cmd_modify_qp { + uint32_t qpn; + uint32_t attr_mask; + struct virtio_rdma_qp_attr attr; +}; + +struct cmd_destroy_qp { + uint32_t qpn; +}; + +struct rsp_destroy_qp { + uint32_t qpn; +}; + +struct cmd_query_qp { + uint32_t qpn; + uint32_t attr_mask; +}; + +struct rsp_query_qp { + struct virtio_rdma_qp_attr attr; +}; + +struct cmd_query_gid { + uint8_t port; + uint32_t index; +}; + +struct cmd_create_uc { + uint64_t pfn; +}; + +struct rsp_create_uc { + uint32_t 
ctx_handle; +}; + +struct cmd_dealloc_uc { + uint32_t ctx_handle; +}; + +struct rsp_dealloc_uc { + uint32_t ctx_handle; +}; + +struct cmd_query_pkey { + __u8 port; + __u16 index; +}; + +struct rsp_query_pkey { + __u16 pkey; +}; + +struct cmd_post_send { + __u32 qpn; + __u32 is_kernel; + __u32 num_sge; + + int send_flags; + enum virtio_rdma_wr_opcode opcode; + __u64 wr_id; + + union { + __be32 imm_data; + __u32 invalidate_rkey; + } ex; +=09 + union { + struct { + __u64 remote_addr; + __u32 rkey; + } rdma; + struct { + __u64 remote_addr; + __u64 compare_add; + __u64 swap; + __u32 rkey; + } atomic; + struct { + __u32 remote_qpn; + __u32 remote_qkey; + __u32 ahn; + } ud; + struct { + __u32 mrn; + __u32 key; + int access; + } reg; + } wr; +}; + +struct cmd_post_recv { + __u32 qpn; + __u32 is_kernel; + + __u32 num_sge; + __u64 wr_id; +}; + +#endif diff --git a/hw/rdma/virtio/virtio-rdma-ib.c b/hw/rdma/virtio/virtio-rdma-i= b.c new file mode 100644 index 0000000000..54831ec787 --- /dev/null +++ b/hw/rdma/virtio/virtio-rdma-ib.c @@ -0,0 +1,764 @@ +/* + * Virtio RDMA Device - IB verbs + * + * Copyright (C) 2019 Oracle + * + * Authors: + * Yuval Shaia + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ + +#include + +#include "qemu/osdep.h" +#include "qemu/atomic.h" +#include "cpu.h" + +#include "virtio-rdma-ib.h" +#include "virtio-rdma-qp.h" +#include "virtio-rdma-dev-api.h" + +#include "../rdma_utils.h" +#include "../rdma_rm.h" +#include "../rdma_backend.h" + +#include + +int virtio_rdma_query_device(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + int offs; + size_t s; + + addrconf_addr_eui48((unsigned char *)&rdev->dev_attr.sys_image_guid, + (const char *)&rdev->netdev->mac); + + offs =3D offsetof(struct ibv_device_attr, sys_image_guid); + s =3D iov_from_buf(out, 1, 0, (void *)&rdev->dev_attr + offs, sizeof(r= dev->dev_attr) - offs); + + return s =3D=3D sizeof(rdev->dev_attr) - offs ? VIRTIO_RDMA_CTRL_OK : + VIRTIO_RDMA_CTRL_ERR; +} + +int virtio_rdma_query_port(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct virtio_rdma_port_attr attr =3D {}; + struct ibv_port_attr vattr =3D {}; + struct cmd_query_port cmd =3D {}; + int offs; + size_t s; + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + return VIRTIO_RDMA_CTRL_ERR; + } + + if (cmd.port !=3D 1) { + return VIRTIO_RDMA_CTRL_ERR; + } + + if(rdma_backend_query_port(rdev->backend_dev, &vattr)) + return VIRTIO_RDMA_CTRL_ERR; + + attr.state =3D vattr.state; + attr.max_mtu =3D vattr.max_mtu; + attr.active_mtu =3D vattr.active_mtu; + attr.gid_tbl_len =3D vattr.gid_tbl_len; + attr.port_cap_flags =3D vattr.port_cap_flags; + attr.max_msg_sz =3D vattr.max_msg_sz; + attr.bad_pkey_cntr =3D vattr.bad_pkey_cntr; + attr.qkey_viol_cntr =3D vattr.qkey_viol_cntr; + attr.pkey_tbl_len =3D vattr.pkey_tbl_len; + attr.lid =3D vattr.lid; + attr.sm_lid =3D vattr.sm_lid; + attr.lmc =3D vattr.lmc; + attr.max_vl_num =3D vattr.max_vl_num; + attr.sm_sl =3D vattr.sm_sl; + attr.subnet_timeout =3D vattr.subnet_timeout; + attr.init_type_reply =3D vattr.init_type_reply; + attr.active_width =3D vattr.active_width; + attr.active_speed =3D vattr.phys_state; + attr.phys_state =3D vattr.phys_state; + attr.port_cap_flags2 =3D vattr.port_cap_flags2; + + offs =3D offsetof(struct virtio_rdma_port_attr, state); + + s =3D iov_from_buf(out, 1, 0, (void *)&attr + offs, sizeof(attr) - off= s); + + return s =3D=3D sizeof(attr) - offs ? 
VIRTIO_RDMA_CTRL_OK : + VIRTIO_RDMA_CTRL_ERR; +} + +int virtio_rdma_create_cq(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct cmd_create_cq cmd =3D {}; + struct rsp_create_cq rsp =3D {}; + size_t s; + int rc; + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + return VIRTIO_RDMA_CTRL_ERR; + } + + /* TODO: Define MAX_CQE */ +#define MAX_CQE 1024 + /* TODO: Check MAX_CQ */ + if (cmd.cqe > MAX_CQE) { + return VIRTIO_RDMA_CTRL_ERR; + } + + printf("%s: %d\n", __func__, cmd.cqe); + + rc =3D rdma_rm_alloc_cq(rdev->rdma_dev_res, rdev->backend_dev, cmd.cqe, + &rsp.cqn, NULL); + if (rc) + return VIRTIO_RDMA_CTRL_ERR; + + printf("%s: %d\n", __func__, rsp.cqn); + + s =3D iov_from_buf(out, 1, 0, &rsp, sizeof(rsp)); + + return s =3D=3D sizeof(rsp) ? VIRTIO_RDMA_CTRL_OK : + VIRTIO_RDMA_CTRL_ERR; +} + +int virtio_rdma_destroy_cq(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct cmd_destroy_cq cmd =3D {}; + size_t s; + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + return VIRTIO_RDMA_CTRL_ERR; + } + + printf("%s: %d\n", __func__, cmd.cqn); + + virtqueue_drop_all(rdev->cq_vqs[cmd.cqn]); + rdma_rm_dealloc_cq(rdev->rdma_dev_res, cmd.cqn); + + return VIRTIO_RDMA_CTRL_OK; +} + +int virtio_rdma_create_pd(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct cmd_create_pd cmd =3D {}; + struct rsp_create_pd rsp =3D {}; + size_t s; + int rc; + + if (qatomic_inc_fetch(&rdev->num_pd) > rdev->dev_attr.max_pd) + goto err; + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) + goto err; + + /* TODO: Check MAX_PD */ + + rc =3D rdma_rm_alloc_pd(rdev->rdma_dev_res, rdev->backend_dev, &rsp.pd= n, + cmd.ctx_handle); + if (rc) + goto err; + + printf("%s: pdn %d num_pd %d\n", __func__, rsp.pdn, qatomic_read(&rde= v->num_pd)); + + s =3D iov_from_buf(out, 1, 0, &rsp, sizeof(rsp)); + + if (s =3D=3D sizeof(rsp)) + return VIRTIO_RDMA_CTRL_OK; + +err: + qatomic_dec(&rdev->num_pd); + return VIRTIO_RDMA_CTRL_ERR; +} + +int virtio_rdma_destroy_pd(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct cmd_destroy_pd cmd =3D {}; + size_t s; + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + return VIRTIO_RDMA_CTRL_ERR; + } + + printf("%s: %d\n", __func__, cmd.pdn); + + rdma_rm_dealloc_pd(rdev->rdma_dev_res, cmd.pdn); + + return VIRTIO_RDMA_CTRL_OK; +} + +int virtio_rdma_get_dma_mr(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct cmd_create_mr cmd =3D {}; + struct rsp_create_mr rsp =3D {}; + size_t s; + uint32_t *htbl_key; + struct virtio_rdma_kernel_mr *kernel_mr; + + // FIXME: how to support dma mr + rdma_warn_report("DMA mr is not supported now"); + + htbl_key =3D g_malloc0(sizeof(*htbl_key)); + if (htbl_key =3D=3D NULL) + return VIRTIO_RDMA_CTRL_ERR; + + kernel_mr =3D g_malloc0(sizeof(*kernel_mr)); + if (kernel_mr =3D=3D NULL) { + g_free(htbl_key); + return VIRTIO_RDMA_CTRL_ERR; + } + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + g_free(kernel_mr); + g_free(htbl_key); + return VIRTIO_RDMA_CTRL_ERR; + } + + rdma_rm_alloc_mr(rdev->rdma_dev_res, cmd.pdn, 0, 0, NULL, cmd.access_f= lags, &rsp.mrn, &rsp.lkey, &rsp.rkey); + + *htbl_key =3D rsp.lkey; + kernel_mr->dummy_mr =3D rdma_rm_get_mr(rdev->rdma_dev_res, rsp.mrn); + kernel_mr->max_num_sg =3D cmd.max_num_sg; + kernel_mr->real_mr =3D NULL; + kernel_mr->dma_mr =3D true; + g_hash_table_insert(rdev->lkey_mr_tbl, htbl_key, kernel_mr); + + s =3D 
iov_from_buf(out, 1, 0, &rsp, sizeof(rsp)); + + return s =3D=3D sizeof(rsp) ? VIRTIO_RDMA_CTRL_OK : + VIRTIO_RDMA_CTRL_ERR; +} + +int virtio_rdma_create_mr(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct cmd_create_mr cmd =3D {}; + struct rsp_create_mr rsp =3D {}; + size_t s; + void* map_addr; + // uint64_t length; + uint32_t *htbl_key; + struct virtio_rdma_kernel_mr *kernel_mr; + RdmaRmMR *mr; + + htbl_key =3D g_malloc0(sizeof(*htbl_key)); + if (htbl_key =3D=3D NULL) + return VIRTIO_RDMA_CTRL_ERR; + + kernel_mr =3D g_malloc0(sizeof(*kernel_mr)); + if (kernel_mr =3D=3D NULL) { + g_free(htbl_key); + return VIRTIO_RDMA_CTRL_ERR; + } + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + g_free(kernel_mr); + g_free(htbl_key); + return VIRTIO_RDMA_CTRL_ERR; + } + + // when length is zero, will return same lkey + map_addr =3D mmap(0, TARGET_PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_ANO= NYMOUS | MAP_SHARED, -1, 0); + rdma_rm_alloc_mr(rdev->rdma_dev_res, cmd.pdn, (uint64_t)map_addr, TARG= ET_PAGE_SIZE, map_addr, cmd.access_flags, &rsp.mrn, &rsp.lkey, &rsp.rkey); + // rkey is -1, because in kernel mode mr cannot access from remotes + + /* we need to build a lkey to MR map, in order to set the local address + * in post_send and post_recv. + */ + *htbl_key =3D rsp.lkey; + mr =3D rdma_rm_get_mr(rdev->rdma_dev_res, rsp.mrn); + mr->lkey =3D rsp.lkey; + kernel_mr->dummy_mr =3D mr; + kernel_mr->max_num_sg =3D cmd.max_num_sg; + kernel_mr->real_mr =3D NULL; + kernel_mr->dma_mr =3D false; + g_hash_table_insert(rdev->lkey_mr_tbl, htbl_key, kernel_mr); + + s =3D iov_from_buf(out, 1, 0, &rsp, sizeof(rsp)); + + return s =3D=3D sizeof(rsp) ? VIRTIO_RDMA_CTRL_OK : + VIRTIO_RDMA_CTRL_ERR; +} + +static int remap_pages(AddressSpace *as, uint64_t *pages, void* remap_star= t, int npages) +{ + int i; + void* addr; + void* curr_page; + dma_addr_t len =3D TARGET_PAGE_SIZE; + + for (i =3D 0; i < npages; i++) { + rdma_info_report("remap page %lx to %p", pages[i], remap_start + T= ARGET_PAGE_SIZE * i); + curr_page =3D dma_memory_map(as, pages[i], &len, DMA_DIRECTION_TO_= DEVICE); + addr =3D mremap(curr_page, 0, TARGET_PAGE_SIZE, MREMAP_MAYMOVE | M= REMAP_FIXED, + remap_start + TARGET_PAGE_SIZE * i); + dma_memory_unmap(as, curr_page, TARGET_PAGE_SIZE, DMA_DIRECTION_TO= _DEVICE, 0); + if (addr =3D=3D MAP_FAILED) + break; + } + return i; +} + +int virtio_rdma_map_mr_sg(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct cmd_map_mr_sg cmd =3D {}; + struct rsp_map_mr_sg rsp =3D {}; + size_t s; + uint64_t *pages; + dma_addr_t len =3D TARGET_PAGE_SIZE; + RdmaRmMR *mr; + void *remap_addr; + AddressSpace *dma_as =3D VIRTIO_DEVICE(rdev)->dma_as; + struct virtio_rdma_kernel_mr *kmr; + uint32_t num_pages; + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + return VIRTIO_RDMA_CTRL_ERR; + } + + mr =3D rdma_rm_get_mr(rdev->rdma_dev_res, cmd.mrn); + if (!mr) { + rdma_error_report("get mr failed\n"); + return VIRTIO_RDMA_CTRL_ERR; + } + + pages =3D dma_memory_map(dma_as, cmd.pages, &len, DMA_DIRECTION_TO_DEV= ICE); + + kmr =3D g_hash_table_lookup(rdev->lkey_mr_tbl, &mr->lkey); + if (!kmr) { + rdma_error_report("Get kmr failed\n"); + return VIRTIO_RDMA_CTRL_ERR; + } + + num_pages =3D kmr->max_num_sg > cmd.npages ? 
cmd.npages : kmr->max_num= _sg; + remap_addr =3D mmap(0, num_pages * TARGET_PAGE_SIZE, PROT_READ | PROT_= WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0); + + rsp.npages =3D remap_pages(dma_as, pages, remap_addr, num_pages); + dma_memory_unmap(dma_as, pages, len, DMA_DIRECTION_TO_DEVICE, 0); + + // rdma_rm_alloc_mr(rdev->rdma_dev_res, mr->pd_handle, (uint64_t)remap= _addr, num_pages * TARGET_PAGE_SIZE, + // remap_addr, IBV_ACCESS_LOCAL_WRITE, &kmr->mrn, &km= r->lkey, &kmr->rkey); + + kmr->virt =3D remap_addr; + kmr->length =3D num_pages * TARGET_PAGE_SIZE; + kmr->start =3D cmd.start; + // kmr->real_mr =3D rdma_rm_get_mr(rdev->rdma_dev_res, kmr->mrn); + + s =3D iov_from_buf(out, 1, 0, &rsp, sizeof(rsp)); + + return s =3D=3D sizeof(rsp) ? VIRTIO_RDMA_CTRL_OK : + VIRTIO_RDMA_CTRL_ERR; +} + +int virtio_rdma_reg_user_mr(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct cmd_reg_user_mr cmd =3D {}; + struct rsp_reg_user_mr rsp =3D {}; + size_t s; + uint64_t *pages; + dma_addr_t len =3D TARGET_PAGE_SIZE; + void *remap_addr, *curr_page; + AddressSpace *dma_as =3D VIRTIO_DEVICE(rdev)->dma_as; + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + return VIRTIO_RDMA_CTRL_ERR; + } + + pages =3D dma_memory_map(dma_as, cmd.pages, &len, DMA_DIRECTION_TO_DEV= ICE); + + curr_page =3D dma_memory_map(dma_as, pages[0], &len, DMA_DIRECTION_TO_= DEVICE); + remap_addr =3D mremap(curr_page, 0, TARGET_PAGE_SIZE * cmd.npages, MRE= MAP_MAYMOVE); + dma_memory_unmap(dma_as, curr_page, TARGET_PAGE_SIZE, DMA_DIRECTION_TO= _DEVICE, 0); + if (remap_addr =3D=3D MAP_FAILED) { + rdma_error_report("mremap failed\n"); + return VIRTIO_RDMA_CTRL_ERR; + } + + remap_pages(dma_as, pages + 1, remap_addr + TARGET_PAGE_SIZE, cmd.npag= es - 1); + dma_memory_unmap(dma_as, pages, len, DMA_DIRECTION_TO_DEVICE, 0); + + rdma_rm_alloc_mr(rdev->rdma_dev_res, cmd.pdn, cmd.start, TARGET_PAGE_S= IZE * cmd.npages, + remap_addr, cmd.access_flags, &rsp.mrn, &rsp.lkey, &r= sp.rkey); + rsp.rkey =3D rdma_backend_mr_rkey(&rdma_rm_get_mr(rdev->rdma_dev_res, = rsp.mrn)->backend_mr); + rdma_info_report("%s: 0x%x\n", __func__, rsp.mrn); + + s =3D iov_from_buf(out, 1, 0, &rsp, sizeof(rsp)); + + return s =3D=3D sizeof(rsp) ? 
VIRTIO_RDMA_CTRL_OK : + VIRTIO_RDMA_CTRL_ERR; +} + +int virtio_rdma_dereg_mr(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct cmd_dereg_mr cmd =3D {}; + struct RdmaRmMR *mr; + struct virtio_rdma_kernel_mr *kmr; + size_t s; + uint32_t lkey; + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + return VIRTIO_RDMA_CTRL_ERR; + } + + mr =3D rdma_rm_get_mr(rdev->rdma_dev_res, cmd.mrn); + if (!mr) + return VIRTIO_RDMA_CTRL_ERR; + + if (!cmd.is_user_mr) { + lkey =3D mr->lkey; + kmr =3D g_hash_table_lookup(rdev->lkey_mr_tbl, &lkey); + if (!kmr) + return VIRTIO_RDMA_CTRL_ERR; + rdma_backend_destroy_mr(&kmr->dummy_mr->backend_mr); + mr =3D kmr->real_mr; + g_hash_table_remove(rdev->lkey_mr_tbl, &lkey); + if (!mr) + return VIRTIO_RDMA_CTRL_OK; + } + + munmap(mr->virt, mr->length); + rdma_backend_destroy_mr(&mr->backend_mr); + g_free(kmr); + return VIRTIO_RDMA_CTRL_OK; +} + +int virtio_rdma_create_qp(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct cmd_create_qp cmd =3D {}; + struct rsp_create_qp rsp =3D {}; + size_t s; + int rc; + //uint32_t recv_cqn; + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + return VIRTIO_RDMA_CTRL_ERR; + } + + // TODO: check max qp + + printf("%s: %d qp type %d\n", __func__, cmd.pdn, cmd.qp_type); + + // store recv_cqn in opaque + rc =3D rdma_rm_alloc_qp(rdev->rdma_dev_res, cmd.pdn, cmd.qp_type, cmd.= max_send_wr, + cmd.max_send_sge, cmd.send_cqn, cmd.max_recv_wr, + cmd.max_recv_sge, cmd.recv_cqn, NULL, &rsp.qpn, + cmd.is_srq, cmd.srq_handle); + + if (rc) + return VIRTIO_RDMA_CTRL_ERR; + + printf("%s: %d\n", __func__, rsp.qpn); + + s =3D iov_from_buf(out, 1, 0, &rsp, sizeof(rsp)); + + return s =3D=3D sizeof(rsp) ? VIRTIO_RDMA_CTRL_OK : + VIRTIO_RDMA_CTRL_ERR; +} + +static void virtio_rdma_ah_attr_to_ibv (struct virtio_rdma_ah_attr *ah_att= r, struct ibv_ah_attr *ibv_attr) { + ibv_attr->grh.dgid =3D ah_attr->grh.dgid; + ibv_attr->grh.flow_label =3D ah_attr->grh.flow_label; + ibv_attr->grh.sgid_index =3D ah_attr->grh.sgid_index; + ibv_attr->grh.hop_limit =3D ah_attr->grh.hop_limit; + ibv_attr->grh.traffic_class =3D ah_attr->grh.traffic_class; + + ibv_attr->dlid =3D ah_attr->dlid; + ibv_attr->sl =3D ah_attr->sl; + ibv_attr->src_path_bits =3D ah_attr->src_path_bits; + ibv_attr->static_rate =3D ah_attr->static_rate; + ibv_attr->port_num =3D ah_attr->port_num; +} + +int virtio_rdma_modify_qp(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct cmd_modify_qp cmd =3D {}; + size_t s; + int rc; + + RdmaRmQP *rqp; + struct ibv_qp_attr attr =3D {}; + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + return VIRTIO_RDMA_CTRL_ERR; + } + + printf("%s: %d %d\n", __func__, cmd.qpn, cmd.attr.qp_state); + + rqp =3D rdma_rm_get_qp(rdev->rdma_dev_res, cmd.qpn); + if (!rqp) + printf("Get qp failed\n"); + + if (rqp->qp_type =3D=3D IBV_QPT_GSI) { + return VIRTIO_RDMA_CTRL_OK; + } + + // TODO: assign attr based on cmd.attr_mask + attr.qp_state =3D cmd.attr.qp_state; + attr.cur_qp_state =3D cmd.attr.cur_qp_state; + attr.path_mtu =3D cmd.attr.path_mtu; + attr.path_mig_state =3D cmd.attr.path_mig_state; + attr.qkey =3D cmd.attr.qkey; + attr.rq_psn =3D cmd.attr.rq_psn; + attr.sq_psn =3D cmd.attr.sq_psn; + attr.dest_qp_num =3D cmd.attr.dest_qp_num; + attr.qp_access_flags =3D cmd.attr.qp_access_flags; + attr.pkey_index =3D cmd.attr.pkey_index; + attr.en_sqd_async_notify =3D cmd.attr.en_sqd_async_notify; + attr.sq_draining =3D cmd.attr.sq_draining; + 
attr.max_rd_atomic =3D cmd.attr.max_rd_atomic; + attr.max_dest_rd_atomic =3D cmd.attr.max_dest_rd_atomic; + attr.min_rnr_timer =3D cmd.attr.min_rnr_timer; + attr.port_num =3D cmd.attr.port_num; + attr.timeout =3D cmd.attr.timeout; + attr.retry_cnt =3D cmd.attr.retry_cnt; + attr.rnr_retry =3D cmd.attr.rnr_retry; + attr.alt_port_num =3D cmd.attr.alt_port_num; + attr.alt_timeout =3D cmd.attr.alt_timeout; + attr.rate_limit =3D cmd.attr.rate_limit; + attr.cap.max_inline_data =3D cmd.attr.cap.max_inline_data; + attr.cap.max_recv_sge =3D cmd.attr.cap.max_recv_sge; + attr.cap.max_recv_wr =3D cmd.attr.cap.max_recv_wr; + attr.cap.max_send_sge =3D cmd.attr.cap.max_send_sge; + attr.cap.max_send_wr =3D cmd.attr.cap.max_send_wr; + virtio_rdma_ah_attr_to_ibv(&cmd.attr.ah_attr, &attr.ah_attr); + virtio_rdma_ah_attr_to_ibv(&cmd.attr.alt_ah_attr, &attr.alt_ah_attr); + + rqp->qp_state =3D cmd.attr.qp_state; + + if (rqp->qp_state =3D=3D IBV_QPS_RTR) { + rqp->backend_qp.sgid_idx =3D cmd.attr.ah_attr.grh.sgid_index; + attr.ah_attr.grh.sgid_index =3D cmd.attr.ah_attr.grh.sgid_index; + attr.ah_attr.is_global =3D 1; + } + =20 + printf("modify_qp_debug %d %d %d %d %d %d %d %d\n", cmd.qpn, cmd.attr_= mask, cmd.attr.ah_attr.grh.sgid_index, + cmd.attr.dest_qp_num, cmd.attr.qp_state, cmd.attr.qkey, cmd.att= r.rq_psn, cmd.attr.sq_psn); + + rc =3D ibv_modify_qp(rqp->backend_qp.ibqp, &attr, cmd.attr_mask); + /* + rc =3D rdma_rm_modify_qp(rdev->rdma_dev_res, rdev->backend_dev, + cmd.qpn, cmd.attr_mask, + cmd.attr.ah_attr.grh.sgid_index, + &cmd.attr.ah_attr.grh.dgid, + cmd.attr.dest_qp_num, + (enum ibv_qp_state)cmd.attr.qp_state, + cmd.attr.qkey, cmd.attr.rq_psn, + cmd.attr.sq_psn);*/ + + if (rc) { + rdma_error_report( "ibv_modify_qp fail, rc=3D%d, errno=3D%d", rc, = errno); + return -EIO; + } + return rc; +} + +int virtio_rdma_query_qp(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct cmd_query_qp cmd =3D {}; + struct rsp_query_qp rsp =3D {}; + struct ibv_qp_init_attr init_attr; + size_t s; + int rc; + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + return VIRTIO_RDMA_CTRL_ERR; + } + + memset(&rsp, 0, sizeof(rsp)); + + rc =3D rdma_rm_query_qp(rdev->rdma_dev_res, rdev->backend_dev, cmd.qpn, + (struct ibv_qp_attr *)&rsp.attr, cmd.attr_mask, + &init_attr); + if (rc) + return -EIO; + =20 + s =3D iov_from_buf(out, 1, 0, &rsp, sizeof(rsp)); + + return s =3D=3D sizeof(rsp) ? VIRTIO_RDMA_CTRL_OK : + VIRTIO_RDMA_CTRL_ERR; +} + +int virtio_rdma_destroy_qp(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct cmd_destroy_qp cmd =3D {}; + size_t s; + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + return VIRTIO_RDMA_CTRL_ERR; + } + + rdma_info_report("%s: %d", __func__, cmd.qpn); + + rdma_rm_dealloc_qp(rdev->rdma_dev_res, cmd.qpn); + + return VIRTIO_RDMA_CTRL_OK; +} + +int virtio_rdma_query_gid(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct cmd_query_gid cmd =3D {}; + union ibv_gid gid =3D {}; + size_t s; + int rc; + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + return VIRTIO_RDMA_CTRL_ERR; + } + + rc =3D ibv_query_gid(rdev->backend_dev->context, cmd.port, cmd.index, + &gid); + if (rc) + return VIRTIO_RDMA_CTRL_ERR; + + s =3D iov_from_buf(out, 1, 0, &gid, sizeof(gid)); + + return s =3D=3D sizeof(gid) ? 
VIRTIO_RDMA_CTRL_OK : + VIRTIO_RDMA_CTRL_ERR; +} + +int virtio_rdma_create_uc(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct cmd_create_uc cmd =3D {}; + struct rsp_create_uc rsp =3D {}; + size_t s; + int rc; + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + return VIRTIO_RDMA_CTRL_ERR; + } + + rc =3D rdma_rm_alloc_uc(rdev->rdma_dev_res, cmd.pfn, &rsp.ctx_handle); + + if (rc) + return VIRTIO_RDMA_CTRL_ERR; + + s =3D iov_from_buf(out, 1, 0, &rsp, sizeof(rsp)); + + return s =3D=3D sizeof(rsp) ? VIRTIO_RDMA_CTRL_OK : + VIRTIO_RDMA_CTRL_ERR; +} + +int virtio_rdma_dealloc_uc(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct cmd_dealloc_uc cmd =3D {}; + size_t s; + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + return VIRTIO_RDMA_CTRL_ERR; + } + + rdma_rm_dealloc_uc(rdev->rdma_dev_res, cmd.ctx_handle); + + return VIRTIO_RDMA_CTRL_OK; +} + +int virtio_rdma_query_pkey(VirtIORdma *rdev, struct iovec *in, + struct iovec *out) +{ + struct cmd_query_pkey cmd =3D {}; + struct rsp_query_pkey rsp =3D {}; + size_t s; + + s =3D iov_to_buf(in, 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + return VIRTIO_RDMA_CTRL_ERR; + } + + rsp.pkey =3D 0xFFFF; + + s =3D iov_from_buf(out, 1, 0, &rsp, sizeof(rsp)); + + return s =3D=3D sizeof(rsp) ? VIRTIO_RDMA_CTRL_OK : + VIRTIO_RDMA_CTRL_ERR; +} + +static void virtio_rdma_init_dev_caps(VirtIORdma *rdev) +{ + rdev->dev_attr.max_qp_wr =3D 1024; +} + +int virtio_rdma_init_ib(VirtIORdma *rdev) +{ + int rc; + + virtio_rdma_init_dev_caps(rdev); + + rdev->rdma_dev_res =3D g_malloc0(sizeof(RdmaDeviceResources)); + rdev->backend_dev =3D g_malloc0(sizeof(RdmaBackendDev)); + + rc =3D rdma_backend_init(rdev->backend_dev, NULL, rdev->rdma_dev_res, + rdev->backend_device_name, + rdev->backend_port_num, &rdev->dev_attr, + &rdev->mad_chr); + if (rc) { + rdma_error_report("Fail to initialize backend device"); + return rc; + } + + rdev->dev_attr.max_mr_size =3D 4096; + rdev->dev_attr.page_size_cap =3D 4096; + rdev->dev_attr.vendor_id =3D 1; + rdev->dev_attr.vendor_part_id =3D 1; + rdev->dev_attr.hw_ver =3D VIRTIO_RDMA_HW_VER; + rdev->dev_attr.atomic_cap =3D IBV_ATOMIC_NONE; + rdev->dev_attr.max_pkeys =3D 1; + rdev->dev_attr.phys_port_cnt =3D VIRTIO_RDMA_PORT_CNT; + + rc =3D rdma_rm_init(rdev->rdma_dev_res, &rdev->dev_attr); + if (rc) { + rdma_error_report("Fail to initialize resource manager"); + return rc; + } + + virtio_rdma_qp_ops_init(); + + rdma_backend_start(rdev->backend_dev); + + return 0; +} + +void virtio_rdma_fini_ib(VirtIORdma *rdev) +{ + rdma_backend_stop(rdev->backend_dev); + virtio_rdma_qp_ops_fini(); + rdma_rm_fini(rdev->rdma_dev_res, rdev->backend_dev, + rdev->backend_eth_device_name); + rdma_backend_fini(rdev->backend_dev); + g_free(rdev->rdma_dev_res); + g_free(rdev->backend_dev); +} diff --git a/hw/rdma/virtio/virtio-rdma-ib.h b/hw/rdma/virtio/virtio-rdma-i= b.h new file mode 100644 index 0000000000..457b25f998 --- /dev/null +++ b/hw/rdma/virtio/virtio-rdma-ib.h @@ -0,0 +1,176 @@ +/* + * Virtio RDMA Device - IB verbs + * + * Copyright (C) 2019 Oracle + * + * Authors: + * Yuval Shaia + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. 
+ * + */ + +#ifndef VIRTIO_RDMA_IB_H +#define VIRTIO_RDMA_IB_H + +#include "qemu/osdep.h" +#include "qemu/iov.h" +#include "hw/virtio/virtio-rdma.h" + +#include "../rdma_rm.h" + +enum virtio_rdma_wr_opcode { + VIRTIO_RDMA_WR_RDMA_WRITE, + VIRTIO_RDMA_WR_RDMA_WRITE_WITH_IMM, + VIRTIO_RDMA_WR_SEND, + VIRTIO_RDMA_WR_SEND_WITH_IMM, + VIRTIO_RDMA_WR_RDMA_READ, + VIRTIO_RDMA_WR_ATOMIC_CMP_AND_SWP, + VIRTIO_RDMA_WR_ATOMIC_FETCH_AND_ADD, + VIRTIO_RDMA_WR_LOCAL_INV, + VIRTIO_RDMA_WR_BIND_MW, + VIRTIO_RDMA_WR_SEND_WITH_INV, + VIRTIO_RDMA_WR_TSO, + VIRTIO_RDMA_WR_DRIVER1, + + VIRTIO_RDMA_WR_REG_MR =3D 0x20, +}; + +struct virtio_rdma_cqe { + uint64_t wr_id; + enum ibv_wc_status status; + enum ibv_wc_opcode opcode; + uint32_t vendor_err; + uint32_t byte_len; + uint32_t imm_data; + uint32_t qp_num; + uint32_t src_qp; + int wc_flags; + uint16_t pkey_index; + uint16_t slid; + uint8_t sl; + uint8_t dlid_path_bits; +}; + +struct CompHandlerCtx { + VirtIORdma *dev; + uint32_t cq_handle; + struct virtio_rdma_cqe cqe; +}; + +struct virtio_rdma_kernel_mr { + RdmaRmMR *dummy_mr; // created by create_mr + RdmaRmMR *real_mr; // real mr created by map_mr_sg + + void* virt; + uint64_t length; + uint64_t start; + uint32_t mrn; + uint32_t lkey; + uint32_t rkey; + + uint32_t max_num_sg; + uint8_t dma_mr; +}; + +struct virtio_rdma_global_route { + union ibv_gid dgid; + uint32_t flow_label; + uint8_t sgid_index; + uint8_t hop_limit; + uint8_t traffic_class; +}; + +struct virtio_rdma_ah_attr { + struct virtio_rdma_global_route grh; + uint16_t dlid; + uint8_t sl; + uint8_t src_path_bits; + uint8_t static_rate; + uint8_t port_num; +}; + +struct virtio_rdma_qp_cap { + uint32_t max_send_wr; + uint32_t max_recv_wr; + uint32_t max_send_sge; + uint32_t max_recv_sge; + uint32_t max_inline_data; +}; + +struct virtio_rdma_qp_attr { + enum ibv_qp_state qp_state; + enum ibv_qp_state cur_qp_state; + enum ibv_mtu path_mtu; + enum ibv_mig_state path_mig_state; + uint32_t qkey; + uint32_t rq_psn; + uint32_t sq_psn; + uint32_t dest_qp_num; + uint32_t qp_access_flags; + uint16_t pkey_index; + uint16_t alt_pkey_index; + uint8_t en_sqd_async_notify; + uint8_t sq_draining; + uint8_t max_rd_atomic; + uint8_t max_dest_rd_atomic; + uint8_t min_rnr_timer; + uint8_t port_num; + uint8_t timeout; + uint8_t retry_cnt; + uint8_t rnr_retry; + uint8_t alt_port_num; + uint8_t alt_timeout; + uint32_t rate_limit; + struct virtio_rdma_qp_cap cap; + struct virtio_rdma_ah_attr ah_attr; + struct virtio_rdma_ah_attr alt_ah_attr; +}; + +#define VIRTIO_RDMA_PORT_CNT 1 +#define VIRTIO_RDMA_HW_VER 1 + +int virtio_rdma_init_ib(VirtIORdma *rdev); +void virtio_rdma_fini_ib(VirtIORdma *rdev); + +int virtio_rdma_query_device(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int virtio_rdma_query_port(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int virtio_rdma_create_cq(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int virtio_rdma_destroy_cq(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int virtio_rdma_create_pd(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int virtio_rdma_destroy_pd(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int virtio_rdma_get_dma_mr(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int virtio_rdma_create_mr(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int virtio_rdma_reg_user_mr(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int virtio_rdma_create_qp(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int 
virtio_rdma_modify_qp(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int virtio_rdma_query_qp(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int virtio_rdma_query_gid(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int virtio_rdma_destroy_qp(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int virtio_rdma_map_mr_sg(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int virtio_rdma_dereg_mr(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int virtio_rdma_create_uc(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int virtio_rdma_query_pkey(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); +int virtio_rdma_dealloc_uc(VirtIORdma *rdev, struct iovec *in, + struct iovec *out); + +#endif diff --git a/hw/rdma/virtio/virtio-rdma-main.c b/hw/rdma/virtio/virtio-rdma= -main.c new file mode 100644 index 0000000000..a69f0eb054 --- /dev/null +++ b/hw/rdma/virtio/virtio-rdma-main.c @@ -0,0 +1,231 @@ +/* + * Virtio RDMA Device + * + * Copyright (C) 2019 Oracle + * + * Authors: + * Yuval Shaia + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ + +#include +#include + +#include "qemu/osdep.h" +#include "hw/virtio/virtio.h" +#include "qemu/error-report.h" +#include "hw/virtio/virtio-bus.h" +#include "hw/virtio/virtio-rdma.h" +#include "hw/qdev-properties.h" +#include "include/standard-headers/linux/virtio_ids.h" + +#include "virtio-rdma-ib.h" +#include "virtio-rdma-qp.h" +#include "virtio-rdma-dev-api.h" + +#include "../rdma_rm_defs.h" +#include "../rdma_utils.h" + +#define DEFINE_VIRTIO_RDMA_CMD(cmd, handler) [cmd] =3D {handler, #cmd}, + +struct { + int (*handler)(VirtIORdma *rdev, struct iovec *in, struct iovec *out); + const char* name; +} cmd_tbl[] =3D { + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_QUERY_DEVICE, virtio_rdma_query_devi= ce) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_QUERY_PORT, virtio_rdma_query_port) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_CREATE_CQ, virtio_rdma_create_cq) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_DESTROY_CQ, virtio_rdma_destroy_cq) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_CREATE_PD, virtio_rdma_create_pd) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_DESTROY_PD, virtio_rdma_destroy_pd) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_GET_DMA_MR, virtio_rdma_get_dma_mr) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_CREATE_MR, virtio_rdma_create_mr) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_MAP_MR_SG, virtio_rdma_map_mr_sg) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_REG_USER_MR, virtio_rdma_reg_user_mr) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_DEREG_MR, virtio_rdma_dereg_mr) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_CREATE_QP, virtio_rdma_create_qp) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_MODIFY_QP, virtio_rdma_modify_qp) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_QUERY_QP, virtio_rdma_query_qp) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_DESTROY_QP, virtio_rdma_destroy_qp) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_QUERY_GID, virtio_rdma_query_gid) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_CREATE_UC, virtio_rdma_create_uc) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_DEALLOC_UC, virtio_rdma_dealloc_uc) + DEFINE_VIRTIO_RDMA_CMD(VIRTIO_CMD_QUERY_PKEY, virtio_rdma_query_pkey) +}; + +static void virtio_rdma_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq) +{ + VirtIORdma *r =3D VIRTIO_RDMA(vdev); + struct control_buf cb; + VirtQueueElement *e; + size_t s; + + virtio_queue_set_notification(vq, 0); + + for (;;) { + e =3D virtqueue_pop(vq, sizeof(VirtQueueElement)); + if (!e) { + break; + } + + if (iov_size(e->in_sg, e->in_num) < 
sizeof(cb.status) || + iov_size(e->out_sg, e->out_num) < sizeof(cb.cmd)) { + virtio_error(vdev, "Got invalid message size"); + virtqueue_detach_element(vq, e, 0); + g_free(e); + break; + } + + s =3D iov_to_buf(&e->out_sg[0], 1, 0, &cb.cmd, sizeof(cb.cmd)); + if (s !=3D sizeof(cb.cmd)) { + cb.status =3D VIRTIO_RDMA_CTRL_ERR; + } else { + printf("cmd=3D%d %s\n", cb.cmd, cmd_tbl[cb.cmd].name); + if (cb.cmd >=3D VIRTIO_MAX_CMD_NUM) { + rdma_warn_report("unknown cmd %d\n", cb.cmd); + cb.status =3D VIRTIO_RDMA_CTRL_ERR; + } else { + if (cmd_tbl[cb.cmd].handler) { + cb.status =3D cmd_tbl[cb.cmd].handler(r, &e->out_sg[1], + &e->in_sg[0]); + } else { + rdma_warn_report("no handler for cmd %d\n", cb.cmd); + cb.status =3D VIRTIO_RDMA_CTRL_ERR; + } + } + } + printf("status=3D%d\n", cb.status); + s =3D iov_from_buf(&e->in_sg[1], 1, 0, &cb.status, sizeof(cb.statu= s)); + assert(s =3D=3D sizeof(cb.status)); + + virtqueue_push(vq, e, sizeof(cb.status)); + g_free(e); + virtio_notify(vdev, vq); + } + + virtio_queue_set_notification(vq, 1); +} + +static void g_free_destroy(gpointer data) { + g_free(data); +} + +static void virtio_rdma_device_realize(DeviceState *dev, Error **errp) +{ + VirtIODevice *vdev =3D VIRTIO_DEVICE(dev); + VirtIORdma *r =3D VIRTIO_RDMA(dev); + int rc, i; + + rc =3D virtio_rdma_init_ib(r); + if (rc) { + rdma_error_report("Fail to initialize IB layer"); + return; + } + + virtio_init(vdev, "virtio-rdma", VIRTIO_ID_RDMA, 1024); + + r->lkey_mr_tbl =3D g_hash_table_new_full(g_int_hash, g_int_equal, g_fr= ee_destroy, NULL); + + r->ctrl_vq =3D virtio_add_queue(vdev, 64, virtio_rdma_handle_ctrl); + + r->cq_vqs =3D g_malloc0_n(64, sizeof(*r->cq_vqs)); + for (i =3D 0; i < 64; i++) { + r->cq_vqs[i] =3D virtio_add_queue(vdev, 64, NULL); + } + + r->qp_vqs =3D g_malloc0_n(64 * 2, sizeof(*r->cq_vqs)); + for (i =3D 0; i < 64 * 2; i +=3D 2) { + r->qp_vqs[i] =3D virtio_add_queue(vdev, 64, virtio_rdma_handle_sq); + r->qp_vqs[i+1] =3D virtio_add_queue(vdev, 64, virtio_rdma_handle_r= q); + } +} + +static void virtio_rdma_device_unrealize(DeviceState *dev) +{ + VirtIODevice *vdev =3D VIRTIO_DEVICE(dev); + VirtIORdma *r =3D VIRTIO_RDMA(dev); + + virtio_del_queue(vdev, 0); + + virtio_cleanup(vdev); + + virtio_rdma_fini_ib(r); +} + +static uint64_t virtio_rdma_get_features(VirtIODevice *vdev, uint64_t feat= ures, + Error **errp) +{ + /* virtio_add_feature(&features, VIRTIO_NET_F_MAC); */ + + vdev->backend_features =3D features; + + return features; +} + + +static Property virtio_rdma_dev_properties[] =3D { + DEFINE_PROP_STRING("netdev", VirtIORdma, backend_eth_device_name), + DEFINE_PROP_STRING("ibdev",VirtIORdma, backend_device_name), + DEFINE_PROP_UINT8("ibport", VirtIORdma, backend_port_num, 1), + DEFINE_PROP_UINT64("dev-caps-max-mr-size", VirtIORdma, dev_attr.max_mr= _size, + MAX_MR_SIZE), + DEFINE_PROP_INT32("dev-caps-max-qp", VirtIORdma, dev_attr.max_qp, MAX_= QP), + DEFINE_PROP_INT32("dev-caps-max-cq", VirtIORdma, dev_attr.max_cq, MAX_= CQ), + DEFINE_PROP_INT32("dev-caps-max-mr", VirtIORdma, dev_attr.max_mr, MAX_= MR), + DEFINE_PROP_INT32("dev-caps-max-pd", VirtIORdma, dev_attr.max_pd, MAX_= PD), + DEFINE_PROP_INT32("dev-caps-qp-rd-atom", VirtIORdma, + dev_attr.max_qp_rd_atom, MAX_QP_RD_ATOM), + DEFINE_PROP_INT32("dev-caps-max-qp-init-rd-atom", VirtIORdma, + dev_attr.max_qp_init_rd_atom, MAX_QP_INIT_RD_ATOM), + DEFINE_PROP_INT32("dev-caps-max-ah", VirtIORdma, dev_attr.max_ah, MAX_= AH), + DEFINE_PROP_INT32("dev-caps-max-srq", VirtIORdma, dev_attr.max_srq, MA= X_SRQ), + DEFINE_PROP_CHR("mad-chardev", 
VirtIORdma, mad_chr), + DEFINE_PROP_END_OF_LIST(), +}; + +struct virtio_rdma_config { + int32_t max_cq; +}; + +static void virtio_rdma_get_config(VirtIODevice *vdev, uint8_t *config) +{ + VirtIORdma *r =3D VIRTIO_RDMA(vdev); + struct virtio_rdma_config cfg; + + cfg.max_cq =3D r->dev_attr.max_cq; + + memcpy(config, &cfg, sizeof(cfg)); +} + +static void virtio_rdma_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc =3D DEVICE_CLASS(klass); + VirtioDeviceClass *vdc =3D VIRTIO_DEVICE_CLASS(klass); + + set_bit(DEVICE_CATEGORY_NETWORK, dc->categories); + vdc->realize =3D virtio_rdma_device_realize; + vdc->unrealize =3D virtio_rdma_device_unrealize; + vdc->get_features =3D virtio_rdma_get_features; + vdc->get_config =3D virtio_rdma_get_config; + + dc->desc =3D "Virtio RDMA Device"; + device_class_set_props(dc, virtio_rdma_dev_properties); + set_bit(DEVICE_CATEGORY_NETWORK, dc->categories); +} + +static const TypeInfo virtio_rdma_info =3D { + .name =3D TYPE_VIRTIO_RDMA, + .parent =3D TYPE_VIRTIO_DEVICE, + .instance_size =3D sizeof(VirtIORdma), + .class_init =3D virtio_rdma_class_init, +}; + +static void virtio_register_types(void) +{ + type_register_static(&virtio_rdma_info); +} + +type_init(virtio_register_types) diff --git a/hw/rdma/virtio/virtio-rdma-qp.c b/hw/rdma/virtio/virtio-rdma-q= p.c new file mode 100644 index 0000000000..8b95c115cb --- /dev/null +++ b/hw/rdma/virtio/virtio-rdma-qp.c @@ -0,0 +1,241 @@ +/* + * Virtio RDMA Device - QP ops + * + * Copyright (C) 2021 Bytedance Inc. + * + * Authors: + * Junji Wei + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ + +#include +#include + +#include "qemu/osdep.h" +#include "qemu/atomic.h" +#include "cpu.h" + +#include "virtio-rdma-ib.h" +#include "virtio-rdma-qp.h" +#include "virtio-rdma-dev-api.h" + +#include "../rdma_utils.h" +#include "../rdma_rm.h" +#include "../rdma_backend.h" + +void virtio_rdma_qp_ops_comp_handler(void *ctx, struct ibv_wc *wc) +{ + VirtQueueElement *e; + VirtQueue *vq; + struct CompHandlerCtx *comp_ctx =3D (struct CompHandlerCtx *)ctx; + size_t s; + struct virtio_rdma_cqe* cqe; + + vq =3D comp_ctx->dev->cq_vqs[comp_ctx->cq_handle]; + e =3D virtqueue_pop(vq, sizeof(VirtQueueElement)); + if (!e) { + rdma_error_report("pop cq vq failed"); + } + + cqe =3D &comp_ctx->cqe; + cqe->status =3D wc->status; + cqe->opcode =3D wc->opcode; + cqe->vendor_err =3D wc->vendor_err; + cqe->byte_len =3D wc->byte_len; + cqe->imm_data =3D wc->imm_data; + cqe->src_qp =3D wc->src_qp; + cqe->wc_flags =3D wc->wc_flags; + cqe->pkey_index =3D wc->pkey_index; + cqe->slid =3D wc->slid; + cqe->sl =3D wc->sl; + cqe->dlid_path_bits =3D wc->dlid_path_bits; + + s =3D iov_from_buf(&e->in_sg[0], 1, 0, &comp_ctx->cqe, sizeof(comp_ctx= ->cqe)); + assert(s =3D=3D sizeof(comp_ctx->cqe)); + virtqueue_push(vq, e, sizeof(comp_ctx->cqe)); + + virtio_notify(&comp_ctx->dev->parent_obj, vq); + + g_free(e); + g_free(comp_ctx); +} + +void virtio_rdma_qp_ops_fini(void) +{ + rdma_backend_unregister_comp_handler(); +} + +int virtio_rdma_qp_ops_init(void) +{ + rdma_backend_register_comp_handler(virtio_rdma_qp_ops_comp_handler); + + return 0; +} + +void virtio_rdma_handle_sq(VirtIODevice *vdev, VirtQueue *vq) +{ + VirtIORdma *dev =3D VIRTIO_RDMA(vdev); + VirtQueueElement *e; + struct cmd_post_send cmd; + struct ibv_sge *sge; + RdmaRmQP *qp; + struct virtio_rdma_kernel_mr *kmr; + size_t s; + int status =3D 0, i; + struct CompHandlerCtx *comp_ctx; + + RdmaRmMR *mr; + uint32_t lkey; 
+ uint32_t *htbl_key; + + for (;;) { + e =3D virtqueue_pop(vq, sizeof(VirtQueueElement)); + if (!e) { + break; + } + + s =3D iov_to_buf(&e->out_sg[0], 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + rdma_error_report("bad cmd"); + break; + } + + qp =3D rdma_rm_get_qp(dev->rdma_dev_res, cmd.qpn); + + sge =3D g_malloc0_n(cmd.num_sge, sizeof(*sge)); + s =3D iov_to_buf(&e->out_sg[1], 1, 0, sge, cmd.num_sge * sizeof(*s= ge)); + if (s !=3D cmd.num_sge * sizeof(*sge)) { + rdma_error_report("bad sge"); + break; + } + + if (cmd.is_kernel) { + if (cmd.opcode =3D=3D VIRTIO_RDMA_WR_REG_MR) { + mr =3D rdma_rm_get_mr(dev->rdma_dev_res, cmd.wr.reg.mrn); + lkey =3D mr->lkey; + kmr =3D g_hash_table_lookup(dev->lkey_mr_tbl, &lkey); + rdma_rm_alloc_mr(dev->rdma_dev_res, mr->pd_handle, (uint64= _t)kmr->virt, kmr->length, + kmr->virt, cmd.wr.reg.access, &kmr->mrn, &kmr->lkey, = &kmr->rkey); + kmr->real_mr =3D rdma_rm_get_mr(dev->rdma_dev_res, kmr->mr= n); + if (cmd.wr.reg.key !=3D mr->lkey) { + // rebuild lkey -> kmr + g_hash_table_remove(dev->lkey_mr_tbl, &lkey); + + htbl_key =3D g_malloc0(sizeof(*htbl_key)); + *htbl_key =3D cmd.wr.reg.key; + + g_hash_table_insert(dev->lkey_mr_tbl, htbl_key, kmr); + } + goto fin; + } + /* In kernel mode, need to map guest addr to remaped addr */ + for (i =3D 0; i < cmd.num_sge; i++) { + kmr =3D g_hash_table_lookup(dev->lkey_mr_tbl, &sge[i].lkey= ); + if (!kmr) { + rdma_error_report("Cannot found mr with lkey %u", sge[= i].lkey); + // TODO: handler this error + } + sge[i].addr =3D (uint64_t) kmr->virt + (sge[i].addr - kmr-= >start); + sge[i].lkey =3D kmr->lkey; + } + } + // TODO: copy depend on opcode + + /* Prepare CQE */ + comp_ctx =3D g_malloc(sizeof(*comp_ctx)); + comp_ctx->dev =3D dev; + comp_ctx->cq_handle =3D qp->send_cq_handle; + comp_ctx->cqe.wr_id =3D cmd.wr_id; + comp_ctx->cqe.qp_num =3D cmd.qpn; + comp_ctx->cqe.opcode =3D IBV_WC_SEND; + + rdma_backend_post_send(dev->backend_dev, &qp->backend_qp, qp->qp_t= ype, sge, 1, 0, NULL, NULL, 0, 0, comp_ctx); + +fin: + s =3D iov_from_buf(&e->in_sg[0], 1, 0, &status, sizeof(status)); + if (s !=3D sizeof(status)) + break; + + virtqueue_push(vq, e, sizeof(status)); + g_free(e); + g_free(sge); + virtio_notify(vdev, vq); + } +} + +void virtio_rdma_handle_rq(VirtIODevice *vdev, VirtQueue *vq) +{ + VirtIORdma *dev =3D VIRTIO_RDMA(vdev); + VirtQueueElement *e; + struct cmd_post_recv cmd; + struct ibv_sge *sge; + RdmaRmQP *qp; + struct virtio_rdma_kernel_mr *kmr; + size_t s; + int i, status =3D 0; + struct CompHandlerCtx *comp_ctx; + + for (;;) { + e =3D virtqueue_pop(vq, sizeof(VirtQueueElement)); + if (!e) + break; + + s =3D iov_to_buf(&e->out_sg[0], 1, 0, &cmd, sizeof(cmd)); + if (s !=3D sizeof(cmd)) { + fprintf(stderr, "bad cmd\n"); + break; + } + + qp =3D rdma_rm_get_qp(dev->rdma_dev_res, cmd.qpn); + + if (!qp->backend_qp.ibqp) { + if (qp->qp_type =3D=3D IBV_QPT_SMI) + rdma_error_report("Not support SMI"); + if (qp->qp_type =3D=3D IBV_QPT_GSI) + rdma_warn_report("Not support GSI now"); + goto end; + } + + sge =3D g_malloc0_n(cmd.num_sge, sizeof(*sge)); + s =3D iov_to_buf(&e->out_sg[1], 1, 0, sge, cmd.num_sge * sizeof(*s= ge)); + if (s !=3D cmd.num_sge * sizeof(*sge)) { + rdma_error_report("bad sge"); + break; + } + + if (cmd.is_kernel) { + /* In kernel mode, need to map guest addr to remaped addr */ + for (i =3D 0; i < cmd.num_sge; i++) { + kmr =3D g_hash_table_lookup(dev->lkey_mr_tbl, &sge[i].lkey= ); + if (!kmr) { + rdma_error_report("Cannot found mr with lkey %u", sge[= i].lkey); + // TODO: handler this error + } + 
sge[i].addr =3D (uint64_t) kmr->virt + (sge[i].addr - kmr-= >start); + sge[i].lkey =3D kmr->lkey; + } + } + + comp_ctx =3D g_malloc(sizeof(*comp_ctx)); + comp_ctx->dev =3D dev; + comp_ctx->cq_handle =3D qp->recv_cq_handle; + comp_ctx->cqe.wr_id =3D cmd.wr_id; + comp_ctx->cqe.qp_num =3D cmd.qpn; + comp_ctx->cqe.opcode =3D IBV_WC_RECV; + + rdma_backend_post_recv(dev->backend_dev, &qp->backend_qp, qp->qp_t= ype, sge, 1, comp_ctx); + +end: + s =3D iov_from_buf(&e->in_sg[0], 1, 0, &status, sizeof(status)); + if (s !=3D sizeof(status)) + break; + + virtqueue_push(vq, e, sizeof(status)); + g_free(e); + g_free(sge); + virtio_notify(vdev, vq); + } +} diff --git a/hw/rdma/virtio/virtio-rdma-qp.h b/hw/rdma/virtio/virtio-rdma-q= p.h new file mode 100644 index 0000000000..f4d9c755f3 --- /dev/null +++ b/hw/rdma/virtio/virtio-rdma-qp.h @@ -0,0 +1,29 @@ +/* + * Virtio RDMA Device - QP ops + * + * Copyright (C) 2021 Bytedance Inc. + * + * Authors: + * Junji Wei + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ + +#ifndef VIRTIO_RDMA_QP_H +#define VIRTIO_RDMA_QP_H + +#include "qemu/osdep.h" +#include "qemu/iov.h" +#include "hw/virtio/virtio-rdma.h" + +#include "../rdma_rm.h" + +void virtio_rdma_qp_ops_comp_handler(void *ctx, struct ibv_wc *wc); +void virtio_rdma_qp_ops_fini(void); +int virtio_rdma_qp_ops_init(void); +void virtio_rdma_handle_sq(VirtIODevice *vdev, VirtQueue *vq); +void virtio_rdma_handle_rq(VirtIODevice *vdev, VirtQueue *vq); + +#endif \ No newline at end of file diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build index fbff9bc9d4..4de3d4e985 100644 --- a/hw/virtio/meson.build +++ b/hw/virtio/meson.build @@ -41,6 +41,7 @@ virtio_pci_ss.add(when: 'CONFIG_VIRTIO_9P', if_true: file= s('virtio-9p-pci.c')) virtio_pci_ss.add(when: 'CONFIG_VIRTIO_SCSI', if_true: files('virtio-scsi-= pci.c')) virtio_pci_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk-pc= i.c')) virtio_pci_ss.add(when: 'CONFIG_VIRTIO_NET', if_true: files('virtio-net-pc= i.c')) +virtio_pci_ss.add(when: 'CONFIG_VIRTIO_RDMA', if_true: files('virtio-rdma-= pci.c')) virtio_pci_ss.add(when: 'CONFIG_VIRTIO_SERIAL', if_true: files('virtio-ser= ial-pci.c')) virtio_pci_ss.add(when: 'CONFIG_VIRTIO_PMEM', if_true: files('virtio-pmem-= pci.c')) virtio_pci_ss.add(when: 'CONFIG_VIRTIO_IOMMU', if_true: files('virtio-iomm= u-pci.c')) diff --git a/hw/virtio/virtio-rdma-pci.c b/hw/virtio/virtio-rdma-pci.c new file mode 100644 index 0000000000..c4de92c88a --- /dev/null +++ b/hw/virtio/virtio-rdma-pci.c @@ -0,0 +1,110 @@ +/* + * Virtio rdma PCI Bindings + * + * Copyright (C) 2019 Oracle + * + * Authors: + * Yuval Shaia + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + */ + +#include "qemu/osdep.h" + +#include "hw/virtio/virtio-net-pci.h" +#include "hw/virtio/virtio-rdma.h" +#include "virtio-pci.h" +#include "qapi/error.h" +#include "hw/qdev-properties.h" + +typedef struct VirtIORdmaPCI VirtIORdmaPCI; + +/* + * virtio-rdma-pci: This extends VirtioPCIProxy. 
+ */
+#define TYPE_VIRTIO_RDMA_PCI "virtio-rdma-pci-base"
+#define VIRTIO_RDMA_PCI(obj) \
+        OBJECT_CHECK(VirtIORdmaPCI, (obj), TYPE_VIRTIO_RDMA_PCI)
+
+struct VirtIORdmaPCI {
+    VirtIOPCIProxy parent_obj;
+    VirtIORdma vdev;
+};
+
+static Property virtio_rdma_properties[] = {
+    DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags,
+                    VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
+    DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 3),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void virtio_rdma_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
+{
+    VirtIORdmaPCI *dev = VIRTIO_RDMA_PCI(vpci_dev);
+    DeviceState *vdev = DEVICE(&dev->vdev);
+    VirtIONetPCI *vnet_pci;
+    PCIDevice *func0;
+
+    qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus), errp);
+    object_property_set_bool(OBJECT(vdev), "realized", true, errp);
+
+    func0 = pci_get_function_0(&vpci_dev->pci_dev);
+    /* Fail if function 0 of this slot is not a virtio-net device */
+    if (strcmp(object_get_typename(OBJECT(func0)),
+               TYPE_VIRTIO_NET_PCI_GENERIC)) {
+        fprintf(stderr, "Device on %x.0 is type %s but must be %s\n",
+                PCI_SLOT(vpci_dev->pci_dev.devfn),
+                object_get_typename(OBJECT(func0)),
+                TYPE_VIRTIO_NET_PCI_GENERIC);
+        return;
+    }
+    vnet_pci = VIRTIO_NET_PCI(func0);
+    dev->vdev.netdev = &vnet_pci->vdev;
+}
+
+static void virtio_rdma_pci_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+    VirtioPCIClass *vpciklass = VIRTIO_PCI_CLASS(klass);
+
+    k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
+    k->device_id = PCI_DEVICE_ID_VIRTIO_RDMA;
+    k->revision = VIRTIO_PCI_ABI_VERSION;
+    k->class_id = PCI_CLASS_NETWORK_OTHER;
+    set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
+    device_class_set_props(dc, virtio_rdma_properties);
+    vpciklass->realize = virtio_rdma_pci_realize;
+}
+
+static void virtio_rdma_pci_instance_init(Object *obj)
+{
+    VirtIORdmaPCI *dev = VIRTIO_RDMA_PCI(obj);
+
+    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+                                TYPE_VIRTIO_RDMA);
+    /*
+    object_property_add_alias(obj, "bootindex", OBJECT(&dev->vdev),
+                              "bootindex", &error_abort);
+    */
+}
+
+static const VirtioPCIDeviceTypeInfo virtio_rdma_pci_info = {
+    .base_name             = TYPE_VIRTIO_RDMA_PCI,
+    .generic_name          = "virtio-rdma-pci",
+    .transitional_name     = "virtio-rdma-pci-transitional",
+    .non_transitional_name = "virtio-rdma-pci-non-transitional",
+    .instance_size         = sizeof(VirtIORdmaPCI),
+    .instance_init         = virtio_rdma_pci_instance_init,
+    .class_init            = virtio_rdma_pci_class_init,
+};
+
+static void virtio_rdma_pci_register(void)
+{
+    virtio_pci_types_register(&virtio_rdma_pci_info);
+}
+
+type_init(virtio_rdma_pci_register)
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 72ce649eee..f976ea9db7 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -89,6 +89,7 @@ extern bool pci_available;
 #define PCI_DEVICE_ID_VIRTIO_PMEM 0x1013
 #define PCI_DEVICE_ID_VIRTIO_IOMMU 0x1014
 #define PCI_DEVICE_ID_VIRTIO_MEM 0x1015
+#define PCI_DEVICE_ID_VIRTIO_RDMA 0x1016

 #define PCI_VENDOR_ID_REDHAT 0x1b36
 #define PCI_DEVICE_ID_REDHAT_BRIDGE 0x0001
diff --git a/include/hw/virtio/virtio-rdma.h b/include/hw/virtio/virtio-rdma.h
new file mode 100644
index 0000000000..1ae10deb6a
--- /dev/null
+++ b/include/hw/virtio/virtio-rdma.h
@@ -0,0 +1,58 @@
+/*
+ * Virtio RDMA Device
+ *
+ * Copyright (C) 2019 Oracle
+ *
+ * Authors:
+ *  Yuval Shaia
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_VIRTIO_RDMA_H
+#define QEMU_VIRTIO_RDMA_H
+
+#include
+#include
+
+#include "chardev/char-fe.h"
+#include "hw/virtio/virtio.h"
+#include "hw/virtio/virtio-net.h"
+
+#define TYPE_VIRTIO_RDMA "virtio-rdma-device"
+#define VIRTIO_RDMA(obj) \
+        OBJECT_CHECK(VirtIORdma, (obj), TYPE_VIRTIO_RDMA)
+
+typedef struct RdmaBackendDev RdmaBackendDev;
+typedef struct RdmaDeviceResources RdmaDeviceResources;
+struct ibv_device_attr;
+
+typedef struct VirtIORdma {
+    VirtIODevice parent_obj;
+    VirtQueue *ctrl_vq;
+    VirtIONet *netdev;
+    RdmaBackendDev *backend_dev;
+    RdmaDeviceResources *rdma_dev_res;
+    CharBackend mad_chr;
+    char *backend_eth_device_name;
+    char *backend_device_name;
+    uint8_t backend_port_num;
+    struct ibv_device_attr dev_attr;
+
+    VirtQueue **cq_vqs;
+    VirtQueue **qp_vqs;
+
+    GHashTable *lkey_mr_tbl;
+
+    /* statistics of active objects, used to enforce limits; write with qatomic */
+    int num_qp;
+    int num_cq;
+    int num_pd;
+    int num_mr;
+    int num_srq;
+    int num_ctx;
+} VirtIORdma;
+
+#endif
diff --git a/include/standard-headers/linux/virtio_ids.h b/include/standard-headers/linux/virtio_ids.h
index b052355ac7..4c2151bffb 100644
--- a/include/standard-headers/linux/virtio_ids.h
+++ b/include/standard-headers/linux/virtio_ids.h
@@ -48,5 +48,6 @@
 #define VIRTIO_ID_FS 26 /* virtio filesystem */
 #define VIRTIO_ID_PMEM 27 /* virtio pmem */
 #define VIRTIO_ID_MAC80211_HWSIM 29 /* virtio mac80211-hwsim */
+#define VIRTIO_ID_RDMA 30 /* virtio rdma */

 #endif /* _LINUX_VIRTIO_IDS_H */
-- 
2.11.0
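
The send and receive handlers above share one translation step: when cmd.is_kernel is set, each guest SGE is rewritten by looking up a host-side MR (kmr) in dev->lkey_mr_tbl under the guest lkey, rebasing the address onto kmr->virt and substituting kmr->lkey. The standalone sketch below illustrates that pattern with plain GLib; the KernelMr struct, the helper names and the g_int_hash/g_int_equal table setup are illustrative assumptions for this sketch, not definitions taken from the patch.

/*
 * Minimal sketch of the lkey -> kernel-MR remapping done in
 * virtio_rdma_handle_sq()/virtio_rdma_handle_rq() above.
 *
 * Assumptions (not from the patch): the table uses g_int_hash/g_int_equal,
 * and this reduced struct stands in for struct virtio_rdma_kernel_mr.
 */
#include <glib.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t start;  /* guest address that was registered */
    void    *virt;   /* host mapping of the same memory */
    uint32_t lkey;   /* lkey of the host MR */
} KernelMr;

/* Insert a mapping keyed by the guest lkey; the key is heap-allocated,
 * as in the REG_MR path above, so the table can own and free it. */
static void lkey_tbl_insert(GHashTable *tbl, uint32_t guest_lkey, KernelMr *kmr)
{
    uint32_t *key = g_new0(uint32_t, 1);
    *key = guest_lkey;
    g_hash_table_insert(tbl, key, kmr);
}

/* Rewrite one SGE from guest terms to host terms, mirroring the
 * cmd.is_kernel loop: rebase the address onto the host mapping and
 * replace the lkey with the host MR's lkey. */
static gboolean remap_sge(GHashTable *tbl, uint64_t *addr, uint32_t *lkey)
{
    KernelMr *kmr = g_hash_table_lookup(tbl, lkey);
    if (!kmr) {
        return FALSE;  /* unknown lkey, caller must report the error */
    }
    *addr = (uint64_t)(uintptr_t)kmr->virt + (*addr - kmr->start);
    *lkey = kmr->lkey;
    return TRUE;
}

int main(void)
{
    GHashTable *tbl = g_hash_table_new_full(g_int_hash, g_int_equal,
                                            g_free, g_free);
    KernelMr *kmr = g_new0(KernelMr, 1);
    static char backing[4096];

    kmr->start = 0x100000;         /* pretend guest address */
    kmr->virt  = backing;          /* host mapping */
    kmr->lkey  = 42;               /* host MR lkey */
    lkey_tbl_insert(tbl, 7, kmr);  /* guest lkey 7 -> kmr */

    uint64_t addr = 0x100010;
    uint32_t lkey = 7;
    if (remap_sge(tbl, &addr, &lkey)) {
        printf("host addr %p, host lkey %u\n",
               (void *)(uintptr_t)addr, (unsigned)lkey);
    }

    g_hash_table_destroy(tbl);
    return 0;
}

Keying the table with heap-allocated uint32_t values, as the REG_MR path does, lets the key outlive the command buffer it was copied from and be released by the table's key-destroy function when the entry is removed or rebuilt.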