From: Jonathan Derrick
To: qemu-devel@nongnu.org
Cc: Michael Kropaczek, qemu-block@nongnu.org, Keith Busch, Klaus Jensen, Kevin Wolf, Hanna Reitz
Subject: [PATCH v5 1/2] hw/nvme: Support for Namespaces Management from guest OS - create-ns
Date: Mon, 2 Jan 2023 12:54:02 -0700
Message-Id: <20230102195403.461-2-jonathan.derrick@linux.dev>
In-Reply-To: <20230102195403.461-1-jonathan.derrick@linux.dev>
References: <20230102195403.461-1-jonathan.derrick@linux.dev>
MIME-Version: 1.0
client-ip=2001:558:fd02:2446::3; envelope-from=jonathan.derrick@linux.dev; helo=resqmta-h1p-028595.sys.comcast.net X-Spam_score_int: -11 X-Spam_score: -1.2 X-Spam_bar: - X-Spam_report: (-1.2 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, SPF_HELO_PASS=-0.001, SPF_SOFTFAIL=0.665 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @comcastmailservice.net) X-ZM-MESSAGEID: 1672689392264100003 Content-Type: text/plain; charset="utf-8" From: Michael Kropaczek Added support for NVMEe NameSpaces Mangement allowing the guest OS to create namespaces by issuing nvme create-ns command. It is an extension to currently implemented Qemu nvme virtual device. Virtual devices representing namespaces will be created and/or deleted during Qemu's running session, at anytime. 
Signed-off-by: Michael Kropaczek
---
 docs/system/devices/nvme.rst |  59 ++++++-
 hw/nvme/cfg_key_checker.c    |  51 ++++++
 hw/nvme/ctrl-cfg.c           | 219 ++++++++++++++++++++++++++
 hw/nvme/ctrl.c               | 245 ++++++++++++++++++++++++++++-
 hw/nvme/meson.build          |   2 +-
 hw/nvme/ns-backend.c         | 283 +++++++++++++++++++++++++++++++++
 hw/nvme/ns.c                 | 295 ++++++++++++++++++++++++++++++-----
 hw/nvme/nvme.h               |  30 +++-
 hw/nvme/subsys.c             |  11 +-
 hw/nvme/trace-events         |   2 +
 include/block/nvme.h         |  30 ++++
 include/hw/nvme/ctrl-cfg.h   |  24 +++
 include/hw/nvme/ns-cfg.h     |  28 ++++
 include/hw/nvme/nvme-cfg.h   | 168 ++++++++++++++++++++
 qemu-img-cmds.hx             |   6 +
 qemu-img.c                   | 132 ++++++++++++++++
 16 files changed, 1531 insertions(+), 54 deletions(-)
 create mode 100644 hw/nvme/cfg_key_checker.c
 create mode 100644 hw/nvme/ctrl-cfg.c
 create mode 100644 hw/nvme/ns-backend.c
 create mode 100644 include/hw/nvme/ctrl-cfg.h
 create mode 100644 include/hw/nvme/ns-cfg.h
 create mode 100644 include/hw/nvme/nvme-cfg.h

diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
index 30f841ef62..6b3bee5e5d 100644
--- a/docs/system/devices/nvme.rst
+++ b/docs/system/devices/nvme.rst
@@ -92,6 +92,63 @@ There are a number of parameters available:
   attach the namespace to a specific ``nvme`` device (identified by an ``id``
   parameter on the controller device).
 
+Additional Namespaces managed by guest OS Namespaces Management
+---------------------------------------------------------------------
+
+.. code-block:: console
+
+   -device nvme,id=nvme-ctrl,serial=1234,subsys=nvme-subsys,auto-ns-path=path
+
+Parameters:
+
+``auto-ns-path=``
+  If specified, indicates support for dynamic management of NVMe namespaces
+  by means of the nvme create-ns command. The path points to the storage
+  area for backend images and must exist. Additionally, the parameter
+  ``subsys`` must be specified, whereas the parameter ``drive`` must not.
+  The legacy namespace backend is disabled; instead, a pair of files
+  'nvme_<serial>_ns_<nsid>.cfg' and 'nvme_<serial>_ns_<nsid>.img' will
+  refer to the respective namespace. The create-ns, attach-ns and
+  detach-ns commands, issued at the guest side, will make changes to those
+  files accordingly.
+  For each namespace there exists an image file in raw format and a config
+  file containing the namespace parameters and the attachment state,
+  allowing QEMU to configure namespaces accordingly during start up. If,
+  for instance, an image file has a size of 0 bytes, this will be
+  interpreted as a non-existent namespace. Issuing the create-ns command
+  will change the status in the config files and will re-size the image
+  file accordingly, so the image file will be associated with the
+  respective namespace. The main config file nvme_<serial>_ctrl.cfg keeps
+  track of the capacity allocated to the namespaces within the nvme
+  controller.
+  As in the case of a typical hard drive, backend images together with
+  config files need to be created beforehand. For this reason the qemu-img
+  tool has been extended with a createns command.
+
+   qemu-img createns {-S <serial> -C <capacity>}
+                     [-N <num-ns>] {<path>}
+
+  Parameters:
+  ``-S``, ``-C`` and the path are mandatory; ``-S`` must match the
+  ``serial`` parameter and the path must match the ``auto-ns-path``
+  parameter of the "-device nvme,..." specification.
+  ``-N`` is optional; if specified, it sets a limit on the number of
+  potential namespaces and reduces the number of backend images and config
+  files accordingly. By default, a set of images of 0 bytes size and
+  default config files for 256 namespaces will be created, a total of 513
+  files.
+
+Please note that the ``nvme-ns`` device is not required to support the
+dynamic namespaces management feature. It is not prohibited to assign such
+a device to an ``nvme`` device specified to support dynamic namespace
+management if one has a use case to do so; however, it will only coexist
+and be out of the scope of Namespaces Management. NsIds will be
+consistently managed; creation (create-ns) of a namespace will not
+allocate an NsId that is already taken. If an ``nvme-ns`` device conflicts
+with one previously created by create-ns (the same NsId), it will break
+QEMU's start up.
+More than one NVMe controller associated with an NVMe subsystem is
+supported. This feature requires that the parameters ``serial=`` and
+``subsys=`` of additional controllers match those of the primary
+controller and that ``auto-ns-path=`` not be specified.
+
 NVM Subsystems
 --------------
 
@@ -320,4 +377,4 @@ controller are:
 
 .. code-block:: console
 
-   echo 0000:01:00.1 > /sys/bus/pci/drivers/nvme/bind
\ No newline at end of file
+   echo 0000:01:00.1 > /sys/bus/pci/drivers/nvme/bind
diff --git a/hw/nvme/cfg_key_checker.c b/hw/nvme/cfg_key_checker.c
new file mode 100644
index 0000000000..ed50117801
--- /dev/null
+++ b/hw/nvme/cfg_key_checker.c
@@ -0,0 +1,51 @@
+/*
+ * QEMU NVM Express Virtual Dynamic Namespace Management
+ *
+ *
+ * Copyright (c) 2022 Solidigm
+ *
+ * Authors:
+ *   Michael Kropaczek
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qapi/qmp/qnum.h"
+#include "qapi/qmp/qbool.h"
+#include "qapi/error.h"
+#include "block/qdict.h"
+
+#include "nvme.h"
+
+/* There is a need to wrap the original QEMU dictionary retrieval
+ * APIs. In rare cases, when nvme cfg files were tampered with or the
+ * QEMU version was upgraded and a new key is expected to exist but is
+ * missing, it will cause a segfault crash.
+ * Builtin assert statements do not sufficiently cover such cases, and
+ * in addition the possibility of error handling is lacking. */
+#define NVME_KEY_CHECK_ERROR_FMT "key[%s] is expected to be existent"
+int64_t qdict_get_int_chkd(const QDict *qdict, const char *key, Error **errp)
+{
+    QObject *qobject = qdict_get(qdict, key);
+    if (qobject) {
+        return qnum_get_int(qobject_to(QNum, qobject));
+    }
+
+    error_setg(errp, NVME_KEY_CHECK_ERROR_FMT, key);
+    return 0;
+}
+
+QList *qdict_get_qlist_chkd(const QDict *qdict, const char *key, Error **errp)
+{
+    QObject *qobject = qdict_get(qdict, key);
+    if (qobject) {
+        return qobject_to(QList, qobject);
+    }
+
+    error_setg(errp, NVME_KEY_CHECK_ERROR_FMT, key);
+    return NULL;
+}
diff --git a/hw/nvme/ctrl-cfg.c b/hw/nvme/ctrl-cfg.c
new file mode 100644
index 0000000000..f831aeab90
--- /dev/null
+++ b/hw/nvme/ctrl-cfg.c
@@ -0,0 +1,219 @@
+/*
+ * QEMU NVM Express Virtual Dynamic Namespace Management
+ *
+ *
+ * Copyright (c) 2022 Solidigm
+ *
+ * Authors:
+ *   Michael Kropaczek
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qjson.h"
+#include "qapi/qmp/qstring.h"
+#include "qapi/qmp/qlist.h"
+#include "sysemu/block-backend.h"
+#include "block/qdict.h"
+#include "qemu/int128.h"
+#include "hw/nvme/nvme-cfg.h"
+
+#include "nvme.h"
+#include "trace.h"
+
+static char *nvme_create_cfg_name(NvmeCtrl *n, Error **errp)
+{
+    return c_create_cfg_name(n->params.ns_directory, n->params.serial, errp);
+}
+
+static NvmeIdCtrl *get_nvme_id_ctrl(NvmeCtrl *n)
+{
+    NvmeCtrl *ctrl;
+    NvmeSubsystem *subsys = n->subsys;
+
+    ctrl = nvme_subsys_ctrl(subsys, 0);
+    return ctrl ? &ctrl->id_ctrl : &n->id_ctrl;
+}
+
+int nvme_cfg_save(NvmeCtrl *n)
+{
+    NvmeIdCtrl *id = get_nvme_id_ctrl(n);
+    QDict *nvme_cfg = NULL;
+    Int128 tnvmcap128;
+    Int128 unvmcap128;
+
+    nvme_cfg = qdict_new();
+
+    memcpy(&tnvmcap128, id->tnvmcap, sizeof(tnvmcap128));
+    memcpy(&unvmcap128, id->unvmcap, sizeof(unvmcap128));
+
+#define CTRL_CFG_DEF(type, key, value, default) \
+    qdict_put_##type(nvme_cfg, key, value);
+#include "hw/nvme/ctrl-cfg.h"
+#undef CTRL_CFG_DEF
+
+    return c_cfg_save(n->params.ns_directory, n->params.serial, nvme_cfg);
+}
+
+int nvme_cfg_update(NvmeCtrl *n, uint64_t amount, NvmeNsAllocAction action)
+{
+    int ret = 0;
+    NvmeCtrl *ctrl;
+    NvmeIdCtrl *id = get_nvme_id_ctrl(n);
+    NvmeSubsystem *subsys = n->subsys;
+    Int128 tnvmcap128;
+    Int128 unvmcap128;
+    Int128 amount128 = int128_make64(amount);
+    int i;
+
+    memcpy(&tnvmcap128, id->tnvmcap, sizeof(tnvmcap128));
+    memcpy(&unvmcap128, id->unvmcap, sizeof(unvmcap128));
+
+    switch (action) {
+    case NVME_NS_ALLOC_CHK:
+        if (int128_ge(unvmcap128, amount128)) {
+            return 0;   /* no update */
+        } else {
+            ret = -1;
+        }
+        break;
+    case NVME_NS_ALLOC:
+        if (int128_ge(unvmcap128, amount128)) {
+            unvmcap128 = int128_sub(unvmcap128, amount128);
+        } else {
+            ret = -1;
+        }
+        break;
+    case NVME_NS_DEALLOC:
+        unvmcap128 = int128_add(unvmcap128, amount128);
+        if (int128_ge(unvmcap128, tnvmcap128)) {
+            unvmcap128 = tnvmcap128;
+        }
+        break;
+    default:;
+    }
+
+    if (ret == 0) {
+        if (subsys) {
+            for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+                ctrl = nvme_subsys_ctrl(subsys, i);
+                if (ctrl) {
+                    id = &ctrl->id_ctrl;
+                    memcpy(id->unvmcap, &unvmcap128, sizeof(id->unvmcap));
+                }
+            }
+        } else {
+            memcpy(id->unvmcap, &unvmcap128, sizeof(id->unvmcap));
+        }
+    }
+
+    return ret;
+}
+
+/* Note: id->tnvmcap and id->unvmcap point to 16-byte arrays, but those
+ * are interpreted as 128-bit int objects.
+ * It is OK here to use Int128 because the backend's namespace images
+ * cannot exceed the 64-bit max value. */
+static void nvme_cfg_validate(NvmeCtrl *n, uint64_t tnvmcap, uint64_t unvmcap,
+                              Error **errp)
+{
+    NvmeIdCtrl *id = &n->id_ctrl;
+    Int128 tnvmcap128;
+    Int128 unvmcap128;
+
+    if (unvmcap > tnvmcap) {
+        error_setg(errp, "nvme-cfg file is corrupted, free to allocate[%"PRIu64
+                   "] > total capacity[%"PRIu64"]",
+                   unvmcap, tnvmcap);
+    } else if (tnvmcap == (uint64_t) 0) {
+        error_setg(errp, "nvme-cfg file error: total capacity cannot be zero");
+    } else {
+        tnvmcap128 = int128_make64(tnvmcap);
+        unvmcap128 = int128_make64(unvmcap);
+        memcpy(id->tnvmcap, &tnvmcap128, sizeof(id->tnvmcap));
+        memcpy(id->unvmcap, &unvmcap128, sizeof(id->unvmcap));
+    }
+}
+
+int nvme_cfg_load(NvmeCtrl *n)
+{
+    QObject *nvme_cfg_obj = NULL;
+    QDict *nvme_cfg = NULL;
+    NvmeIdCtrl *id_p = get_nvme_id_ctrl(n);
+    NvmeIdCtrl *id = &n->id_ctrl;
+    int ret = 0;
+    char *filename = NULL;
+    uint64_t tnvmcap;
+    uint64_t unvmcap;
+    FILE *fp = NULL;
+    char buf[NVME_CFG_MAXSIZE] = {};
+    Error *local_err = NULL;
+
+    if (n->cntlid) {    /* secondary controller */
+        memcpy(id->tnvmcap, id_p->tnvmcap, sizeof(id->tnvmcap));
+        memcpy(id->unvmcap, id_p->unvmcap, sizeof(id->unvmcap));
+        goto fail2;
+    }
+
+    filename = nvme_create_cfg_name(n, &local_err);
+    if (local_err) {
+        goto fail2;
+    }
+
+    if (access(filename, F_OK)) {
+        error_setg(&local_err, "Missing nvme-cfg file");
+        goto fail2;
+    }
+
+    fp = fopen(filename, "r");
+    if (fp == NULL) {
+        error_setg(&local_err, "open %s: %s", filename,
+                   strerror(errno));
+        goto fail2;
+    }
+
+    if (fread(buf, sizeof(buf), 1, fp)) {
+        error_setg(&local_err, "Could not read nvme-cfg");
+        goto fail1;
+    }
+
+    nvme_cfg_obj = qobject_from_json(buf, NULL);
+    if (!nvme_cfg_obj) {
+        error_setg(&local_err, "Could not parse the JSON for nvme-cfg");
+        goto fail1;
+    }
+
+    nvme_cfg = qobject_to(QDict, nvme_cfg_obj);
+    qdict_flatten(nvme_cfg);
+
+    tnvmcap = qdict_get_int_chkd(nvme_cfg, "tnvmcap", &local_err);
+    if (local_err) {
+        goto fail1;
+    }
+
+    unvmcap = qdict_get_int_chkd(nvme_cfg, "unvmcap", &local_err);
+    if (local_err) {
+        goto fail1;
+    }
+
+    nvme_cfg_validate(n, tnvmcap, unvmcap, &local_err);
+
+fail1:
+    fclose(fp);
+
+fail2:
+    if (local_err) {
+        error_report_err(local_err);
+        ret = -1;
+    }
+
+    qobject_unref(nvme_cfg_obj);
+    g_free(filename);
+
+    return ret;
+}
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 4a0c51a947..5ed35d7cf4 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -40,7 +40,9 @@
  *              sriov_vi_flexible= \
  *              sriov_max_vi_per_vf= \
  *              sriov_max_vq_per_vf= \
- *              subsys=
+ *              subsys=, \
+ *              auto-ns-path=
+ *
  *      -device nvme-ns,drive=,bus=,nsid=,\
  *              zoned=, \
  *              subsys=,detached=
@@ -140,6 +142,65 @@
  *   a secondary controller. The default 0 resolves to
  *   `(sriov_vq_flexible / sriov_max_vfs)`.
  *
+ * - `auto-ns-path`
+ *   If specified, indicates support for dynamic management of nvme
+ *   namespaces by means of the nvme create-ns command. This path, pointing
+ *   to a storage area for backend images, must exist. Additionally it
+ *   requires that the parameter `subsys` be specified, whereas the
+ *   parameter `drive` must not. The legacy namespace backend is disabled;
+ *   instead, a pair of files 'nvme_<serial>_ns_<nsid>.cfg' and
+ *   'nvme_<serial>_ns_<nsid>.img' will refer to the respective namespace.
+ *   The create-ns, attach-ns and detach-ns commands, issued at the guest
+ *   side, will make changes to those files accordingly.
+ *   For each namespace there exists an image file in raw format and a
+ *   config file containing the namespace parameters and the attachment
+ *   state, allowing QEMU to configure the namespace accordingly during its
+ *   start up. If for instance an image file has a size of 0 bytes, this
+ *   will be interpreted as a non-existent namespace. Issuing the create-ns
+ *   command will change the status in the config files and will re-size
+ *   the image file accordingly, so the image file will be associated with
+ *   the respective namespace. The main config file nvme_<serial>_ctrl.cfg
+ *   keeps track of the capacity allocated to the namespaces within the
+ *   nvme controller.
+ *   As in the case of a typical hard drive, backend images together with
+ *   config files need to be created beforehand. For this reason the
+ *   qemu-img tool has been extended with a createns command.
+ *
+ *      qemu-img createns {-S <serial> -C <capacity>}
+ *                        [-N <num-ns>] {<path>}
+ *
+ *   Parameters:
+ *   -S, -C and the path are mandatory; `-S` must match the `serial`
+ *   parameter and the path must match the `auto-ns-path` parameter of the
+ *   "-device nvme,..." specification.
+ *   -N is optional; if specified, it sets a limit on the number of
+ *   potential namespaces and reduces the number of backend images and
+ *   config files accordingly. By default, a set of images of 0 bytes size
+ *   and default config files for 256 namespaces will be created, a total
+ *   of 513 files.
+ *
+ *   Note 1:
+ *         If the main "-drive" is not specified with 'if=virtio', then
+ *         SeaBIOS must be built with "Parallelize hardware init" disabled
+ *         to allow a proper boot. Without it, it is probable that the
+ *         non-deterministic order of collecting potential boot block
+ *         devices will not catch the one with the guest OS. A
+ *         deterministic order, however, will fill up the list of potential
+ *         boot devices starting with the typical ATA devices usually
+ *         containing the guest OS.
+ *         SeaBIOS has limited space to store all potential boot block
+ *         devices if there are more than 11 namespaces. (Other types
+ *         require less memory, so the number 11 does not apply
+ *         universally.)
+ *         (The above note refers to SeaBIOS rel-1.16.0.)
+ *   Note 2:
+ *         If the main "-drive" referring to the guest OS is specified with
+ *         'if=virtio', then there is no need to build SeaBIOS with
+ *         "Parallelize hardware init" disabled.
+ *         The boot block device 'Virtio disk PCI:xx:xx.x' will appear
+ *         first in the list instead of an ATA device.
+ *   Note 3:
+ *         More than one NVMe controller associated with an NVMe subsystem
+ *         is supported. This feature requires that the parameters
+ *         'serial=' and 'subsys=' of additional controllers match those of
+ *         the primary controller and that 'auto-ns-path=' not be
+ *         specified.
+ *
  * nvme namespace device parameters
  * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  * - `shared`
@@ -262,6 +323,7 @@ static const uint32_t nvme_cse_acs[256] = {
     [NVME_ADM_CMD_SET_FEATURES]     = NVME_CMD_EFF_CSUPP,
     [NVME_ADM_CMD_GET_FEATURES]     = NVME_CMD_EFF_CSUPP,
     [NVME_ADM_CMD_ASYNC_EV_REQ]     = NVME_CMD_EFF_CSUPP,
+    [NVME_ADM_CMD_NS_MGMT]          = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_NIC,
     [NVME_ADM_CMD_NS_ATTACHMENT]    = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_NIC,
     [NVME_ADM_CMD_VIRT_MNGMT]       = NVME_CMD_EFF_CSUPP,
     [NVME_ADM_CMD_DBBUF_CONFIG]     = NVME_CMD_EFF_CSUPP,
@@ -5577,6 +5639,134 @@ static void nvme_select_iocs_ns(NvmeCtrl *n, NvmeNamespace *ns)
     }
 }
 
+static uint16_t nvme_ns_mgmt_create(NvmeCtrl *n, NvmeRequest *req, uint32_t nsid)
+{
+    NvmeCtrl *ctrl = NULL;      /* primary controller */
+    NvmeNamespace *ns = NULL;
+    NvmeIdNsMgmt id_ns = {};
+    uint64_t nsze;
+    uint64_t ncap;
+    uint16_t i;
+    uint16_t ret;
+    Error *local_err = NULL;
+
+    if (nsid) {
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    ret = nvme_h2c(n, (uint8_t *)&id_ns, sizeof(id_ns), req);
+    if (ret) {
+        return ret;
+    }
+
+    ctrl = nvme_subsys_ctrl(n->subsys, 0);
+
+    nsze = le64_to_cpu(id_ns.nsze);
+    ncap = le64_to_cpu(id_ns.ncap);
+
+    if (ncap > nsze) {
+        return NVME_INVALID_FIELD | NVME_DNR;
+    } else if (ncap != nsze) {
+        return NVME_THIN_PROVISION_NOTSPRD | NVME_DNR;
+    }
+
+    nvme_validate_flbas(id_ns.flbas, &local_err);
+    if (local_err) {
+        goto fail;
+    }
+
+    if (!n->params.ns_directory) {
+        error_setg(&local_err, "create-ns not supported if 'auto-ns-path' is not specified");
+        goto fail;
+    }
else if (n->namespace.blkconf.blk) { + error_setg(&local_err, "create-ns not supported if 'drive' is spec= ified"); + goto fail; + } + + for (i =3D 1; i <=3D NVME_MAX_NAMESPACES; i++) { + if (nvme_ns(ctrl, (uint32_t)i) || nvme_subsys_ns(ctrl->subsys, (ui= nt32_t)i)) { + continue; + } + break; + } + + if (i > le32_to_cpu(ctrl->id_ctrl.nn) || i > NVME_MAX_NAMESPACES) { + return NVME_NS_IDNTIFIER_UNAVAIL | NVME_DNR; + } + nsid =3D i; + + /* create ns here */ + ns =3D nvme_ns_create(n, nsid, &id_ns, &local_err); + if (local_err) { + goto fail; + } + + if (nvme_cfg_update(n, ns->size, NVME_NS_ALLOC_CHK)) { + /* place for delete-ns */ + error_setg(&local_err, "Insufficient capacity, an orphaned ns[%"PR= Iu32"] will be left behind", nsid); + error_report_err(local_err); + return NVME_NS_INSUFFICIENT_CAPAC | NVME_DNR; + } + (void)nvme_cfg_update(n, ns->size, NVME_NS_ALLOC); + if (nvme_cfg_save(n)) { + (void)nvme_cfg_update(n, ns->size, NVME_NS_DEALLOC); + /* place for delete-ns */ + error_setg(&local_err, "Cannot save conf file, an orphaned ns[%"PR= Iu32"] will be left behind", nsid); + error_report_err(local_err); + return NVME_INVALID_FIELD | NVME_DNR; + } + +fail: + if (local_err) { + error_report_err(local_err); + return NVME_INVALID_FIELD | NVME_DNR; + } + + req->cqe.result =3D cpu_to_le32(nsid); + return NVME_SUCCESS; +} + +static uint16_t nvme_ns_mgmt(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeIdCtrl *id =3D &n->id_ctrl; + uint8_t flags =3D req->cmd.flags; + uint32_t nsid =3D le32_to_cpu(req->cmd.nsid); + uint32_t dw10 =3D le32_to_cpu(req->cmd.cdw10); + uint32_t dw11 =3D le32_to_cpu(req->cmd.cdw11); + uint8_t sel =3D dw10 & 0xf; + uint8_t csi =3D (dw11 >> 24) & 0xf; + Error *local_err =3D NULL; + + trace_pci_nvme_ns_mgmt(nvme_cid(req), nsid, sel, csi, NVME_CMD_FLAGS_P= SDT(flags)); + + if (!(le16_to_cpu(id->oacs) & NVME_OACS_NS_MGMT)) { + return NVME_NS_ATTACH_MGMT_NOTSPRD | NVME_DNR; + } + + if (n->cntlid && !n->subsys) { + error_setg(&local_err, "Secondary controller 
without subsystem"); + error_report_err(local_err); + return NVME_NS_ATTACH_MGMT_NOTSPRD | NVME_DNR; + } + + switch (sel) { + case NVME_NS_MANAGEMENT_CREATE: + switch (csi) { + case NVME_CSI_NVM: + return nvme_ns_mgmt_create(n, req, nsid); + case NVME_CSI_ZONED: + /* fall through for now */ + default: + return NVME_INVALID_FIELD | NVME_DNR; + } + break; + default: + return NVME_INVALID_FIELD | NVME_DNR; + } + + return NVME_SUCCESS; +} + static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req) { NvmeNamespace *ns; @@ -5589,6 +5779,7 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeR= equest *req) uint16_t *ids =3D &list[1]; uint16_t ret; int i; + Error *local_err; =20 trace_pci_nvme_ns_attachment(nvme_cid(req), dw10 & 0xf); =20 @@ -5627,6 +5818,8 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeR= equest *req) return NVME_NS_PRIVATE | NVME_DNR; } =20 + ns->params.detached =3D false; + nvme_attach_ns(ctrl, ns); nvme_select_iocs_ns(ctrl, ns); =20 @@ -5639,6 +5832,11 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, Nvme= Request *req) =20 ctrl->namespaces[nsid] =3D NULL; ns->attached--; + if (ns->attached) { + ns->params.detached =3D false; + } else { + ns->params.detached =3D true; + } =20 nvme_update_dmrsl(ctrl); =20 @@ -5659,6 +5857,12 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, Nvme= Request *req) } } =20 + if (ns_cfg_save(n, ns, nsid) =3D=3D -1) { /* save ns cfg */ + error_setg(&local_err, "Unable to save ns-cnf"); + error_report_err(local_err); + return NVME_INVALID_FIELD | NVME_DNR; + } + return NVME_SUCCESS; } =20 @@ -6124,6 +6328,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeReque= st *req) return nvme_get_feature(n, req); case NVME_ADM_CMD_ASYNC_EV_REQ: return nvme_aer(n, req); + case NVME_ADM_CMD_NS_MGMT: + return nvme_ns_mgmt(n, req); case NVME_ADM_CMD_NS_ATTACHMENT: return nvme_ns_attachment(n, req); case NVME_ADM_CMD_VIRT_MNGMT: @@ -6966,7 +7172,7 @@ static void nvme_check_constraints(NvmeCtrl *n, Error= **errp) 
params->max_ioqpairs =3D params->num_queues - 1; } =20 - if (n->namespace.blkconf.blk && n->subsys) { + if (n->namespace.blkconf.blk && n->subsys && !params->ns_directory) { error_setg(errp, "subsystem support is unavailable with legacy " "namespace ('drive' property)"); return; @@ -7480,12 +7686,14 @@ void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns) BDRV_REQUEST_MAX_BYTES / nvme_l2b(ns, 1)); } =20 +static void nvme_add_bootindex(Object *obj); static void nvme_realize(PCIDevice *pci_dev, Error **errp) { NvmeCtrl *n =3D NVME(pci_dev); NvmeNamespace *ns; Error *local_err =3D NULL; NvmeCtrl *pn =3D NVME(pcie_sriov_get_pf(pci_dev)); + NvmeCtrl *ctrl =3D NULL; =20 if (pci_is_vf(pci_dev)) { /* @@ -7505,6 +7713,15 @@ static void nvme_realize(PCIDevice *pci_dev, Error *= *errp) qbus_init(&n->bus, sizeof(NvmeBus), TYPE_NVME_BUS, &pci_dev->qdev, n->parent_obj.qdev.id); =20 + ctrl =3D nvme_subsys_ctrl(n->subsys, 0); + + /* check if secondary controller, if so take over the ns_directory */ + if (ctrl && ctrl->params.ns_directory && !n->params.ns_directory) { + n->params.ns_directory =3D g_strdup(ctrl->params.ns_directory); + } + + nvme_add_bootindex(OBJECT(n)); + if (nvme_init_subsys(n, errp)) { error_propagate(errp, local_err); return; @@ -7516,7 +7733,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **= errp) nvme_init_ctrl(n, pci_dev); =20 /* setup a namespace if the controller drive property was given */ - if (n->namespace.blkconf.blk) { + if (n->namespace.blkconf.blk && !n->params.ns_directory) { ns =3D &n->namespace; ns->params.nsid =3D 1; =20 @@ -7525,6 +7742,14 @@ static void nvme_realize(PCIDevice *pci_dev, Error *= *errp) } =20 nvme_attach_ns(n, ns); + } else if (!n->namespace.blkconf.blk && n->params.ns_directory) { + if (nvme_cfg_load(n)) { + error_setg(errp, "Could not process nvme-cfg"); + return; + } + if (nvme_ns_backend_setup(n, errp)) { + return; + } } } =20 @@ -7569,6 +7794,7 @@ static void nvme_exit(PCIDevice *pci_dev) =20 static Property 
nvme_props[] =3D { DEFINE_BLOCK_PROPERTIES(NvmeCtrl, namespace.blkconf), + DEFINE_PROP_STRING("auto-ns-path", NvmeCtrl,params.ns_directory), DEFINE_PROP_LINK("pmrdev", NvmeCtrl, pmr.dev, TYPE_MEMORY_BACKEND, HostMemoryBackend *), DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS, @@ -7706,14 +7932,19 @@ static void nvme_class_init(ObjectClass *oc, void *= data) dc->reset =3D nvme_pci_reset; } =20 -static void nvme_instance_init(Object *obj) +static void nvme_add_bootindex(Object *obj) { NvmeCtrl *n =3D NVME(obj); =20 - device_add_bootindex_property(obj, &n->namespace.blkconf.bootindex, - "bootindex", "/namespace@1,0", - DEVICE(obj)); + if (!n->params.ns_directory) { + device_add_bootindex_property(obj, &n->namespace.blkconf.bootindex, + "bootindex", "/namespace@1,0", + DEVICE(obj)); + } +} =20 +static void nvme_instance_init(Object *obj) +{ object_property_add(obj, "smart_critical_warning", "uint8", nvme_get_smart_warning, nvme_set_smart_warning, NULL, NULL); diff --git a/hw/nvme/meson.build b/hw/nvme/meson.build index 3cf40046ee..8900831701 100644 --- a/hw/nvme/meson.build +++ b/hw/nvme/meson.build @@ -1 +1 @@ -softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('ctrl.c', 'dif.c', = 'ns.c', 'subsys.c')) +softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('ctrl.c', 'dif.c', = 'ns.c', 'subsys.c', 'ns-backend.c', 'cfg_key_checker.c', 'ctrl-cfg.c')) diff --git a/hw/nvme/ns-backend.c b/hw/nvme/ns-backend.c new file mode 100644 index 0000000000..06de8f262c --- /dev/null +++ b/hw/nvme/ns-backend.c @@ -0,0 +1,283 @@ +/* + * QEMU NVM Express Virtual Dynamic Namespace Management + * + * + * Copyright (c) 2022 Solidigm + * + * Authors: + * Michael Kropaczek + * + * This work is licensed under the terms of the GNU GPL, version 2. See the + * COPYING file in the top-level directory. 
+ * + */ + +#include "qemu/osdep.h" +#include "qemu/units.h" +#include "qemu/error-report.h" +#include "qapi/error.h" +#include "qapi/qmp/qnum.h" +#include "qapi/qmp/qjson.h" +#include "qapi/qmp/qstring.h" +#include "qapi/qmp/qlist.h" +#include "sysemu/sysemu.h" +#include "sysemu/block-backend.h" +#include "block/qdict.h" +#include "hw/nvme/nvme-cfg.h" + +#include "nvme.h" +#include "trace.h" + +/* caller will take ownership */ +static QDict *ns_get_bs_default_opts(bool read_only) +{ + QDict *bs_opts =3D qdict_new(); + + qdict_set_default_str(bs_opts, BDRV_OPT_CACHE_DIRECT, "off"); + qdict_set_default_str(bs_opts, BDRV_OPT_CACHE_NO_FLUSH, "off"); + qdict_set_default_str(bs_opts, BDRV_OPT_READ_ONLY, + read_only ? "on" : "off"); + qdict_set_default_str(bs_opts, BDRV_OPT_AUTO_READ_ONLY, "on"); + qdict_set_default_str(bs_opts, "driver", "raw"); + + return bs_opts; +} + +BlockBackend *ns_blockdev_init(const char *file, Error **errp) +{ + BlockBackend *blk =3D NULL; + bool read_only =3D false; + QDict *bs_opts; + + if (access(file, F_OK)) { + error_setg(errp, "%s not found, please create one", file); + } else { + bs_opts =3D ns_get_bs_default_opts(read_only); + blk =3D blk_new_open(file, NULL, bs_opts, BDRV_O_RDWR | BDRV_O_RES= IZE, errp); + } + + return blk; +} + +void ns_blockdev_activate(BlockBackend *blk, uint64_t image_size, Error *= *errp) +{ + int ret; + + ret =3D blk_set_perm(blk, BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_W= RITE_UNCHANGED, errp); + if (ret < 0) { + return; + } + ret =3D blk_truncate(blk, image_size, false, PREALLOC_MODE_OFF, 0, + errp); +} + +int ns_storage_path_check(NvmeCtrl *n, Error **errp) +{ + return storage_path_check(n->params.ns_directory, n->params.serial, e= rrp); +} + +/* caller will take ownership */ +char *ns_create_image_name(NvmeCtrl *n, uint32_t nsid, Error **errp) +{ + return create_image_name(n->params.ns_directory, n->params.serial, nsi= d, errp); +} + +static char *ns_create_cfg_name(NvmeCtrl *n, uint32_t nsid, Error 
**errp)
+{
+    return create_cfg_name(n->params.ns_directory, n->params.serial, nsid, errp);
+}
+
+int ns_auto_check(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid)
+{
+    int ret = 0;
+    BlockBackend *blk = ns->blkconf.blk;
+    char *file_name_img = NULL;
+
+    if (!blk) {
+        return 0;
+    }
+
+    file_name_img = ns_create_image_name(n, nsid, NULL);
+
+    if (!file_name_img || strcmp(blk_bs(blk)->filename, file_name_img)) {
+        ret = -1;
+    }
+
+    g_free(file_name_img);
+
+    return ret;
+}
+
+void ns_cfg_clear(NvmeNamespace *ns)
+{
+    ns->params.pi = 0;
+    ns->lbasz = 0;
+    ns->id_ns.nsze = 0;
+    ns->id_ns.ncap = 0;
+    ns->id_ns.nuse = 0;
+    ns->id_ns.nsfeat = 0;
+    ns->id_ns.flbas = 0;
+    ns->id_ns.nmic = 0;
+    ns->size = 0;
+}
+
+int ns_cfg_save(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid)
+{
+    NvmeSubsystem *subsys = n->subsys;
+    QDict *ns_cfg = NULL;
+    QList *ctrl_qlist = NULL;
+    Error *local_err = NULL;
+    int i;
+
+    if (ns_auto_check(n, ns, nsid)) {
+        error_setg(&local_err, "ns-cfg not saved: ns[%"PRIu32"] configured via '-device nvme-ns'", nsid);
+        error_report_err(local_err);
+        return 1; /* not an error */
+    }
+
+    ctrl_qlist = qlist_new();
+    ns_cfg = qdict_new();
+
+    if (subsys) {
+        for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+            NvmeCtrl *ctrl = subsys->ctrls[i];
+
+            if (ctrl && nvme_ns(ctrl, nsid)) {
+                qlist_append_int(ctrl_qlist, i);
+            }
+        }
+    }
+
+#define NS_CFG_DEF(type, key, value, default) \
+    qdict_put_##type(ns_cfg, key, value);
+#include "hw/nvme/ns-cfg.h"
+#undef NS_CFG_DEF
+
+    return nsid_cfg_save(n->params.ns_directory, n->params.serial, ns_cfg, nsid);
+}
+
+static bool glist_exists_int(QList *qlist, int64_t value)
+{
+    QListEntry *entry;
+
+    QLIST_FOREACH_ENTRY(qlist, entry) {
+        if (qnum_get_int(qobject_to(QNum, entry->value)) == value) {
+            return true;
+        }
+    }
+    return false;
+}
+
+
+int ns_cfg_load(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid)
+{
+    QObject *ns_cfg_obj = NULL;
+    QDict *ns_cfg = NULL;
+    QList
*ctrl_qlist = NULL;
+    int ret = 0;
+    char *filename = NULL;
+    FILE *fp = NULL;
+    char buf[NS_CFG_MAXSIZE] = {};
+    Error *local_err = NULL;
+
+    if (ns_auto_check(n, ns, nsid)) {
+        error_setg(&local_err, "ns-cfg not loaded: ns[%"PRIu32"] configured via '-device nvme-ns'", nsid);
+        ret = 1; /* not an error */
+        goto fail2;
+    }
+
+    filename = ns_create_cfg_name(n, nsid, &local_err);
+    if (local_err) {
+        goto fail2;
+    }
+
+    if (access(filename, F_OK)) {
+        error_setg(&local_err, "Missing ns-cfg file");
+        goto fail2;
+    }
+
+    fp = fopen(filename, "r");
+    if (fp == NULL) {
+        error_setg(&local_err, "open %s: %s", filename,
+                   strerror(errno));
+        goto fail2;
+    }
+
+    if (fread(buf, sizeof(buf), 1, fp)) {
+        error_setg(&local_err, "Could not read ns-cfg");
+        goto fail1;
+    }
+
+    ns_cfg_obj = qobject_from_json(buf, NULL);
+    if (!ns_cfg_obj) {
+        error_setg(&local_err, "Could not parse the JSON for ns-cfg");
+        goto fail1;
+    }
+
+    ns_cfg = qobject_to(QDict, ns_cfg_obj);
+
+    ns->params.nsid = (uint32_t)qdict_get_int_chkd(ns_cfg, "params.nsid", &local_err);   /* (uint32_t) */
+    if (local_err) {
+        goto fail1;
+    }
+    ctrl_qlist = qdict_get_qlist_chkd(ns_cfg, "attached_ctrls", &local_err);             /* (QList) */
+    if (local_err) {
+        goto fail1;
+    }
+    ns->params.detached = !glist_exists_int(ctrl_qlist, n->cntlid);
+    ns->params.pi = (uint8_t)qdict_get_int_chkd(ns_cfg, "params.pi", &local_err);        /* (uint8_t) */
+    if (local_err) {
+        goto fail1;
+    }
+    ns->lbasz = (size_t)qdict_get_int_chkd(ns_cfg, "lbasz", &local_err);                 /* (size_t) */
+    if (local_err) {
+        goto fail1;
+    }
+    ns->id_ns.nsze = cpu_to_le64(qdict_get_int_chkd(ns_cfg, "id_ns.nsze", &local_err));  /* (uint64_t) */
+    if (local_err) {
+        goto fail1;
+    }
+    ns->id_ns.ncap = cpu_to_le64(qdict_get_int_chkd(ns_cfg, "id_ns.ncap", &local_err));  /* (uint64_t) */
+    if (local_err) {
+        goto fail1;
+    }
+    ns->id_ns.nuse = cpu_to_le64(qdict_get_int_chkd(ns_cfg, "id_ns.nuse", &local_err));  /* (uint64_t) */
+    if
(local_err) { + goto fail1; + } + ns->id_ns.nsfeat =3D (uint8_t)qdict_get_int_chkd(ns_cfg, "id_ns.nsfeat= ", &local_err); /* (uint8_t) */ + if (local_err) { + goto fail1; + } + ns->id_ns.flbas =3D (uint8_t)qdict_get_int_chkd(ns_cfg, "id_ns.flbas",= &local_err); /* (uint8_t) */ + if (local_err) { + goto fail1; + } + ns->id_ns.nmic =3D (uint8_t)qdict_get_int_chkd(ns_cfg, "id_ns.nmic", &= local_err); /* (uint8_t) */ + if (local_err) { + goto fail1; + } + + /* ns->size below will be overwritten after nvme_ns_backend_sanity_chk= () */ + ns->size =3D qdict_get_int_chkd(ns_cfg, "ns_size", &local_err); = /* (uint64_t) */ + if (local_err) { + goto fail1; + } + + /* it is expected that ns-cfg file is consistent with paired ns-img fi= le + * here is a simple check preventing against a crash */ + nvme_validate_flbas(ns->id_ns.flbas, &local_err); + +fail1: + fclose(fp); + +fail2: + if (local_err) { + error_report_err(local_err); + ret =3D !ret ? -1: ret; + } + + qobject_unref(ns_cfg_obj); + g_free(filename); + return ret; +} diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c index 62a1f97be0..06cc6c8c71 100644 --- a/hw/nvme/ns.c +++ b/hw/nvme/ns.c @@ -3,9 +3,11 @@ * * Copyright (c) 2019 CNEX Labs * Copyright (c) 2020 Samsung Electronics + * Copyright (c) 2022 Solidigm * * Authors: * Klaus Jensen + * Michael Kropaczek * * This work is licensed under the terms of the GNU GPL, version 2. See the * COPYING file in the top-level directory. 
@@ -55,6 +57,26 @@ void nvme_ns_init_format(NvmeNamespace *ns) id_ns->npda =3D id_ns->npdg =3D npdg - 1; } =20 +#define NVME_LBAF_DFLT_CNT 8 +#define NVME_LBAF_DFLT_SIZE 16 +static unsigned int ns_get_default_lbafs(void *lbafp) +{ + static const NvmeLBAF lbaf[NVME_LBAF_DFLT_SIZE] =3D { + [0] =3D { .ds =3D 9 }, + [1] =3D { .ds =3D 9, .ms =3D 8 }, + [2] =3D { .ds =3D 9, .ms =3D 16 }, + [3] =3D { .ds =3D 9, .ms =3D 64 }, + [4] =3D { .ds =3D 12 }, + [5] =3D { .ds =3D 12, .ms =3D 8 }, + [6] =3D { .ds =3D 12, .ms =3D 16 }, + [7] =3D { .ds =3D 12, .ms =3D 64 }, + }; + + memcpy(lbafp, &lbaf[0], sizeof(lbaf)); + + return NVME_LBAF_DFLT_CNT; +} + static int nvme_ns_init(NvmeNamespace *ns, Error **errp) { static uint64_t ns_count; @@ -64,6 +86,11 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp) uint16_t ms; int i; =20 + ms =3D ns->params.ms; + if (ms && NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)) { + return -1; + } + ns->csi =3D NVME_CSI_NVM; ns->status =3D 0x0; =20 @@ -89,7 +116,6 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp) id_ns->eui64 =3D cpu_to_be64(ns->params.eui64); =20 ds =3D 31 - clz32(ns->blkconf.logical_block_size); - ms =3D ns->params.ms; =20 id_ns->mc =3D NVME_ID_NS_MC_EXTENDED | NVME_ID_NS_MC_SEPARATE; =20 @@ -105,39 +131,25 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **er= rp) =20 ns->pif =3D ns->params.pif; =20 - static const NvmeLBAF lbaf[16] =3D { - [0] =3D { .ds =3D 9 }, - [1] =3D { .ds =3D 9, .ms =3D 8 }, - [2] =3D { .ds =3D 9, .ms =3D 16 }, - [3] =3D { .ds =3D 9, .ms =3D 64 }, - [4] =3D { .ds =3D 12 }, - [5] =3D { .ds =3D 12, .ms =3D 8 }, - [6] =3D { .ds =3D 12, .ms =3D 16 }, - [7] =3D { .ds =3D 12, .ms =3D 64 }, - }; - - ns->nlbaf =3D 8; + ns->nlbaf =3D ns_get_default_lbafs(&id_ns->lbaf); =20 - memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf)); - - for (i =3D 0; i < ns->nlbaf; i++) { - NvmeLBAF *lbaf =3D &id_ns->lbaf[i]; - if (lbaf->ds =3D=3D ds) { - if (lbaf->ms =3D=3D ms) { - id_ns->flbas |=3D i; - goto lbaf_found; + if (ms) { /* ms 
from params */ + for (i =3D 0; i < ns->nlbaf; i++) { + NvmeLBAF *lbaf =3D &id_ns->lbaf[i]; + if (lbaf->ds =3D=3D ds && lbaf->ms =3D=3D ms) { + id_ns->flbas |=3D i; + goto lbaf_found; } } + /* add non-standard lba format */ + id_ns->lbaf[ns->nlbaf].ds =3D ds; + id_ns->lbaf[ns->nlbaf].ms =3D ms; + ns->nlbaf++; + id_ns->flbas |=3D i; + } else { + i =3D NVME_ID_NS_FLBAS_INDEX(id_ns->flbas); } =20 - /* add non-standard lba format */ - id_ns->lbaf[ns->nlbaf].ds =3D ds; - id_ns->lbaf[ns->nlbaf].ms =3D ms; - ns->nlbaf++; - - id_ns->flbas |=3D i; - - lbaf_found: id_ns_nvm->elbaf[i] =3D (ns->pif & 0x3) << 7; id_ns->nlbaf =3D ns->nlbaf - 1; @@ -482,6 +494,122 @@ static int nvme_ns_check_constraints(NvmeNamespace *n= s, Error **errp) return 0; } =20 +static void nvme_ns_backend_sanity_chk(NvmeNamespace *ns, BlockBackend *bl= k, Error **errp) +{ + uint64_t ns_size_img =3D ns->size; + uint64_t ns_size_cfg =3D blk_getlength(blk); + + if (ns_size_cfg !=3D ns_size_img) { + error_setg(errp, "ns-backend sanity check for nsid [%"PRIu32"] fai= led", ns->params.nsid); + } +} + +void nvme_validate_flbas(uint8_t flbas, Error **errp) +{ + uint8_t nlbaf; + NvmeLBAF lbaf[NVME_LBAF_DFLT_SIZE]; + + nlbaf =3D ns_get_default_lbafs(&lbaf[0]); + flbas =3D NVME_ID_NS_FLBAS_INDEX(flbas); + if (flbas >=3D nlbaf) { + error_setg(errp, "FLBA size index is out of range, max supported [= %"PRIu8"]", nlbaf - 1); + } +} + +NvmeNamespace *nvme_ns_create(NvmeCtrl *n, uint32_t nsid, NvmeIdNsMgmt *id= _ns, Error **errp) +{ + NvmeCtrl *ctrl =3D NULL; /* primary controller */ + NvmeSubsystem *subsys =3D n->subsys; + NvmeNamespace *ns =3D NULL; + DeviceState *dev =3D NULL; + uint64_t nsze =3D le64_to_cpu(id_ns->nsze); + uint64_t ncap =3D le64_to_cpu(id_ns->ncap); + uint8_t flbas =3D id_ns->flbas; + uint8_t dps =3D id_ns->dps; + uint8_t nmic =3D id_ns->nmic; + NvmeLBAF lbaf[NVME_LBAF_DFLT_SIZE]; + size_t lbasz; + uint64_t image_size; + Error *local_err =3D NULL; + BlockBackend *blk =3D NULL; + + 
trace_pci_nvme_ns_create(nsid, nsze, ncap, flbas); + + flbas =3D NVME_ID_NS_FLBAS_INDEX(flbas); + + ns_get_default_lbafs(&lbaf[0]); + lbasz =3D 1 << lbaf[flbas].ds; + image_size =3D (lbasz + lbaf[flbas].ms) * nsze; + + dev =3D qdev_try_new(TYPE_NVME_NS); + if (!dev) { + error_setg(&local_err, "Unable to allocate ns QOM (dev)"); + goto fail; + } + + if (n->cntlid > 0 && !subsys) { + error_setg(&local_err, "Secondary controller without subsystem"); + goto fail2; + } + + ctrl =3D nvme_subsys_ctrl(subsys, 0); + + if (!ctrl) { + error_setg(&local_err, "Missing reference to primary controller"); + goto fail2; + } + + ns =3D NVME_NS(dev); + if (ns) { + ns->params.nsid =3D nsid; + ns->params.detached =3D true; + ns->params.pi =3D dps; + ns->id_ns.nsfeat =3D 0x0; /* reporting no support for THINP */ + ns->lbasz =3D lbasz; + ns->id_ns.flbas =3D id_ns->flbas; + ns->id_ns.nsze =3D cpu_to_le64(nsze); + ns->id_ns.ncap =3D cpu_to_le64(ncap); + ns->id_ns.nuse =3D cpu_to_le64(ncap); /* at this time no usage= recording */ + ns->id_ns.nmic =3D nmic; + + blk =3D ctrl->preloaded_blk[nsid]; + if (blk) { + ns_blockdev_activate(blk, image_size, &local_err); + if (local_err) { + goto fail2; + } + ns->blkconf.blk =3D blk; + qdev_realize_and_unref(dev, &n->bus.parent_bus, &local_err); = /* causes by extension + = * a call to + = * nvme_ns_realize() */ + if (local_err) { + goto fail2; + } + } else { + error_setg(&local_err, "Unable to find preloaded back-end refe= rence"); + goto fail2; + } + + if (ns_cfg_save(n, ns, nsid)) { /* save ns cfg */ + error_setg(&local_err, "Unable to save ns-cnf, an orphaned ns[= %"PRIu32"] will be left behind", nsid); + goto fail; + } + return ns; + } + + error_setg(&local_err, "No namespace reference from QOM (dev) for [%"P= RIu32"]", nsid); + +fail2: + if (blk) { + ctrl->preloaded_blk[nsid] =3D NULL; + } + object_unref(OBJECT(dev)); + +fail: + error_propagate(errp, local_err); + return NULL; +} + int nvme_ns_setup(NvmeNamespace *ns, Error **errp) { if 
(nvme_ns_check_constraints(ns, errp)) { @@ -505,6 +633,87 @@ int nvme_ns_setup(NvmeNamespace *ns, Error **errp) return 0; } =20 +static void nvme_ns_add_bootindex(Object *obj); +int nvme_ns_backend_setup(NvmeCtrl *n, Error **errp) +{ + DeviceState *dev =3D NULL; + NvmeCtrl *ctrl =3D NULL; + NvmeSubsystem *subsys =3D n->subsys; + BlockBackend *blk; + NvmeNamespace *ns; + NvmeNamespace *ns_p =3D NULL; + uint16_t i; + int ret =3D 0; + char *exact_filename; + Error *local_err =3D NULL; + + if (n->cntlid > 0) { /* secondary controller */ + ctrl =3D nvme_subsys_ctrl(subsys, 0); + } + + for (i =3D 1; i <=3D NVME_MAX_NAMESPACES && !local_err; i++ ) { + blk =3D NULL; + exact_filename =3D ns_create_image_name(n, i, &local_err); + if (local_err) { + break; + } + if (access(exact_filename, F_OK)) { /* skip if not found */ + g_free(exact_filename); + continue; + } + + if (ctrl) { /* reference from subsys to primary controller */ + ns_p =3D nvme_subsys_ns(subsys, i); + if (!ns_p) { + continue; + } + blk =3D ctrl->preloaded_blk[i]; + } else { + blk =3D ns_blockdev_init(exact_filename, &local_err); + } + + g_free(exact_filename); + + if (blk_getlength(blk)) { /* namespace was created in a prev= ious QEMU session */ + dev =3D qdev_try_new(TYPE_NVME_NS); + if (!dev) { + error_setg(&local_err, "Unable to create a new device entr= y"); + break; + } + + ns =3D NVME_NS(dev); + if (ns) { + if (ns_cfg_load(n, ns, i) =3D=3D -1) { /* load ns cfg = */ + error_setg(&local_err, "Unable to load ns-cfg for ns [= %"PRIu16"]", i); + break; + } + nvme_ns_backend_sanity_chk(ns, blk, &local_err); + if (local_err) { + break; + } + ns->blkconf.blk =3D blk; + qdev_realize_and_unref(dev, &n->bus.parent_bus, &local_err= ); /* causes by extension + = * a call to + = * nvme_ns_realize() */ + } + } + + if (!ctrl) { /* if primary controller */ + n->preloaded_blk[i] =3D blk; + n->id_ctrl.nn =3D cpu_to_le32(i); + } + } + + if (local_err) { + error_propagate(errp, local_err); + ret =3D -1; + } else if (ctrl) { + 
n->id_ctrl.nn =3D ctrl->id_ctrl.nn; + } + + return ret; +} + void nvme_ns_drain(NvmeNamespace *ns) { blk_drain(ns->blkconf.blk); @@ -563,11 +772,13 @@ static void nvme_ns_realize(DeviceState *dev, Error *= *errp) } } =20 + nvme_ns_add_bootindex(OBJECT(dev)); + if (nvme_ns_setup(ns, errp)) { return; } =20 - if (!nsid) { + if (!nsid) { /* legacy implementation without params.nsid specified= */ for (i =3D 1; i <=3D NVME_MAX_NAMESPACES; i++) { if (nvme_ns(n, i) || nvme_subsys_ns(subsys, i)) { continue; @@ -582,7 +793,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **e= rrp) return; } } else { - if (nvme_ns(n, nsid) || nvme_subsys_ns(subsys, nsid)) { + if (nvme_ns(n, nsid) || (!n->params.ns_directory && nvme_subsys_ns= (subsys, nsid))) { error_setg(errp, "namespace id '%d' already allocated", nsid); return; } @@ -596,14 +807,18 @@ static void nvme_ns_realize(DeviceState *dev, Error *= *errp) } =20 if (ns->params.shared) { - for (i =3D 0; i < ARRAY_SIZE(subsys->ctrls); i++) { - NvmeCtrl *ctrl =3D subsys->ctrls[i]; - - if (ctrl && ctrl !=3D SUBSYS_SLOT_RSVD) { - nvme_attach_ns(ctrl, ns); + if (!n->params.ns_directory) { + /* legacy implementation */ + for (i =3D 0; i < ARRAY_SIZE(subsys->ctrls); i++) { + NvmeCtrl *ctrl =3D subsys->ctrls[i]; + + if (ctrl && ctrl !=3D SUBSYS_SLOT_RSVD) { + nvme_attach_ns(ctrl, ns); + } } + } else { + nvme_attach_ns(n, ns); } - return; } } @@ -660,7 +875,7 @@ static void nvme_ns_class_init(ObjectClass *oc, void *d= ata) dc->desc =3D "Virtual NVMe namespace"; } =20 -static void nvme_ns_instance_init(Object *obj) +static void nvme_ns_add_bootindex(Object *obj) { NvmeNamespace *ns =3D NVME_NS(obj); char *bootindex =3D g_strdup_printf("/namespace@%d,0", ns->params.nsid= ); @@ -671,6 +886,10 @@ static void nvme_ns_instance_init(Object *obj) g_free(bootindex); } =20 +static void nvme_ns_instance_init(Object *obj) +{ +} + static const TypeInfo nvme_ns_info =3D { .name =3D TYPE_NVME_NS, .parent =3D TYPE_DEVICE, diff --git a/hw/nvme/nvme.h 
b/hw/nvme/nvme.h index 7adf042ec3..4df57096ef 100644 --- a/hw/nvme/nvme.h +++ b/hw/nvme/nvme.h @@ -23,9 +23,8 @@ #include "hw/block/block.h" =20 #include "block/nvme.h" +#include "hw/nvme/ctrl-cfg.h" =20 -#define NVME_MAX_CONTROLLERS 256 -#define NVME_MAX_NAMESPACES 256 #define NVME_EUI64_DEFAULT ((uint64_t)0x5254000000000000) =20 QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_NSID_BROADCAST - 1); @@ -279,6 +278,8 @@ int nvme_ns_setup(NvmeNamespace *ns, Error **errp); void nvme_ns_drain(NvmeNamespace *ns); void nvme_ns_shutdown(NvmeNamespace *ns); void nvme_ns_cleanup(NvmeNamespace *ns); +void nvme_validate_flbas(uint8_t flbas, Error **errp); +NvmeNamespace *nvme_ns_create(NvmeCtrl *n, uint32_t nsid, NvmeIdNsMgmt *id= _ns, Error **errp); =20 typedef struct NvmeAsyncEvent { QTAILQ_ENTRY(NvmeAsyncEvent) entry; @@ -339,6 +340,7 @@ static inline const char *nvme_adm_opc_str(uint8_t opc) case NVME_ADM_CMD_SET_FEATURES: return "NVME_ADM_CMD_SET_FEATURES"; case NVME_ADM_CMD_GET_FEATURES: return "NVME_ADM_CMD_GET_FEATURES"; case NVME_ADM_CMD_ASYNC_EV_REQ: return "NVME_ADM_CMD_ASYNC_EV_REQ"; + case NVME_ADM_CMD_NS_MGMT: return "NVME_ADM_CMD_NS_MGMT"; case NVME_ADM_CMD_NS_ATTACHMENT: return "NVME_ADM_CMD_NS_ATTACHMENT= "; case NVME_ADM_CMD_VIRT_MNGMT: return "NVME_ADM_CMD_VIRT_MNGMT"; case NVME_ADM_CMD_DBBUF_CONFIG: return "NVME_ADM_CMD_DBBUF_CONFIG"; @@ -427,6 +429,7 @@ typedef struct NvmeParams { uint16_t sriov_vi_flexible; uint8_t sriov_max_vq_per_vf; uint8_t sriov_max_vi_per_vf; + char *ns_directory; /* if empty (default) one legacy ns will b= e created */ } NvmeParams; =20 typedef struct NvmeCtrl { @@ -485,8 +488,9 @@ typedef struct NvmeCtrl { =20 NvmeSubsystem *subsys; =20 - NvmeNamespace namespace; + NvmeNamespace namespace; /* if ns_directory is empt= y this will be used */ NvmeNamespace *namespaces[NVME_MAX_NAMESPACES + 1]; + BlockBackend *preloaded_blk[NVME_MAX_NAMESPACES + 1]; NvmeSQueue **sq; NvmeCQueue **cq; NvmeSQueue admin_sq; @@ -575,6 +579,9 @@ static inline 
NvmeSecCtrlEntry *nvme_sctrl_for_cntlid(N= vmeCtrl *n, return NULL; } =20 +BlockBackend *ns_blockdev_init(const char *file, Error **errp); +void ns_blockdev_activate(BlockBackend *blk, uint64_t image_size, Error *= *errp); +int nvme_ns_backend_setup(NvmeCtrl *n, Error **errp); void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns); uint16_t nvme_bounce_data(NvmeCtrl *n, void *ptr, uint32_t len, NvmeTxDirection dir, NvmeRequest *req); @@ -583,5 +590,22 @@ uint16_t nvme_bounce_mdata(NvmeCtrl *n, void *ptr, uin= t32_t len, void nvme_rw_complete_cb(void *opaque, int ret); uint16_t nvme_map_dptr(NvmeCtrl *n, NvmeSg *sg, size_t len, NvmeCmd *cmd); +char *ns_create_image_name(NvmeCtrl *n, uint32_t nsid, Error **errp); +int ns_storage_path_check(NvmeCtrl *n, Error **errp); +int ns_auto_check(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid); +int ns_cfg_save(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid); +int ns_cfg_load(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid); +int64_t qdict_get_int_chkd(const QDict *qdict, const char *key, Error **er= rp); +QList *qdict_get_qlist_chkd(const QDict *qdict, const char *key, Error **e= rrp); +void ns_cfg_clear(NvmeNamespace *ns); +int nvme_cfg_save(NvmeCtrl *n); +int nvme_cfg_load(NvmeCtrl *n); + +typedef enum NvmeNsAllocAction { + NVME_NS_ALLOC_CHK, + NVME_NS_ALLOC, + NVME_NS_DEALLOC, +} NvmeNsAllocAction; +int nvme_cfg_update(NvmeCtrl *n, uint64_t ammount, NvmeNsAllocAction actio= n); =20 #endif /* HW_NVME_NVME_H */ diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c index 9d2643678b..5f4bb0e6a2 100644 --- a/hw/nvme/subsys.c +++ b/hw/nvme/subsys.c @@ -87,10 +87,13 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp) =20 subsys->ctrls[cntlid] =3D n; =20 - for (nsid =3D 1; nsid < ARRAY_SIZE(subsys->namespaces); nsid++) { - NvmeNamespace *ns =3D subsys->namespaces[nsid]; - if (ns && ns->params.shared && !ns->params.detached) { - nvme_attach_ns(n, ns); + if (!n->params.ns_directory) { + /* legacy implementation */ + for (nsid =3D 1; nsid 
< ARRAY_SIZE(subsys->namespaces); nsid++) { + NvmeNamespace *ns =3D subsys->namespaces[nsid]; + if (ns && ns->params.shared && !ns->params.detached) { + nvme_attach_ns(n, ns); + } } } =20 diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events index fccb79f489..28b025ac42 100644 --- a/hw/nvme/trace-events +++ b/hw/nvme/trace-events @@ -77,6 +77,8 @@ pci_nvme_aer(uint16_t cid) "cid %"PRIu16"" pci_nvme_aer_aerl_exceeded(void) "aerl exceeded" pci_nvme_aer_masked(uint8_t type, uint8_t mask) "type 0x%"PRIx8" mask 0x%"= PRIx8"" pci_nvme_aer_post_cqe(uint8_t typ, uint8_t info, uint8_t log_page) "type 0= x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8"" +pci_nvme_ns_mgmt(uint16_t cid, uint32_t nsid, uint8_t sel, uint8_t csi, ui= nt8_t psdt) "cid %"PRIu16", nsid=3D%"PRIu32", sel=3D0x%"PRIx8", csi=3D0x%"P= RIx8", psdt=3D0x%"PRIx8"" +pci_nvme_ns_create(uint16_t nsid, uint64_t nsze, uint64_t ncap, uint8_t fl= bas) "nsid %"PRIu16", nsze=3D%"PRIu64", ncap=3D%"PRIu64", flbas=3D%"PRIu8"" pci_nvme_ns_attachment(uint16_t cid, uint8_t sel) "cid %"PRIu16", sel=3D0x= %"PRIx8"" pci_nvme_ns_attachment_attach(uint16_t cntlid, uint32_t nsid) "cntlid=3D0x= %"PRIx16", nsid=3D0x%"PRIx32"" pci_nvme_enqueue_event(uint8_t typ, uint8_t info, uint8_t log_page) "type = 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8"" diff --git a/include/block/nvme.h b/include/block/nvme.h index 8027b7126b..9d2e121f1a 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -592,6 +592,7 @@ enum NvmeAdminCommands { NVME_ADM_CMD_SET_FEATURES =3D 0x09, NVME_ADM_CMD_GET_FEATURES =3D 0x0a, NVME_ADM_CMD_ASYNC_EV_REQ =3D 0x0c, + NVME_ADM_CMD_NS_MGMT =3D 0x0d, NVME_ADM_CMD_ACTIVATE_FW =3D 0x10, NVME_ADM_CMD_DOWNLOAD_FW =3D 0x11, NVME_ADM_CMD_NS_ATTACHMENT =3D 0x15, @@ -897,14 +898,18 @@ enum NvmeStatusCodes { NVME_FEAT_NOT_CHANGEABLE =3D 0x010e, NVME_FEAT_NOT_NS_SPEC =3D 0x010f, NVME_FW_REQ_SUSYSTEM_RESET =3D 0x0110, + NVME_NS_INSUFFICIENT_CAPAC =3D 0x0115, + NVME_NS_IDNTIFIER_UNAVAIL =3D 0x0116, NVME_NS_ALREADY_ATTACHED =3D 
0x0118, NVME_NS_PRIVATE =3D 0x0119, NVME_NS_NOT_ATTACHED =3D 0x011a, + NVME_THIN_PROVISION_NOTSPRD =3D 0x011b, NVME_NS_CTRL_LIST_INVALID =3D 0x011c, NVME_INVALID_CTRL_ID =3D 0x011f, NVME_INVALID_SEC_CTRL_STATE =3D 0x0120, NVME_INVALID_NUM_RESOURCES =3D 0x0121, NVME_INVALID_RESOURCE_ID =3D 0x0122, + NVME_NS_ATTACH_MGMT_NOTSPRD =3D 0x0129, NVME_CONFLICTING_ATTRS =3D 0x0180, NVME_INVALID_PROT_INFO =3D 0x0181, NVME_WRITE_TO_RO =3D 0x0182, @@ -1184,6 +1189,10 @@ enum NvmeIdCtrlCmic { NVME_CMIC_MULTI_CTRL =3D 1 << 1, }; =20 +enum NvmeNsManagementOperation { + NVME_NS_MANAGEMENT_CREATE =3D 0x0, +}; + enum NvmeNsAttachmentOperation { NVME_NS_ATTACHMENT_ATTACH =3D 0x0, NVME_NS_ATTACHMENT_DETACH =3D 0x1, @@ -1345,6 +1354,26 @@ typedef struct QEMU_PACKED NvmeIdNs { uint8_t vs[3712]; } NvmeIdNs; =20 +typedef struct QEMU_PACKED NvmeIdNsMgmt { + uint64_t nsze; + uint64_t ncap; + uint8_t rsvd16[10]; + uint8_t flbas; + uint8_t rsvd27[2]; + uint8_t dps; + uint8_t nmic; + uint8_t rsvd31[61]; + uint32_t anagrpid; + uint8_t rsvd96[4]; + uint16_t nvmsetid; + uint16_t endgid; + uint8_t rsvd104[280]; + uint64_t lbstm; + uint8_t rsvd392[120]; + uint8_t rsvd512[512]; + uint8_t vs[3072]; +} NvmeIdNsMgmt; + #define NVME_ID_NS_NVM_ELBAF_PIF(elbaf) (((elbaf) >> 7) & 0x3) =20 typedef struct QEMU_PACKED NvmeIdNsNvm { @@ -1646,6 +1675,7 @@ static inline void _nvme_check_size(void) QEMU_BUILD_BUG_ON(sizeof(NvmeLBAF) !=3D 4); QEMU_BUILD_BUG_ON(sizeof(NvmeLBAFE) !=3D 16); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) !=3D 4096); + QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsMgmt) !=3D 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsNvm) !=3D 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsZoned) !=3D 4096); QEMU_BUILD_BUG_ON(sizeof(NvmeSglDescriptor) !=3D 16); diff --git a/include/hw/nvme/ctrl-cfg.h b/include/hw/nvme/ctrl-cfg.h new file mode 100644 index 0000000000..1be44cb8df --- /dev/null +++ b/include/hw/nvme/ctrl-cfg.h @@ -0,0 +1,24 @@ +/* + * QEMU NVM Express Virtual Dynamic Namespace Management + * Common configuration handling 
for qemu-img tool and and qemu-system-xx + * + * + * Copyright (c) 2022 Solidigm + * + * Authors: + * Michael Kropaczek + * + * This work is licensed under the terms of the GNU GPL, version 2. See the + * COPYING file in the top-level directory. + * + */ + +#ifndef CTRL_CFG_DEF +#define NVME_STR_(s) #s +#define NVME_STRINGIFY(s) NVME_STR_(s) +#define NVME_MAX_NAMESPACES 256 +#define NVME_MAX_CONTROLLERS 256 +#else +CTRL_CFG_DEF(int, "tnvmcap", int128_get64(tnvmcap128), tnvmcap64) +CTRL_CFG_DEF(int, "unvmcap", int128_get64(unvmcap128), unvmcap64) +#endif diff --git a/include/hw/nvme/ns-cfg.h b/include/hw/nvme/ns-cfg.h new file mode 100644 index 0000000000..5f27a6cece --- /dev/null +++ b/include/hw/nvme/ns-cfg.h @@ -0,0 +1,28 @@ +/* + * QEMU NVM Express Virtual Dynamic Namespace Management + * Common configuration handling for qemu-img tool and qemu-system-xx + * + * + * Copyright (c) 2022 Solidigm + * + * Authors: + * Michael Kropaczek + * + * This work is licensed under the terms of the GNU GPL, version 2. See the + * COPYING file in the top-level directory. 
+ * + */ + +#ifdef NS_CFG_DEF +NS_CFG_DEF(int, "params.nsid", (int64_t)ns->params.nsid, nsid) +NS_CFG_DEF(obj, "attached_ctrls", QOBJECT(ctrl_qlist), QOBJECT(ctrl_qlist)) +NS_CFG_DEF(int, "params.pi", (int64_t)ns->params.pi, 0) +NS_CFG_DEF(int, "lbasz", (int64_t)ns->lbasz, 0) +NS_CFG_DEF(int, "id_ns.nsze", le64_to_cpu(ns->id_ns.nsze), 0) +NS_CFG_DEF(int, "id_ns.ncap", le64_to_cpu(ns->id_ns.ncap), 0) +NS_CFG_DEF(int, "id_ns.nuse", le64_to_cpu(ns->id_ns.nuse), 0) +NS_CFG_DEF(int, "id_ns.nsfeat", (int64_t)ns->id_ns.nsfeat, 0) +NS_CFG_DEF(int, "id_ns.flbas", (int64_t)ns->id_ns.flbas, 0) +NS_CFG_DEF(int, "id_ns.nmic", (int64_t)ns->id_ns.nmic, 0) +NS_CFG_DEF(int, "ns_size", ns->size, 0) +#endif diff --git a/include/hw/nvme/nvme-cfg.h b/include/hw/nvme/nvme-cfg.h new file mode 100644 index 0000000000..cd1bab24d3 --- /dev/null +++ b/include/hw/nvme/nvme-cfg.h @@ -0,0 +1,168 @@ +/* + * QEMU NVM Express Virtual Dynamic Namespace Management + * Common configuration handling for qemu-img tool and qemu-system-xx + * + * + * Copyright (c) 2022 Solidigm + * + * Authors: + * Michael Kropaczek + * + * This work is licensed under the terms of the GNU GPL, version 2. See the + * COPYING file in the top-level directory. 
+ * + */ + +#include "hw/nvme/ctrl-cfg.h" + +#define NS_CFG_MAXSIZE 1024 +#define NS_FILE_FMT "%s/nvme_%s_ns_%03d" +#define NS_IMG_EXT ".img" +#define NS_CFG_EXT ".cfg" +#define NS_CFG_TYPE "ns-cfg" + +#define NVME_FILE_FMT "%s/nvme_%s_ctrl" +#define NVME_CFG_EXT ".cfg" +#define NVME_CFG_TYPE "ctrl-cfg" + +#define NVME_CFG_MAXSIZE 512 +static inline int storage_path_check(char *ns_directory, char *serial, Err= or **errp) +{ + if (access(ns_directory, F_OK)) { + error_setg(errp, + "Path '%s' to nvme controller's storage area with= serial no: '%s' must exist", + ns_directory, serial); + return -1; + } + + return 0; +} + + +static inline char *c_create_cfg_name(char *ns_directory, char *serial, Er= ror **errp) +{ + char *file_name =3D NULL; + + if (!storage_path_check(ns_directory, serial, errp)) { + file_name =3D g_strdup_printf(NVME_FILE_FMT NVME_CFG_EXT, + ns_directory, serial); + } + + return file_name; +} + +static inline char *create_fmt_name(const char *fmt, char *ns_directory, c= har *serial, uint32_t nsid, Error **errp) +{ + char *file_name =3D NULL; + + if (!storage_path_check(ns_directory, serial, errp)) { + file_name =3D g_strdup_printf(fmt, ns_directory, serial, nsid); + } + + return file_name; +} + +static inline char *create_cfg_name(char *ns_directory, char *serial, uint= 32_t nsid, Error **errp) +{ + return create_fmt_name(NS_FILE_FMT NS_CFG_EXT, ns_directory, serial, n= sid, errp); +} + + +static inline char *create_image_name(char *ns_directory, char *serial, ui= nt32_t nsid, Error **errp) +{ + return create_fmt_name(NS_FILE_FMT NS_IMG_EXT, ns_directory, serial, n= sid, errp); +} + +static inline int cfg_save(char *ns_directory, char *serial, QDict *cfg, + const char *cfg_type, char *filename, size_t = maxsize) +{ + GString *json =3D NULL; + FILE *fp; + int ret =3D 0; + Error *local_err =3D NULL; + + json =3D qobject_to_json_pretty(QOBJECT(cfg), false); + + if (strlen(json->str) + 2 /* '\n'+'\0' */ > maxsize) { + error_setg(&local_err, "%s allowed 
max size %ld exceeded", cfg_typ= e, maxsize); + goto fail; + } + + if (filename) { + fp =3D fopen(filename, "w"); + if (fp =3D=3D NULL) { + error_setg(&local_err, "open %s: %s", filename, + strerror(errno)); + } else { + chmod(filename, 0644); + if (!fprintf(fp, "%s\n", json->str)) { + error_setg(&local_err, "could not write %s %s: %s", cfg_ty= pe, filename, + strerror(errno)); + } + fclose(fp); + } + } + +fail: + if (local_err) { + error_report_err(local_err); + ret =3D -1; + } + + g_string_free(json, true); + g_free(filename); + qobject_unref(cfg); + + return ret; +} + +static inline int nsid_cfg_save(char *ns_directory, char *serial, QDict *n= s_cfg, uint32_t nsid) +{ + Error *local_err =3D NULL; + char *filename =3D create_cfg_name(ns_directory, serial, nsid, &local_= err); + + if (local_err) { + error_report_err(local_err); + return -1; + } + + return cfg_save(ns_directory, serial, ns_cfg, NS_CFG_TYPE, filename, N= S_CFG_MAXSIZE); +} + +static inline int ns_cfg_default_save(char *ns_directory, char *serial, ui= nt32_t nsid) +{ + QDict *ns_cfg =3D qdict_new(); + QList *ctrl_qlist =3D qlist_new(); + +#define NS_CFG_DEF(type, key, value, default) \ + qdict_put_##type(ns_cfg, key, default); +#include "hw/nvme/ns-cfg.h" +#undef NS_CFG_DEF + + return nsid_cfg_save(ns_directory, serial, ns_cfg, nsid); +} + +static inline int c_cfg_save(char *ns_directory, char *serial, QDict *nvme= _cfg) +{ + Error *local_err =3D NULL; + char *filename =3D c_create_cfg_name(ns_directory, serial, &local_err); + + if (local_err) { + error_report_err(local_err); + return -1; + } + + return cfg_save(ns_directory, serial, nvme_cfg, NVME_CFG_TYPE, filenam= e, NVME_CFG_MAXSIZE); +} + +static inline int c_cfg_default_save(char *ns_directory, char *serial, uin= t64_t tnvmcap64, uint64_t unvmcap64) +{ + QDict *nvme_cfg =3D qdict_new(); + +#define CTRL_CFG_DEF(type, key, value, default) \ + qdict_put_##type(nvme_cfg, key, default); +#include "hw/nvme/ctrl-cfg.h" +#undef CTRL_CFG_DEF + + return 
c_cfg_save(ns_directory, serial, nvme_cfg); +} diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx index 1b1dab5b17..9aacb88fc9 100644 --- a/qemu-img-cmds.hx +++ b/qemu-img-cmds.hx @@ -57,6 +57,12 @@ SRST .. option:: create [--object OBJECTDEF] [-q] [-f FMT] [-b BACKING_FILE [-F= BACKING_FMT]] [-u] [-o OPTIONS] FILENAME [SIZE] ERST =20 +DEF("createns", nsimgs_create, + "createns -S nvme_ctrl_serial_number -C nvme_ctrl_total_capacity [-N <= NsId_max>] pathname") +SRST +.. option:: createns -S SERIAL_NUMBER -C TOTAL_CAPACITY [-N NSID_MAX] PATH= NAME +ERST + DEF("dd", img_dd, "dd [--image-opts] [-U] [-f fmt] [-O output_fmt] [bs=3Dblock_size] [co= unt=3Dblocks] [skip=3Dblocks] if=3Dinput of=3Doutput") SRST diff --git a/qemu-img.c b/qemu-img.c index 439d8de1e3..d54f86942f 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -34,6 +34,7 @@ #include "qapi/qobject-output-visitor.h" #include "qapi/qmp/qjson.h" #include "qapi/qmp/qdict.h" +#include "qapi/qmp/qlist.h" #include "qemu/cutils.h" #include "qemu/config-file.h" #include "qemu/option.h" @@ -49,10 +50,12 @@ #include "block/block_int.h" #include "block/blockjob.h" #include "block/qapi.h" +#include "block/qdict.h" #include "crypto/init.h" #include "trace/control.h" #include "qemu/throttle.h" #include "block/throttle-groups.h" +#include "hw/nvme/nvme-cfg.h" =20 #define QEMU_IMG_VERSION "qemu-img version " QEMU_FULL_VERSION \ "\n" QEMU_COPYRIGHT "\n" @@ -219,6 +222,14 @@ void help(void) " '-F' second image format\n" " '-s' run in Strict mode - fail on different image size or se= ctor allocation\n" "\n" + "Parameters to createns subcommand:\n" + " 'pathname' points to the storage area for namespaces backend= images and must exist,\n" + " and must match the -device nvme 'auto-ns-path=3D...' of th= e qemu-system-xx command\n" + " '-S' indicates NVMe serial number, must match the -device nv= me 'serial=3D...' 
of the qemu-system-xx command\n"
+           "  '-C' indicates NVMe total capacity\n"
+           "  '-N' sets a limit on the number of NVMe namespaces associated with the NVMe controller,\n"
+           "       the default maximal value is " NVME_STRINGIFY(NVME_MAX_NAMESPACES) " and cannot be exceeded\n"
+           "\n"
            "Parameters to dd subcommand:\n"
            "  'bs=BYTES' read and write up to BYTES bytes at a time "
            "(default: 512)\n"
@@ -603,6 +614,127 @@ fail:
     return 1;
 }
 
+static int nsimgs_create(int argc, char **argv)
+{
+    int c;
+    char *auto_ns_path = NULL;
+    char *serial = NULL;
+    char *nsidMax = NULL;
+    char *tnvmcap = NULL;
+    uint64_t tnvmcap64 = 0L;
+    unsigned int nsidMaxi = NVME_MAX_NAMESPACES;
+    char *filename = NULL;
+    uint32_t i;
+    Error *local_err = NULL;
+
+    for (;;) {
+        static const struct option long_options[] = {
+            {"help", no_argument, 0, 'h'},
+            {"serial", required_argument, 0, 'S'},
+            {"tnvmcap", required_argument, 0, 'C'},
+            {"nsidmax", required_argument, 0, 'N'},
+            {0, 0, 0, 0}
+        };
+        c = getopt_long(argc, argv, "S:C:N:",
+                        long_options, NULL);
+        if (c == -1) {
+            break;
+        }
+        switch (c) {
+        case ':':
+            missing_argument(argv[optind - 1]);
+            break;
+        case '?':
+            unrecognized_option(argv[optind - 1]);
+            break;
+        case 'h':
+            help();
+            break;
+        case 'S':
+            serial = optarg;
+            break;
+        case 'N':
+            nsidMax = optarg;
+            break;
+        case 'C':
+            tnvmcap = optarg;
+            break;
+        }
+    }
+
+    if (optind >= argc) {
+        error_exit("Expecting path name");
+    }
+
+    if (!serial || !tnvmcap) {
+        error_exit("Both -S and -C must be specified");
+    }
+
+    tnvmcap64 = cvtnum_full("tnvmcap", tnvmcap, 0, INT64_MAX);
+
+    if (nsidMax && (qemu_strtoui(nsidMax, NULL, 0, &nsidMaxi) < 0 ||
+        nsidMaxi > NVME_MAX_NAMESPACES)) {
+        error_exit("-N 'NsIdMax' must be numeric and cannot exceed %d",
+                   NVME_MAX_NAMESPACES);
+    }
+
+    auto_ns_path = (optind < argc) ?
+                   argv[optind] : NULL;
+
+    /* create backend images and config files for namespaces */
+    for (i = 1; !local_err && i <= NVME_MAX_NAMESPACES; i++) {
+        filename = create_image_name(auto_ns_path, serial, i, &local_err);
+        if (local_err) {
+            break;
+        }
+
+        /* calling bdrv_img_create() in both cases, if i <= nsidMaxi and
+         * otherwise; it checks the shared resize permission, and if that is
+         * likely locked by qemu-system-xx it will abort */
+        bdrv_img_create(filename, "raw", NULL, NULL, NULL,
+                        0, BDRV_O_RDWR | BDRV_O_RESIZE,
+                        true, &local_err);
+        if (local_err) {
+            break;
+        }
+
+        if (i <= nsidMaxi) { /* backend image file was created */
+            if (ns_cfg_default_save(auto_ns_path, serial, i)) { /* create
+                                                                 * namespace
+                                                                 * config file */
+                break;
+            }
+        } else if (filename && !access(filename, F_OK)) { /* reducing the number
+                                                           * of files if
+                                                           * i > nsidMaxi */
+            unlink(filename);
+            g_free(filename);
+            filename = create_cfg_name(auto_ns_path, serial, i, &local_err);
+            if (local_err) {
+                break;
+            }
+            unlink(filename);
+        }
+        g_free(filename);
+        filename = NULL;
+    }
+
+    if (local_err && filename) {
+        error_reportf_err(local_err, "Could not create ns-image [%s] ",
+                          filename);
+        g_free(filename);
+        return 1;
+    } else if (c_cfg_default_save(auto_ns_path, serial,
+                                  tnvmcap64, tnvmcap64)) { /* create controller
+                                                            * config file */
+        error_reportf_err(local_err, "Could not create nvme-cfg ");
+        return 1;
+    } else if (local_err) {
+        error_report_err(local_err);
+        return 1;
+    }
+
+    return 0;
+}
+
 static void dump_json_image_check(ImageCheck *check, bool quiet)
 {
     GString *str;
-- 
2.37.3
From: Jonathan Derrick
To: qemu-devel@nongnu.org
Cc: Michael Kropaczek, qemu-block@nongnu.org, Keith Busch, Klaus Jensen,
 Kevin Wolf, Hanna Reitz
Subject: [PATCH v5 2/2] hw/nvme: Support for Namespaces Management from guest OS - delete-ns
Date: Mon, 2 Jan 2023 12:54:03 -0700
Message-Id: <20230102195403.461-3-jonathan.derrick@linux.dev>
In-Reply-To: <20230102195403.461-1-jonathan.derrick@linux.dev>
References: <20230102195403.461-1-jonathan.derrick@linux.dev>

From: Michael Kropaczek

Added support for NVMe Namespaces Management, allowing the guest OS to
delete namespaces by issuing the nvme delete-ns command. It is an
extension to the currently implemented QEMU nvme virtual device. Virtual
devices representing namespaces will be created and/or deleted during
QEMU's running session, at any time.

Signed-off-by: Michael Kropaczek
---
 docs/system/devices/nvme.rst |  9 ++--
 hw/nvme/ctrl.c               | 82 ++++++++++++++++++++++++++++++++----
 hw/nvme/ns-backend.c         |  5 +++
 hw/nvme/ns.c                 | 71 +++++++++++++++++++++++++++++++
 hw/nvme/nvme.h               |  2 +
 hw/nvme/trace-events         |  1 +
 include/block/nvme.h         |  1 +
 7 files changed, 159 insertions(+), 12 deletions(-)

diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
index 6b3bee5e5d..f19072f1bc 100644
--- a/docs/system/devices/nvme.rst
+++ b/docs/system/devices/nvme.rst
@@ -103,12 +103,12 @@ Parameters:

 ``auto-ns-path=``
   If specified indicates a support for dynamic management of nvme namespaces
-  by means of nvme create-ns command. This path points
+  by means of nvme create-ns and nvme delete-ns commands. This path points
   to the storage area for backend images must exist. Additionally it requires
   that parameter `ns-subsys` must be specified whereas parameter `drive`
   must not. The legacy namespace backend is disabled, instead, a pair of
   files 'nvme__ns_.cfg' and 'nvme__ns_.img'
-  will refer to respective namespaces. The create-ns, attach-ns
+  will refer to respective namespaces. The create-ns, delete-ns, attach-ns
   and detach-ns commands, issued at the guest side, will make changes to
   those files accordingly.
   For each namespace exists an image file in raw format and a config file
@@ -140,8 +140,9 @@ Please note that ``nvme-ns`` device is not required to support of dynamic
 namespaces management feature. It is not prohibited to assign a such device to
 ``nvme`` device specified to support dynamic namespace management if one has
 an use case to do so, however, it will only coexist and be out of the scope of
-Namespaces Management. NsIds will be consistently managed, creation (create-ns)
-of a namespace will not allocate the NsId already being taken. If ``nvme-ns``
+Namespaces Management. Deletion (delete-ns) will render an error for this
+namespace.
+NsIds will be consistently managed, creation (create-ns) of
+a namespace will not allocate the NsId already being taken. If ``nvme-ns``
 device conflicts with previously created one by create-ns (the same NsId),
 it will break QEMU's start up.
 More than one of NVMe controllers associated with NVMe subsystem are supported.
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 5ed35d7cf4..e0fac3c151 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -144,12 +144,12 @@
  *
  * - `auto-ns-path`
  *   If specified indicates a support for dynamic management of nvme namespaces
- *   by means of nvme create-ns command. This path pointing
+ *   by means of nvme create-ns and nvme delete-ns commands. This path pointing
  *   to a storage area for backend images must exist. Additionally it requires
  *   that parameter `ns-subsys` must be specified whereas parameter `drive`
  *   must not. The legacy namespace backend is disabled, instead, a pair of
  *   files 'nvme__ns_.cfg' and 'nvme__ns_.img'
- *   will refer to respective namespaces. The create-ns, attach-ns
+ *   will refer to respective namespaces. The create-ns, delete-ns, attach-ns
  *   and detach-ns commands, issued at the guest side, will make changes to
  *   those files accordingly.
  *   For each namespace exists an image file in raw format and a config file
@@ -5702,17 +5702,13 @@ static uint16_t nvme_ns_mgmt_create(NvmeCtrl *n, NvmeRequest *req, uint32_t nsid
     }

     if (nvme_cfg_update(n, ns->size, NVME_NS_ALLOC_CHK)) {
-        /* place for delete-ns */
-        error_setg(&local_err, "Insufficient capacity, an orphaned ns[%"PRIu32"] will be left behind", nsid);
-        error_report_err(local_err);
+        nvme_ns_delete(n, nsid, NULL);
         return NVME_NS_INSUFFICIENT_CAPAC | NVME_DNR;
     }
     (void)nvme_cfg_update(n, ns->size, NVME_NS_ALLOC);
     if (nvme_cfg_save(n)) {
         (void)nvme_cfg_update(n, ns->size, NVME_NS_DEALLOC);
-        /* place for delete-ns */
-        error_setg(&local_err, "Cannot save conf file, an orphaned ns[%"PRIu32"] will be left behind", nsid);
-        error_report_err(local_err);
+        nvme_ns_delete(n, nsid, NULL);
         return NVME_INVALID_FIELD | NVME_DNR;
     }

@@ -5726,6 +5722,66 @@ fail:
     return NVME_SUCCESS;
 }

+static uint16_t nvme_ns_mgmt_delete(NvmeCtrl *n, uint32_t nsid)
+{
+    NvmeNamespace *ns = NULL;
+    uint16_t first = nsid;
+    uint16_t last = nsid;
+    uint16_t i;
+    uint64_t image_size;
+    Error *local_err = NULL;
+
+    if (!nsid) {
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    if (!n->params.ns_directory) {
+        error_setg(&local_err, "delete-ns not supported if 'auto-ns-path' is not specified");
+        goto fail;
+    } else if (n->namespace.blkconf.blk) {
+        error_setg(&local_err, "delete-ns not supported if 'drive' is specified");
+        goto fail;
+    }
+
+    if (nsid == NVME_NSID_BROADCAST) {
+        first = 1;
+        last = NVME_MAX_NAMESPACES;
+    }
+
+    for (i = first; i <= last; i++) {
+        ns = nvme_subsys_ns(n->subsys, (uint32_t)i);
+        if (n->params.ns_directory && ns && ns_auto_check(n, ns, (uint32_t)i)) {
+            error_setg(&local_err, "ns[%"PRIu32"] cannot be deleted, configured via '-device nvme-ns...'", i);
+            error_report_err(local_err);
+            if (first != last) {
+                local_err = NULL; /* we are skipping */
+            }
+        } else if (ns) {
+            image_size = ns->size;
+            nvme_ns_delete(n, (uint16_t)i, &local_err);
+            if (local_err) {
+                goto fail;
+            }
+            (void)nvme_cfg_update(n, image_size, NVME_NS_DEALLOC);
+            if (nvme_cfg_save(n)) {
+                error_setg(&local_err, "Could not save nvme-cfg");
+                goto fail;
+            }
+        } else if (first == last) {
+            return NVME_INVALID_FIELD | NVME_DNR;
+        }
+    }
+
+fail:
+    if (local_err) {
+        error_report_err(local_err);
+        return NVME_INVALID_FIELD | NVME_DNR;
+    }
+
+    nvme_update_dmrsl(n);
+    return NVME_SUCCESS;
+}
+
 static uint16_t nvme_ns_mgmt(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeIdCtrl *id = &n->id_ctrl;
@@ -5760,6 +5816,16 @@ static uint16_t nvme_ns_mgmt(NvmeCtrl *n, NvmeRequest *req)
             return NVME_INVALID_FIELD | NVME_DNR;
         }
         break;
+    case NVME_NS_MANAGEMENT_DELETE:
+        switch (csi) {
+        case NVME_CSI_NVM:
+            return nvme_ns_mgmt_delete(n, nsid);
+        case NVME_CSI_ZONED:
+            /* fall through for now */
+        default:
+            return NVME_INVALID_FIELD | NVME_DNR;
+        }
+        break;
     default:
         return NVME_INVALID_FIELD | NVME_DNR;
     }
diff --git a/hw/nvme/ns-backend.c b/hw/nvme/ns-backend.c
index 06de8f262c..57d0b695fa 100644
--- a/hw/nvme/ns-backend.c
+++ b/hw/nvme/ns-backend.c
@@ -71,6 +71,11 @@ void ns_blockdev_activate(BlockBackend *blk, uint64_t image_size, Error **errp)
                          errp);
 }

+void ns_blockdev_deactivate(BlockBackend *blk, Error **errp)
+{
+    ns_blockdev_activate(blk, 0, errp);
+}
+
 int ns_storage_path_check(NvmeCtrl *n, Error **errp)
 {
     return storage_path_check(n->params.ns_directory, n->params.serial, errp);
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 06cc6c8c71..653d136eae 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -610,6 +610,77 @@ fail:
     return NULL;
 }

+static void nvme_ns_unrealize(DeviceState *dev);
+void nvme_ns_delete(NvmeCtrl *n, uint32_t nsid, Error **errp)
+{
+    NvmeNamespace *ns = NULL;
+    NvmeSubsystem *subsys = n->subsys;
+    int i;
+    int ret = 0;
+    Error *local_err = NULL;
+
+    trace_pci_nvme_ns_delete(nsid);
+
+    if (n->cntlid > 0 && !n->subsys) {
+        error_setg(&local_err, "Secondary controller "
+                   "without subsystem");
+        return;
+    }
+
+    if (subsys) {
+        ns = nvme_subsys_ns(subsys, (uint32_t)nsid);
+        if (ns) {
+            if (ns->params.shared) {
+                for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+                    NvmeCtrl *ctrl = subsys->ctrls[i];
+
+                    if (ctrl && ctrl->namespaces[nsid]) {
+                        ctrl->namespaces[nsid] = NULL;
+                        ns->attached--;
+                    }
+                }
+            }
+            subsys->namespaces[nsid] = NULL;
+        }
+    }
+
+    if (!ns) {
+        ns = nvme_ns(n, (uint32_t)nsid);
+        if (ns) {
+            n->namespaces[nsid] = NULL;
+            ns->attached--;
+        } else {
+            error_setg(errp, "Namespace %d does not exist", nsid);
+            return;
+        }
+    }
+
+    if (ns->attached > 0) {
+        error_setg(errp, "Could not detach all ns references for ns[%d], still %d left", nsid, ns->attached);
+        return;
+    }
+
+    /* here is the actual deletion */
+    nvme_ns_unrealize(&ns->parent_obj);
+    qdev_unrealize(&ns->parent_obj);
+    ns_blockdev_deactivate(ns->blkconf.blk, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    ns_cfg_clear(ns);
+    ret = ns_cfg_save(n, ns, nsid);
+    if (ret == -1) {
+        error_setg(errp, "Unable to save ns-cfg");
+        return;
+    } else if (ret == 1) { /* should not occur here; checked, with an error
+                            * message, prior to the call to nvme_ns_delete() */
+        return;
+    }
+
+    /* disassociating references to the back-end and keeping it as preloaded */
+    ns->blkconf.blk = NULL;
+}
+
 int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
 {
     if (nvme_ns_check_constraints(ns, errp)) {
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 4df57096ef..c7a782d7d1 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -280,6 +280,7 @@ void nvme_ns_shutdown(NvmeNamespace *ns);
 void nvme_ns_cleanup(NvmeNamespace *ns);
 void nvme_validate_flbas(uint8_t flbas, Error **errp);
 NvmeNamespace *nvme_ns_create(NvmeCtrl *n, uint32_t nsid, NvmeIdNsMgmt *id_ns, Error **errp);
+void nvme_ns_delete(NvmeCtrl *n, uint32_t nsid, Error **errp);

 typedef struct NvmeAsyncEvent {
     QTAILQ_ENTRY(NvmeAsyncEvent) entry;
@@ -581,6 +582,7 @@ static inline
NvmeSecCtrlEntry *nvme_sctrl_for_cntlid(NvmeCtrl *n,

 BlockBackend *ns_blockdev_init(const char *file, Error **errp);
 void ns_blockdev_activate(BlockBackend *blk, uint64_t image_size, Error **errp);
+void ns_blockdev_deactivate(BlockBackend *blk, Error **errp);
 int nvme_ns_backend_setup(NvmeCtrl *n, Error **errp);
 void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns);
 uint16_t nvme_bounce_data(NvmeCtrl *n, void *ptr, uint32_t len,
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index 28b025ac42..0dd0c23208 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -79,6 +79,7 @@ pci_nvme_aer_masked(uint8_t type, uint8_t mask) "type 0x%"PRIx8" mask 0x%"PRIx8"
 pci_nvme_aer_post_cqe(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
 pci_nvme_ns_mgmt(uint16_t cid, uint32_t nsid, uint8_t sel, uint8_t csi, uint8_t psdt) "cid %"PRIu16", nsid=%"PRIu32", sel=0x%"PRIx8", csi=0x%"PRIx8", psdt=0x%"PRIx8""
 pci_nvme_ns_create(uint16_t nsid, uint64_t nsze, uint64_t ncap, uint8_t flbas) "nsid %"PRIu16", nsze=%"PRIu64", ncap=%"PRIu64", flbas=%"PRIu8""
+pci_nvme_ns_delete(uint16_t nsid) "nsid %"PRIu16""
 pci_nvme_ns_attachment(uint16_t cid, uint8_t sel) "cid %"PRIu16", sel=0x%"PRIx8""
 pci_nvme_ns_attachment_attach(uint16_t cntlid, uint32_t nsid) "cntlid=0x%"PRIx16", nsid=0x%"PRIx32""
 pci_nvme_enqueue_event(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 9d2e121f1a..0fe7fe9bb1 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1191,6 +1191,7 @@ enum NvmeIdCtrlCmic {

 enum NvmeNsManagementOperation {
     NVME_NS_MANAGEMENT_CREATE = 0x0,
+    NVME_NS_MANAGEMENT_DELETE = 0x1,
 };

 enum NvmeNsAttachmentOperation {
-- 
2.37.3