From nobody Fri Nov 1 01:03:57 2024
From: Jonathan Derrick
To: qemu-devel@nongnu.org
Cc: Jonathan Derrick, Michael Kropaczek, qemu-block@nongnu.org, Keith Busch, Klaus Jensen, Kevin Wolf, Hanna Reitz
Subject: [PATCH v3 1/2] hw/nvme: Support for Namespaces Management from guest OS - create-ns
Date: Thu, 27 Oct 2022 13:00:45 -0500
Message-Id: <20221027180046.250-2-jonathan.derrick@linux.dev>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20221027180046.250-1-jonathan.derrick@linux.dev>
References: <20221027180046.250-1-jonathan.derrick@linux.dev>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Michael Kropaczek

Added support for NVMe Namespace Management, allowing the guest OS to
create namespaces by issuing the nvme create-ns command. This is an
extension to the currently implemented QEMU nvme virtual device.
Virtual devices representing namespaces can be created and/or deleted
at any time during a running QEMU session.
Signed-off-by: Michael Kropaczek
---
 docs/system/devices/nvme.rst |  55 +++++++-
 hw/nvme/cfg_key_checker.c    |  51 ++++++++
 hw/nvme/ctrl-cfg.c           | 181 +++++++++++++++++++++++++++
 hw/nvme/ctrl.c               | 204 +++++++++++++++++++++++++++++-
 hw/nvme/meson.build          |   2 +-
 hw/nvme/ns-backend.c         | 234 +++++++++++++++++++++++++++++++++++
 hw/nvme/ns.c                 | 234 ++++++++++++++++++++++++++++++-----
 hw/nvme/nvme.h               |  31 ++++-
 hw/nvme/trace-events         |   2 +
 include/block/nvme.h         |  30 +++++
 include/hw/nvme/ctrl-cfg.h   |  24 ++++
 include/hw/nvme/ns-cfg.h     |  28 +++++
 include/hw/nvme/nvme-cfg.h   | 201 ++++++++++++++++++++++++++++++
 qemu-img-cmds.hx             |   6 +
 qemu-img.c                   | 134 ++++++++++++++++++++
 15 files changed, 1380 insertions(+), 37 deletions(-)
 create mode 100644 hw/nvme/cfg_key_checker.c
 create mode 100644 hw/nvme/ctrl-cfg.c
 create mode 100644 hw/nvme/ns-backend.c
 create mode 100644 include/hw/nvme/ctrl-cfg.h
 create mode 100644 include/hw/nvme/ns-cfg.h
 create mode 100644 include/hw/nvme/nvme-cfg.h

diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
index 30f841ef62..13e2fbc0d6 100644
--- a/docs/system/devices/nvme.rst
+++ b/docs/system/devices/nvme.rst
@@ -92,6 +92,59 @@ There are a number of parameters available:
   attach the namespace to a specific ``nvme`` device (identified by an ``id``
   parameter on the controller device).
 
+Additional namespaces managed by the guest OS (Namespaces Management)
+---------------------------------------------------------------------
+
+.. code-block:: console
+
+   -device nvme,id=nvme-ctrl,serial=1234,subsys=nvme-subsys,auto-ns-path=path
+
+Parameters:
+
+``auto-ns-path=``
+  If specified, indicates support for dynamic management of nvme namespaces
+  by means of the nvme create-ns command. This path, which points to the
+  storage area for backend images, must exist. Additionally, the ``subsys``
+  parameter must be specified, whereas the ``drive`` parameter must not.
+  The legacy namespace backend is disabled; instead, a pair of
+  files 'nvme__ns_.cfg' and 'nvme__ns_.img'
+  will refer to the respective namespaces. The create-ns, attach-ns
+  and detach-ns commands, issued from the guest side, will update
+  those files accordingly.
+  For each namespace there exists an image file in raw format and a config
+  file containing the namespace parameters and attachment state, allowing
+  QEMU to configure namespaces accordingly during start up. If, for instance,
+  an image file has a size of 0 bytes, this is interpreted as a non-existent
+  namespace. Issuing the create-ns command will change the status in the
+  config files and will resize the image file accordingly, so that the image
+  file is associated with the respective namespace. The main config file
+  nvme__ctrl.cfg keeps track of the capacity allocated to the
+  namespaces within the nvme controller.
+  As with a typical hard drive, backend images and config files need to be
+  created beforehand. For this reason the qemu-img tool has been extended
+  with a createns command.
+
+    qemu-img createns {-S -C } [-N ] {}
+
+  Parameters:
+  -S and -C are mandatory; `-S` must match the `serial` parameter and the
+  path must match the `auto-ns-path` parameter of the "-device nvme,..."
+  specification.
+  -N is optional; if specified, it sets a limit on the number of potential
+  namespaces and reduces the number of backend images and config files
+  accordingly. By default, a set of zero-sized images and default
+  config files for 256 namespaces will be created, a total of 513 files.
+
+Please note that the ``nvme-ns`` device is not required to support the
+dynamic namespaces management feature. It is not prohibited to assign such
+a device to an ``nvme`` device specified to support dynamic namespace
+management if one has a use case to do so; however, it will only coexist
+with, and remain out of the scope of, Namespaces Management.
+NSIDs will be managed consistently: creation (create-ns) of a namespace
+will not allocate an NSID that is already taken. If an ``nvme-ns`` device
+conflicts with one previously created by create-ns (the same NSID), QEMU
+will fail to start.
+
 NVM Subsystems
 --------------
 
@@ -320,4 +373,4 @@ controller are:
 
 .. code-block:: console
 
-   echo 0000:01:00.1 > /sys/bus/pci/drivers/nvme/bind
\ No newline at end of file
+   echo 0000:01:00.1 > /sys/bus/pci/drivers/nvme/bind
diff --git a/hw/nvme/cfg_key_checker.c b/hw/nvme/cfg_key_checker.c
new file mode 100644
index 0000000000..5f19126b29
--- /dev/null
+++ b/hw/nvme/cfg_key_checker.c
@@ -0,0 +1,51 @@
+/*
+ * QEMU NVM Express Virtual Dynamic Namespace Management
+ *
+ *
+ * Copyright (c) 2022 Solidigm
+ *
+ * Authors:
+ *   Michael Kropaczek
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qapi/qmp/qnum.h"
+#include "qapi/qmp/qbool.h"
+#include "qapi/error.h"
+#include "block/qdict.h"
+
+#include "nvme.h"
+
+/* The original QEMU dictionary retrieval APIs need wrapping here. In rare
+ * cases, when nvme cfg files were tampered with, or the QEMU version was
+ * upgraded and a new key is expected to exist but is missing, a segfault
+ * crash would result.
+ * Built-in assert statements do not sufficiently cover such cases, and
+ * they also lack the possibility of error handling. */
+#define NVME_KEY_CHECK_ERROR_FMT "key[%s] is expected to exist"
+int64_t qdict_get_int_chkd(const QDict *qdict, const char *key, Error **errp)
+{
+    QObject *qobject = qdict_get(qdict, key);
+    if (qobject) {
+        return qnum_get_int(qobject_to(QNum, qobject));
+    }
+
+    error_setg(errp, NVME_KEY_CHECK_ERROR_FMT, key);
+    return 0;
+}
+
+bool qdict_get_bool_chkd(const QDict *qdict, const char *key, Error **errp)
+{
+    QObject *qobject = qdict_get(qdict, key);
+    if (qobject) {
+        return qbool_get_bool(qobject_to(QBool, qobject));
+    }
+
+    error_setg(errp, NVME_KEY_CHECK_ERROR_FMT, key);
+    return false;
+}
diff --git a/hw/nvme/ctrl-cfg.c b/hw/nvme/ctrl-cfg.c
new file mode 100644
index 0000000000..8dbf25bfb2
--- /dev/null
+++ b/hw/nvme/ctrl-cfg.c
@@ -0,0 +1,181 @@
+/*
+ * QEMU NVM Express Virtual Dynamic Namespace Management
+ *
+ *
+ * Copyright (c) 2022 Solidigm
+ *
+ * Authors:
+ *   Michael Kropaczek
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ * + */ + +#include "qemu/osdep.h" +#include "qemu/error-report.h" +#include "qapi/error.h" +#include "qapi/qmp/qjson.h" +#include "qapi/qmp/qstring.h" +#include "sysemu/block-backend.h" +#include "block/qdict.h" +#include "qemu/int128.h" +#include "hw/nvme/nvme-cfg.h" + +#include "nvme.h" +#include "trace.h" + +static char *nvme_create_cfg_name(NvmeCtrl *n, Error **errp) +{ + return c_create_cfg_name(n->params.ns_directory, n->params.serial, err= p); +} + +int nvme_cfg_save(NvmeCtrl *n) +{ + NvmeIdCtrl *id =3D &n->id_ctrl; + QDict *nvme_cfg =3D NULL; + Int128 tnvmcap128; + Int128 unvmcap128; + + nvme_cfg =3D qdict_new(); + + memcpy(&tnvmcap128, id->tnvmcap, sizeof(tnvmcap128)); + memcpy(&unvmcap128, id->unvmcap, sizeof(unvmcap128)); + +#define CTRL_CFG_DEF(type, key, value, default) \ + qdict_put_##type(nvme_cfg, key, value); +#include "hw/nvme/ctrl-cfg.h" +#undef CTRL_CFG_DEF + + return c_cfg_save(n->params.ns_directory, n->params.serial, nvme_cfg); +} + +int nvme_cfg_update(NvmeCtrl *n, uint64_t amount, NvmeNsAllocAction action) +{ + int ret =3D 0; + NvmeIdCtrl *id =3D &n->id_ctrl; + Int128 tnvmcap128; + Int128 unvmcap128; + Int128 amount128 =3D int128_make64(amount); + + memcpy(&tnvmcap128, id->tnvmcap, sizeof(tnvmcap128)); + memcpy(&unvmcap128, id->unvmcap, sizeof(unvmcap128)); + + switch (action) { + case NVME_NS_ALLOC_CHK: + if (int128_ge(unvmcap128, amount128)) { + return 0; /* no update */ + } else { + ret =3D -1; + } + break; + case NVME_NS_ALLOC: + if (int128_ge(unvmcap128, amount128)) { + unvmcap128 =3D int128_sub(unvmcap128, amount128); + } else { + ret =3D -1; + } + break; + case NVME_NS_DEALLOC: + unvmcap128 =3D int128_add(unvmcap128, amount128); + if (int128_ge(unvmcap128, tnvmcap128)) { + unvmcap128 =3D tnvmcap128; + } + break; + default:; + } + + if (ret =3D=3D 0) { + memcpy(id->unvmcap, &unvmcap128, sizeof(id->unvmcap)); + } + + return ret; +} + +/* Note: id->tnvmcap and id->unvmcap are pointing to 16 bytes arrays, + * but those are interpreted 
as 128bits int objects. + * It is OK here to use Int128 because backend's namespace images ca= nnot + * exceed size of 64bit max value */ +static int nvme_cfg_validate(NvmeCtrl *n, uint64_t tnvmcap, uint64_t unvmc= ap, + Error **errp) +{ + int ret =3D 0; + NvmeIdCtrl *id =3D &n->id_ctrl; + Int128 tnvmcap128; + Int128 unvmcap128; + Error *local_err =3D NULL; + + if (unvmcap > tnvmcap) { + error_setg(&local_err, "nvme-cfg file is corrupted, free to alloca= te[%"PRIu64 + "] > total capacity[%"PRIu64"]", + unvmcap, tnvmcap); + } else if (tnvmcap =3D=3D (uint64_t) 0) { + error_setg(&local_err, "nvme-cfg file error: total capacity cannot= be zero"); + } + + if (local_err) { + error_propagate(errp, local_err); + ret =3D -1; + } else { + tnvmcap128 =3D int128_make64(tnvmcap); + unvmcap128 =3D int128_make64(unvmcap); + memcpy(id->tnvmcap, &tnvmcap128, sizeof(id->tnvmcap)); + memcpy(id->unvmcap, &unvmcap128, sizeof(id->unvmcap)); + } + + return ret; +} + +int nvme_cfg_load(NvmeCtrl *n) +{ + QObject *nvme_cfg_obj =3D NULL; + QDict *nvme_cfg =3D NULL; + int ret =3D 0; + char *filename; + uint64_t tnvmcap; + uint64_t unvmcap; + FILE *fp; + char buf[NVME_CFG_MAXSIZE] =3D {}; + Error *local_err =3D NULL; + + filename =3D nvme_create_cfg_name(n, &local_err); + if (!local_err && !access(filename, F_OK)) { + fp =3D fopen(filename, "r"); + if (fp =3D=3D NULL) { + error_setg(&local_err, "open %s: %s", filename, + strerror(errno)); + } else { + if (!fread(buf, sizeof(buf), 1, fp)) { + nvme_cfg_obj =3D qobject_from_json(buf, NULL); + if (!nvme_cfg_obj) { + error_setg(&local_err, "Could not parse the JSON for n= vme-cfg"); + } else { + nvme_cfg =3D qobject_to(QDict, nvme_cfg_obj); + qdict_flatten(nvme_cfg); + + tnvmcap =3D qdict_get_int_chkd(nvme_cfg, "tnvmcap", &l= ocal_err); + if (!local_err) { + unvmcap =3D qdict_get_int_chkd(nvme_cfg, "unvmcap"= , &local_err); + } + if (!local_err) { + nvme_cfg_validate(n, tnvmcap, unvmcap, &local_err); + } + qobject_unref(nvme_cfg_obj); + } + } else 
{ + error_setg(&local_err, "Could not read nvme-cfg"); + } + fclose(fp); + } + } else if (!local_err) { + error_setg(&local_err, "Missing nvme-cfg file"); + } + + if (local_err) { + error_report_err(local_err); + ret =3D -1; + } + + g_free(filename); + return ret; +} diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 87aeba0564..d2b9d65eb9 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -40,7 +40,9 @@ * sriov_vi_flexible=3D \ * sriov_max_vi_per_vf=3D \ * sriov_max_vq_per_vf=3D \ - * subsys=3D + * subsys=3D, \ + * auto-ns-path=3D + * * -device nvme-ns,drive=3D,bus=3D,nsid=3D,\ * zoned=3D, \ * subsys=3D,detached=3D @@ -140,6 +142,60 @@ * a secondary controller. The default 0 resolves to * `(sriov_vq_flexible / sriov_max_vfs)`. * + * - `auto-ns-path` + * If specified indicates a support for dynamic management of nvme names= paces + * by means of nvme create-ns command. This path pointing + * to a storage area for backend images must exist. Additionally it requ= ires + * that parameter `ns-subsys` must be specified whereas parameter `drive` + * must not. The legacy namespace backend is disabled, instead, a pair of + * files 'nvme__ns_.cfg' and 'nvme__ns_.= img' + * will refer to respective namespaces. The create-ns, attach-ns + * and detach-ns commands, issued at the guest side, will make changes to + * those files accordingly. + * For each namespace exists an image file in raw format and a config fi= le + * containing namespace parameters and a state of the attachment allowin= g QEMU + * to configure namespace during its start up accordingly. If for instan= ce an + * image file has a size of 0 bytes, this will be interpreted as non exi= stent + * namespace. Issuing create-ns command will change the status in the co= nfig + * files and and will re-size the image file accordingly so the image fi= le + * will be associated with the respective namespace. 
The main config file + * nvme__ctrl.cfg keeps the track of allocated capacity to the + * namespaces within the nvme controller. + * As it is the case of a typical hard drive, backend images together wi= th + * config files need to be created. For this reason the qemu-img tool has + * been extended by adding createns command. + * + * qemu-img createns {-S -C } + * [-N ] {} + * + * Parameters: + * -S and -C and are mandatory, `-S` must match `serial` parameter + * and must match `auto-ns-path` parameter of "-device nvme,..." + * specification. + * -N is optional, if specified, it will set a limit to the number of po= tential + * namespaces and will reduce the number of backend images and config fi= les + * accordingly. As a default, a set of images of 0 bytes size and default + * config files for 256 namespaces will be created, a total of 513 files. + * + * Note 1: + * If the main "-drive" is not specified with 'if=3Dvirtio', then = SeaBIOS + * must be built with disabled "Parallelize hardware init" to allow + * a proper boot. Without it, it is probable that non deterministic + * order of collecting of potential block devices for a boot will = not + * catch that one with guest OS. Deterministic order however will = fill + * up the list of potential boot devices starting with a typical A= TA + * devices usually containing guest OS. + * SeaBIOS has a limited space to store all potential boot block d= evices + * if there are more than 11 namespaces. (other types require less + * memory so the number of 11 does not apply universally) + * (above Note refers to SeaBIOS rel-1.16.0) + * Note 2: + * If the main "-drive" referring to guest OS is specified with + * 'if=3Dvirtio', then there is no need to build SeaBIOS with disa= bled + * "Parallelize hardware init". + * Block boot device 'Virtio disk PCI:xx:xx.x" will appear as a fi= rst + * listed instead of an ATA device. 
+ * * nvme namespace device parameters * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * - `shared` @@ -262,6 +318,7 @@ static const uint32_t nvme_cse_acs[256] =3D { [NVME_ADM_CMD_SET_FEATURES] =3D NVME_CMD_EFF_CSUPP, [NVME_ADM_CMD_GET_FEATURES] =3D NVME_CMD_EFF_CSUPP, [NVME_ADM_CMD_ASYNC_EV_REQ] =3D NVME_CMD_EFF_CSUPP, + [NVME_ADM_CMD_NS_MGMT] =3D NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_= NIC, [NVME_ADM_CMD_NS_ATTACHMENT] =3D NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_= NIC, [NVME_ADM_CMD_VIRT_MNGMT] =3D NVME_CMD_EFF_CSUPP, [NVME_ADM_CMD_DBBUF_CONFIG] =3D NVME_CMD_EFF_CSUPP, @@ -5660,6 +5717,121 @@ static void nvme_select_iocs_ns(NvmeCtrl *n, NvmeNa= mespace *ns) } } =20 +static NvmeNamespace *nvme_ns_mgmt_create(NvmeCtrl *n, uint32_t nsid, Nvme= IdNsMgmt *id_ns, Error **errp) +{ + NvmeNamespace *ns =3D NULL; + Error *local_err =3D NULL; + + if (!n->params.ns_directory) { + error_setg(&local_err, "create-ns not supported if 'auto-ns-path' = is not specified"); + } else if (n->namespace.blkconf.blk) { + error_setg(&local_err, "create-ns not supported if 'drive' is spec= ified"); + } else { + ns =3D nvme_ns_create(n, nsid, id_ns, &local_err); + } + + if (local_err) { + error_propagate(errp, local_err); + ns =3D NULL; + } + + return ns; +} + +static uint16_t nvme_ns_mgmt(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeIdCtrl *id =3D &n->id_ctrl; + NvmeNamespace *ns; + NvmeIdNsMgmt id_ns =3D {}; + uint8_t flags =3D req->cmd.flags; + uint32_t nsid =3D le32_to_cpu(req->cmd.nsid); + uint32_t dw10 =3D le32_to_cpu(req->cmd.cdw10); + uint32_t dw11 =3D le32_to_cpu(req->cmd.cdw11); + uint8_t sel =3D dw10 & 0xf; + uint8_t csi =3D (dw11 >> 24) & 0xf; + uint16_t i; + uint16_t ret; + Error *local_err =3D NULL; + + trace_pci_nvme_ns_mgmt(nvme_cid(req), nsid, sel, csi, NVME_CMD_FLAGS_P= SDT(flags)); + + if (!(le16_to_cpu(id->oacs) & NVME_OACS_NS_MGMT)) { + return NVME_NS_ATTACH_MGMT_NOTSPRD | NVME_DNR; + } + + switch (sel) { + case NVME_NS_MANAGEMENT_CREATE: + switch (csi) { + case NVME_CSI_NVM: + if (nsid) { + 
return NVME_INVALID_FIELD | NVME_DNR; + } + + ret =3D nvme_h2c(n, (uint8_t *)&id_ns, sizeof(id_ns), req); + if (ret) { + return ret; + } + + uint64_t nsze =3D le64_to_cpu(id_ns.nsze); + uint64_t ncap =3D le64_to_cpu(id_ns.ncap); + + if (ncap > nsze) { + return NVME_INVALID_FIELD | NVME_DNR; + } else if (ncap !=3D nsze) { + return NVME_THIN_PROVISION_NOTSPRD | NVME_DNR; + } + + nvme_validate_flbas(id_ns.flbas, &local_err); + if (local_err) { + error_report_err(local_err); + return NVME_INVALID_FORMAT | NVME_DNR; + } + + for (i =3D 1; i <=3D NVME_MAX_NAMESPACES; i++) { + if (nvme_ns(n, (uint32_t)i) || nvme_subsys_ns(n->subsys, (= uint32_t)i)) { + continue; + } + break; + } + if (i > n->nsidMax || i > NVME_MAX_NAMESPACES) { + return NVME_NS_IDNTIFIER_UNAVAIL | NVME_DNR; + } + nsid =3D i; + + /* create ns here */ + ns =3D nvme_ns_mgmt_create(n, nsid, &id_ns, &local_err); + if (!ns || local_err) { + if (local_err) { + error_report_err(local_err); + } + return NVME_INVALID_FIELD | NVME_DNR; + } + + if (nvme_cfg_update(n, ns->size, NVME_NS_ALLOC_CHK)) { + /* place for delete-ns */ + return NVME_NS_INSUFFICIENT_CAPAC | NVME_DNR; + } + (void)nvme_cfg_update(n, ns->size, NVME_NS_ALLOC); + if (nvme_cfg_save(n)) { + (void)nvme_cfg_update(n, ns->size, NVME_NS_DEALLOC); + /* place for delete-ns */ + return NVME_INVALID_FIELD | NVME_DNR; + } + req->cqe.result =3D cpu_to_le32(nsid); + break; + case NVME_CSI_ZONED: + /* fall through for now */ + default: + return NVME_INVALID_FIELD | NVME_DNR; + } + break; + default: + return NVME_INVALID_FIELD | NVME_DNR; + } + + return NVME_SUCCESS; +} + static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req) { NvmeNamespace *ns; @@ -5672,6 +5844,7 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeR= equest *req) uint16_t *ids =3D &list[1]; uint16_t ret; int i; + Error *local_err; =20 trace_pci_nvme_ns_attachment(nvme_cid(req), dw10 & 0xf); =20 @@ -5710,6 +5883,13 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, Nvme= Request 
*req) return NVME_NS_PRIVATE | NVME_DNR; } =20 + ns->params.detached =3D false; + if (ns_cfg_save(n, ns, nsid) =3D=3D -1) { /* save ns= cfg */ + error_setg(&local_err, "Unable to save ns-cnf"); + error_report_err(local_err); + return NVME_INVALID_FIELD | NVME_DNR; + } + nvme_attach_ns(ctrl, ns); nvme_select_iocs_ns(ctrl, ns); =20 @@ -5720,6 +5900,13 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, Nvme= Request *req) return NVME_NS_NOT_ATTACHED | NVME_DNR; } =20 + ns->params.detached =3D true; + if (ns_cfg_save(n, ns, nsid) =3D=3D -1) { /* save ns= cfg */ + error_setg(&local_err, "Unable to save ns-cnf"); + error_report_err(local_err); + return NVME_INVALID_FIELD | NVME_DNR; + } + ctrl->namespaces[nsid] =3D NULL; ns->attached--; =20 @@ -6211,6 +6398,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeReque= st *req) return nvme_get_feature(n, req); case NVME_ADM_CMD_ASYNC_EV_REQ: return nvme_aer(n, req); + case NVME_ADM_CMD_NS_MGMT: + return nvme_ns_mgmt(n, req); case NVME_ADM_CMD_NS_ATTACHMENT: return nvme_ns_attachment(n, req); case NVME_ADM_CMD_VIRT_MNGMT: @@ -7052,7 +7241,7 @@ static void nvme_check_constraints(NvmeCtrl *n, Error= **errp) params->max_ioqpairs =3D params->num_queues - 1; } =20 - if (n->namespace.blkconf.blk && n->subsys) { + if (n->namespace.blkconf.blk && n->subsys && !params->ns_directory) { error_setg(errp, "subsystem support is unavailable with legacy " "namespace ('drive' property)"); return; @@ -7602,7 +7791,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **= errp) nvme_init_ctrl(n, pci_dev); =20 /* setup a namespace if the controller drive property was given */ - if (n->namespace.blkconf.blk) { + if (n->namespace.blkconf.blk && !n->params.ns_directory) { ns =3D &n->namespace; ns->params.nsid =3D 1; =20 @@ -7611,6 +7800,14 @@ static void nvme_realize(PCIDevice *pci_dev, Error *= *errp) } =20 nvme_attach_ns(n, ns); + } else if (!n->namespace.blkconf.blk && n->params.ns_directory) { + if (nvme_cfg_load(n)) { + error_setg(errp, 
"Could not process nvme-cfg"); + return; + } + if (nvme_ns_backend_setup(n, errp)) { + return; + } } } =20 @@ -7655,6 +7852,7 @@ static void nvme_exit(PCIDevice *pci_dev) =20 static Property nvme_props[] =3D { DEFINE_BLOCK_PROPERTIES(NvmeCtrl, namespace.blkconf), + DEFINE_PROP_STRING("auto-ns-path", NvmeCtrl,params.ns_directory), DEFINE_PROP_LINK("pmrdev", NvmeCtrl, pmr.dev, TYPE_MEMORY_BACKEND, HostMemoryBackend *), DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS, diff --git a/hw/nvme/meson.build b/hw/nvme/meson.build index 3cf40046ee..8900831701 100644 --- a/hw/nvme/meson.build +++ b/hw/nvme/meson.build @@ -1 +1 @@ -softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('ctrl.c', 'dif.c', = 'ns.c', 'subsys.c')) +softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('ctrl.c', 'dif.c', = 'ns.c', 'subsys.c', 'ns-backend.c', 'cfg_key_checker.c', 'ctrl-cfg.c')) diff --git a/hw/nvme/ns-backend.c b/hw/nvme/ns-backend.c new file mode 100644 index 0000000000..82f9fcd5d9 --- /dev/null +++ b/hw/nvme/ns-backend.c @@ -0,0 +1,234 @@ +/* + * QEMU NVM Express Virtual Dynamic Namespace Management + * + * + * Copyright (c) 2022 Solidigm + * + * Authors: + * Michael Kropaczek + * + * This work is licensed under the terms of the GNU GPL, version 2. See the + * COPYING file in the top-level directory. + * + */ + +#include "qemu/osdep.h" +#include "qemu/units.h" +#include "qemu/error-report.h" +#include "qapi/error.h" +#include "qapi/qmp/qjson.h" +#include "qapi/qmp/qstring.h" +#include "sysemu/sysemu.h" +#include "sysemu/block-backend.h" +#include "block/qdict.h" +#include "hw/nvme/nvme-cfg.h" + +#include "nvme.h" +#include "trace.h" + +/* caller will take ownership */ +static QDict *ns_get_bs_default_opts(bool read_only) +{ + QDict *bs_opts =3D qdict_new(); + + qdict_set_default_str(bs_opts, BDRV_OPT_CACHE_DIRECT, "off"); + qdict_set_default_str(bs_opts, BDRV_OPT_CACHE_NO_FLUSH, "off"); + qdict_set_default_str(bs_opts, BDRV_OPT_READ_ONLY, + read_only ? 
"on" : "off"); + qdict_set_default_str(bs_opts, BDRV_OPT_AUTO_READ_ONLY, "on"); + qdict_set_default_str(bs_opts, "driver", "raw"); + + return bs_opts; +} + +BlockBackend *ns_blockdev_init(const char *file, Error **errp) +{ + BlockBackend *blk =3D NULL; + bool read_only =3D false; + Error *local_err =3D NULL; + QDict *bs_opts; + + if (access(file, F_OK)) { + error_setg(&local_err, "%s not found, please create one", file); + } + + if (!local_err) { + bs_opts =3D ns_get_bs_default_opts(read_only); + blk =3D blk_new_open(file, NULL, bs_opts, BDRV_O_RDWR | BDRV_O_RES= IZE, &local_err); + } + + if (local_err) { + error_propagate(errp, local_err); + } + + return blk; +} + +void ns_blockdev_activate(BlockBackend *blk, uint64_t image_size, Error *= *errp) +{ + int ret; + + ret =3D blk_set_perm(blk, BLK_PERM_WRITE | BLK_PERM_RESIZE, BLK_PERM_W= RITE_UNCHANGED, errp); + if (ret < 0) { + return; + } + ret =3D blk_truncate(blk, image_size, false, PREALLOC_MODE_OFF, 0, + errp); +} + +int ns_storage_path_check(NvmeCtrl *n, Error **errp) +{ + return storage_path_check(n->params.ns_directory, n->params.serial, e= rrp); +} + +/* caller will take ownership */ +char *ns_create_image_name(NvmeCtrl *n, uint32_t nsid, Error **errp) +{ + return create_image_name(n->params.ns_directory, n->params.serial, nsi= d, errp); +} + +static char *ns_create_cfg_name(NvmeCtrl *n, uint32_t nsid, Error **errp) +{ + return create_cfg_name(n->params.ns_directory, n->params.serial, nsid,= errp); +} + +int ns_auto_check(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid) +{ + int ret =3D 0; + BlockBackend *blk =3D ns->blkconf.blk; + char *file_name_img =3D NULL; + + file_name_img =3D ns_create_image_name(n, nsid, NULL); + + if (!blk) { + } else if (!file_name_img || strcmp(blk_bs(blk)->filename, file_name_i= mg)) { + ret =3D -1; + } + + g_free(file_name_img); + + return ret; +} + +void ns_cfg_clear(NvmeNamespace *ns) +{ + ns->params.pi =3D 0; + ns->lbasz =3D 0; + ns->id_ns.nsze =3D 0; + ns->id_ns.ncap =3D 0; + 
ns->id_ns.nuse = 0;
+    ns->id_ns.nsfeat = 0;
+    ns->id_ns.flbas = 0;
+    ns->id_ns.nmic = 0;
+    ns->size = 0;
+}
+
+int ns_cfg_save(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid)
+{
+    QDict *ns_cfg = NULL;
+    Error *local_err = NULL;
+
+    if (ns_auto_check(n, ns, nsid)) {
+        error_setg(&local_err, "ns-cfg not saved: ns[%"PRIu32"] configured via '-device nvme-ns'", nsid);
+        error_report_err(local_err);
+        return 1;   /* not an error */
+    }
+
+    ns_cfg = qdict_new();
+
+#define NS_CFG_DEF(type, key, value, default) \
+    qdict_put_##type(ns_cfg, key, value);
+#include "hw/nvme/ns-cfg.h"
+#undef NS_CFG_DEF
+
+    return nsid_cfg_save(n->params.ns_directory, n->params.serial, ns_cfg, nsid);
+}
+
+int ns_cfg_load(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid)
+{
+    QObject *ns_cfg_obj = NULL;
+    QDict *ns_cfg = NULL;
+    int ret = 0;
+    char *filename;
+    FILE *fp;
+    char buf[NS_CFG_MAXSIZE] = {};
+    Error *local_err = NULL;
+
+    if (ns_auto_check(n, ns, nsid)) {
+        error_setg(&local_err, "ns-cfg not loaded: ns[%"PRIu32"] configured via '-device nvme-ns'", nsid);
+        error_report_err(local_err);
+        return 1;   /* not an error */
+    }
+
+    filename = ns_create_cfg_name(n, nsid, &local_err);
+    if (!local_err && !access(filename, F_OK)) {
+        fp = fopen(filename, "r");
+        if (fp == NULL) {
+            error_setg(&local_err, "open %s: %s", filename,
+                       strerror(errno));
+        } else {
+            if (!fread(buf, sizeof(buf), 1, fp)) {
+                ns_cfg_obj = qobject_from_json(buf, NULL);
+                if (!ns_cfg_obj) {
+                    error_setg(&local_err, "Could not parse the JSON for ns-cfg");
+                } else {
+                    ns_cfg = qobject_to(QDict, ns_cfg_obj);
+                    qdict_flatten(ns_cfg);
+
+                    ns->params.nsid = (uint32_t)qdict_get_int_chkd(ns_cfg, "params.nsid", &local_err);      /* (uint32_t) */
+                    if (!local_err) {
+                        ns->params.detached = qdict_get_bool_chkd(ns_cfg, "params.detached", &local_err);   /* (bool) */
+                    }
+                    if (!local_err) {
+                        ns->params.pi = (uint8_t)qdict_get_int_chkd(ns_cfg, "params.pi", &local_err);       /* (uint8_t) */
+                    }
+                    if (!local_err) {
+                        ns->lbasz = (size_t)qdict_get_int_chkd(ns_cfg, "lbasz", &local_err);                /* (size_t) */
+                    }
+                    if (!local_err) {
+                        ns->id_ns.nsze = cpu_to_le64(qdict_get_int_chkd(ns_cfg, "id_ns.nsze", &local_err)); /* (uint64_t) */
+                    }
+                    if (!local_err) {
+                        ns->id_ns.ncap = cpu_to_le64(qdict_get_int_chkd(ns_cfg, "id_ns.ncap", &local_err)); /* (uint64_t) */
+                    }
+                    if (!local_err) {
+                        ns->id_ns.nuse = cpu_to_le64(qdict_get_int_chkd(ns_cfg, "id_ns.nuse", &local_err)); /* (uint64_t) */
+                    }
+                    if (!local_err) {
+                        ns->id_ns.nsfeat = (uint8_t)qdict_get_int_chkd(ns_cfg, "id_ns.nsfeat", &local_err); /* (uint8_t) */
+                    }
+                    if (!local_err) {
+                        ns->id_ns.flbas = (uint8_t)qdict_get_int_chkd(ns_cfg, "id_ns.flbas", &local_err);   /* (uint8_t) */
+                    }
+                    if (!local_err) {
+                        ns->id_ns.nmic = (uint8_t)qdict_get_int_chkd(ns_cfg, "id_ns.nmic", &local_err);     /* (uint8_t) */
+                    }
+                    if (!local_err) {
+                        /* ns->size below will be overwritten after nvme_ns_backend_sanity_chk() */
+                        ns->size = qdict_get_int_chkd(ns_cfg, "ns_size", &local_err);                       /* (uint64_t) */
+                    }
+
+                    qobject_unref(ns_cfg_obj);
+
+                    /* the ns-cfg file is expected to be consistent with the
+                     * paired ns-img file; this is a simple check guarding
+                     * against a crash */
+                    nvme_validate_flbas(ns->id_ns.flbas, &local_err);
+                }
+            } else {
+                error_setg(&local_err, "Could not read ns-cfg");
+            }
+            fclose(fp);
+        }
+    } else if (!local_err) {
+        error_setg(&local_err, "Missing ns-cfg file");
+    }
+
+    if (local_err) {
+        error_report_err(local_err);
+        ret = -1;
+    }
+
+    g_free(filename);
+    return ret;
+}
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 62a1f97be0..2aa7b01c3d 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -3,9 +3,11 @@
  *
  * Copyright (c) 2019 CNEX Labs
  * Copyright (c) 2020 Samsung Electronics
+ * Copyright (c) 2022 Solidigm
  *
  * Authors:
  *   Klaus Jensen
+ *   Michael Kropaczek
  *
  * This work is licensed under the terms of the GNU GPL, version 2. See the
  * COPYING file in the top-level directory.
@@ -55,6 +57,26 @@ void nvme_ns_init_format(NvmeNamespace *ns)
     id_ns->npda = id_ns->npdg = npdg - 1;
 }
 
+#define NVME_LBAF_DFLT_CNT 8
+#define NVME_LBAF_DFLT_SIZE 16
+static unsigned int ns_get_default_lbafs(void *lbafp)
+{
+    static const NvmeLBAF lbaf[NVME_LBAF_DFLT_SIZE] = {
+        [0] = { .ds =  9           },
+        [1] = { .ds =  9, .ms =  8 },
+        [2] = { .ds =  9, .ms = 16 },
+        [3] = { .ds =  9, .ms = 64 },
+        [4] = { .ds = 12           },
+        [5] = { .ds = 12, .ms =  8 },
+        [6] = { .ds = 12, .ms = 16 },
+        [7] = { .ds = 12, .ms = 64 },
+    };
+
+    memcpy(lbafp, &lbaf[0], sizeof(lbaf));
+
+    return NVME_LBAF_DFLT_CNT;
+}
+
 static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 {
     static uint64_t ns_count;
@@ -64,6 +86,11 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
     uint16_t ms;
     int i;
 
+    ms = ns->params.ms;
+    if (ms && NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)) {
+        return -1;
+    }
+
     ns->csi = NVME_CSI_NVM;
     ns->status = 0x0;
 
@@ -89,7 +116,6 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
     id_ns->eui64 = cpu_to_be64(ns->params.eui64);
 
     ds = 31 - clz32(ns->blkconf.logical_block_size);
-    ms = ns->params.ms;
 
     id_ns->mc = NVME_ID_NS_MC_EXTENDED | NVME_ID_NS_MC_SEPARATE;
 
@@ -105,39 +131,25 @@ static int nvme_ns_init(NvmeNamespace *ns, Error **errp)
 
     ns->pif = ns->params.pif;
 
-    static const NvmeLBAF lbaf[16] = {
-        [0] = { .ds =  9           },
-        [1] = { .ds =  9, .ms =  8 },
-        [2] = { .ds =  9, .ms = 16 },
-        [3] = { .ds =  9, .ms = 64 },
-        [4] = { .ds = 12           },
-        [5] = { .ds = 12, .ms =  8 },
-        [6] = { .ds = 12, .ms = 16 },
-        [7] = { .ds = 12, .ms = 64 },
-    };
+    ns->nlbaf = ns_get_default_lbafs(&id_ns->lbaf);
 
-    ns->nlbaf = 8;
-
-    memcpy(&id_ns->lbaf, &lbaf, sizeof(lbaf));
-
-    for (i = 0; i < ns->nlbaf; i++) {
-        NvmeLBAF *lbaf = &id_ns->lbaf[i];
-        if (lbaf->ds == ds) {
-            if (lbaf->ms == ms) {
-                id_ns->flbas |= i;
-                goto lbaf_found;
+    if (ms) { /* ms from params */
+        for (i = 0; i < ns->nlbaf; i++) {
+            NvmeLBAF *lbaf = &id_ns->lbaf[i];
+            if (lbaf->ds == ds && lbaf->ms == ms) {
+                id_ns->flbas |= i;
+                goto lbaf_found;
             }
         }
+        /* add non-standard lba format */
+        id_ns->lbaf[ns->nlbaf].ds = ds;
+        id_ns->lbaf[ns->nlbaf].ms = ms;
+        ns->nlbaf++;
+        id_ns->flbas |= i;
+    } else {
+        i = NVME_ID_NS_FLBAS_INDEX(id_ns->flbas);
     }
 
-    /* add non-standard lba format */
-    id_ns->lbaf[ns->nlbaf].ds = ds;
-    id_ns->lbaf[ns->nlbaf].ms = ms;
-    ns->nlbaf++;
-
-    id_ns->flbas |= i;
-
 lbaf_found:
     id_ns_nvm->elbaf[i] = (ns->pif & 0x3) << 7;
     id_ns->nlbaf = ns->nlbaf - 1;
@@ -482,6 +494,112 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
     return 0;
 }
 
+static void nvme_ns_backend_sanity_chk(NvmeNamespace *ns, BlockBackend *blk, Error **errp)
+{
+    uint64_t ns_size_img = ns->size;
+    uint64_t ns_size_cfg = blk_getlength(blk);
+
+    if (ns_size_cfg != ns_size_img) {
+        error_setg(errp, "ns-backend sanity check for nsid [%"PRIu32"] failed", ns->params.nsid);
+    }
+}
+
+void nvme_validate_flbas(uint8_t flbas, Error **errp)
+{
+    uint8_t nlbaf;
+    NvmeLBAF lbaf[NVME_LBAF_DFLT_SIZE];
+
+    nlbaf = ns_get_default_lbafs(&lbaf[0]);
+    flbas = NVME_ID_NS_FLBAS_INDEX(flbas);
+    if (flbas >= nlbaf) {
+        error_setg(errp, "FLBA size index is out of range, max supported [%"PRIu8"]", nlbaf - 1);
+    }
+}
+
+NvmeNamespace *nvme_ns_create(NvmeCtrl *n, uint32_t nsid, NvmeIdNsMgmt *id_ns, Error **errp)
+{
+    NvmeNamespace *ns = NULL;
+    DeviceState *dev = NULL;
+    uint64_t nsze = le64_to_cpu(id_ns->nsze);
+    uint64_t ncap = le64_to_cpu(id_ns->ncap);
+    uint8_t flbas = id_ns->flbas;
+    uint8_t dps = id_ns->dps;
+    uint8_t nmic = id_ns->nmic;
+    uint32_t anagrpid = le32_to_cpu(id_ns->anagrpid);
+    uint16_t endgid = le16_to_cpu(id_ns->endgid);
+    NvmeLBAF lbaf[NVME_LBAF_DFLT_SIZE];
+    size_t lbasz;
+    uint64_t image_size;
+    Error *local_err = NULL;
+    BlockBackend *blk = NULL;
+
+    /* currently not managed */
+    (void)anagrpid;
+    (void)endgid;
+
+    trace_pci_nvme_ns_create(nsid, nsze, ncap, flbas);
+
+    flbas = NVME_ID_NS_FLBAS_INDEX(flbas);
+
+    ns_get_default_lbafs(&lbaf[0]);
+    lbasz = 1 << lbaf[flbas].ds;
+    image_size = (lbasz + lbaf[flbas].ms) * nsze;
+
+    dev = qdev_try_new(TYPE_NVME_NS);
+    if (!dev) {
+        error_setg(&local_err, "Unable to allocate ns QOM (dev)");
+    }
+
+    if (!local_err) {
+        ns = NVME_NS(dev);
+        if (ns) {
+            ns->params.nsid = nsid;
+            ns->params.detached = true;
+            ns->params.pi = dps;
+            ns->id_ns.nsfeat = 0x0;              /* reporting no support for THINP */
+            ns->lbasz = lbasz;
+            ns->id_ns.flbas = id_ns->flbas;
+            ns->id_ns.nsze = cpu_to_le64(nsze);
+            ns->id_ns.ncap = cpu_to_le64(ncap);
+            ns->id_ns.nuse = cpu_to_le64(ncap);  /* at this time no usage recording */
+            ns->id_ns.nmic = nmic;
+
+            blk = n->preloaded_blk[nsid];
+            if (blk) {
+                ns_blockdev_activate(blk, image_size, &local_err);
+                if (!local_err) {
+                    ns->blkconf.blk = blk;
+                    /* by extension this causes a call to nvme_ns_realize() */
+                    qdev_realize_and_unref(dev, &n->bus.parent_bus, &local_err);
+                    n->preloaded_blk[nsid] = NULL;
+                }
+            } else {
+                error_setg(&local_err, "Unable to find preloaded back-end reference");
+            }
+            dev = NULL;
+
+            if (!local_err && ns_cfg_save(n, ns, nsid)) {    /* save ns cfg */
+                error_setg(&local_err, "Unable to save ns-cfg");
+            }
+        }
+    }
+
+    if (local_err) {
+        if (dev) {
+            if (blk) {
+                n->preloaded_blk[nsid] = blk;
+                blk = NULL;
+            }
+            object_unref(OBJECT(dev));
+        }
+        error_propagate(errp, local_err);
+        ns = NULL;
+    }
+
+    return ns;
+}
+
 int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
 {
     if (nvme_ns_check_constraints(ns, errp)) {
@@ -505,6 +623,64 @@ int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
     return 0;
 }
 
+int nvme_ns_backend_setup(NvmeCtrl *n, Error **errp)
+{
+    DeviceState *dev = NULL;
+    BlockBackend *blk;
+    NvmeNamespace *ns;
+    uint16_t i;
+    int ret = 0;
+    char *exact_filename;
+    Error *local_err = NULL;
+
+    for (i = 1; i <= NVME_MAX_NAMESPACES && !local_err; i++) {
+        blk = NULL;
+        exact_filename = ns_create_image_name(n, i, &local_err);
+        if (access(exact_filename, F_OK)) {    /* skip if not found */
+            g_free(exact_filename);
+            continue;
+        }
+
+        n->nsidMax = i;
+
+        dev = qdev_try_new(TYPE_NVME_NS);
+        if (dev) {
+            blk = ns_blockdev_init(exact_filename, &local_err);
+        } else {
+            error_setg(&local_err, "Unable to create a new device entry");
+        }
+
+        g_free(exact_filename);
+
+        if (blk && !local_err) {
+            ns = NVME_NS(dev);
+            if (ns) {
+                if (ns_cfg_load(n, ns, i) == -1) {    /* load ns cfg */
+                    error_setg(&local_err, "Unable to load ns-cfg for ns [%"PRIu16"]", i);
+                } else if (blk_getlength(blk)) {
+                    nvme_ns_backend_sanity_chk(ns, blk, &local_err);
+                    if (!local_err) {
+                        ns->blkconf.blk = blk;
+                        /* by extension this causes a call to nvme_ns_realize() */
+                        qdev_realize_and_unref(dev, &n->bus.parent_bus, &local_err);
+                    }
+                    n->preloaded_blk[i] = NULL;
+                } else {
+                    n->preloaded_blk[i] = blk;
+                }
+            }
+        }
+    }
+
+    if (local_err) {
+        error_propagate(errp, local_err);
+        ret = -1;
+    }
+
+    return ret;
+}
+
 void nvme_ns_drain(NvmeNamespace *ns)
 {
     blk_drain(ns->blkconf.blk);
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 79f5c281c2..c6194773e6 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -23,9 +23,8 @@
 #include "hw/block/block.h"
 
 #include "block/nvme.h"
+#include "hw/nvme/ctrl-cfg.h"
 
-#define NVME_MAX_CONTROLLERS 256
-#define NVME_MAX_NAMESPACES  256
 #define NVME_EUI64_DEFAULT ((uint64_t)0x5254000000000000)
 
 QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_NSID_BROADCAST - 1);
@@ -279,6 +278,8 @@ int nvme_ns_setup(NvmeNamespace *ns, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_shutdown(NvmeNamespace *ns);
 void nvme_ns_cleanup(NvmeNamespace *ns);
+void nvme_validate_flbas(uint8_t flbas, Error **errp);
+NvmeNamespace *nvme_ns_create(NvmeCtrl *n, uint32_t nsid, NvmeIdNsMgmt *id_ns, Error **errp);
 
 typedef struct NvmeAsyncEvent {
     QTAILQ_ENTRY(NvmeAsyncEvent) entry;
@@ -339,6 +340,7 @@ static inline const char *nvme_adm_opc_str(uint8_t opc)
     case NVME_ADM_CMD_SET_FEATURES:     return "NVME_ADM_CMD_SET_FEATURES";
     case NVME_ADM_CMD_GET_FEATURES:     return "NVME_ADM_CMD_GET_FEATURES";
     case NVME_ADM_CMD_ASYNC_EV_REQ:     return "NVME_ADM_CMD_ASYNC_EV_REQ";
+    case NVME_ADM_CMD_NS_MGMT:          return "NVME_ADM_CMD_NS_MGMT";
    case NVME_ADM_CMD_NS_ATTACHMENT:    return "NVME_ADM_CMD_NS_ATTACHMENT";
     case NVME_ADM_CMD_VIRT_MNGMT:       return "NVME_ADM_CMD_VIRT_MNGMT";
     case NVME_ADM_CMD_DBBUF_CONFIG:     return "NVME_ADM_CMD_DBBUF_CONFIG";
@@ -427,6 +429,7 @@ typedef struct NvmeParams {
     uint16_t sriov_vi_flexible;
     uint8_t  sriov_max_vq_per_vf;
     uint8_t  sriov_max_vi_per_vf;
+    char     *ns_directory;    /* if empty (default) one legacy ns will be created */
 } NvmeParams;
 
 typedef struct NvmeCtrl {
@@ -485,8 +488,9 @@ typedef struct NvmeCtrl {
 
     NvmeSubsystem   *subsys;
 
-    NvmeNamespace   namespace;
+    NvmeNamespace   namespace;    /* if ns_directory is empty this will be used */
     NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES + 1];
+    BlockBackend    *preloaded_blk[NVME_MAX_NAMESPACES + 1];
     NvmeSQueue      **sq;
     NvmeCQueue      **cq;
     NvmeSQueue      admin_sq;
@@ -509,6 +513,7 @@ typedef struct NvmeCtrl {
         uint16_t    vqrfap;
         uint16_t    virfap;
     } next_pri_ctrl_cap;    /* These override pri_ctrl_cap after reset */
+    uint16_t    nsidMax;
 } NvmeCtrl;
 
 typedef enum NvmeResetType {
@@ -575,6 +580,9 @@ static inline NvmeSecCtrlEntry *nvme_sctrl_for_cntlid(NvmeCtrl *n,
     return NULL;
 }
 
+BlockBackend *ns_blockdev_init(const char *file, Error **errp);
+void ns_blockdev_activate(BlockBackend *blk, uint64_t image_size, Error **errp);
+int nvme_ns_backend_setup(NvmeCtrl *n, Error **errp);
 void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns);
 uint16_t nvme_bounce_data(NvmeCtrl *n, void *ptr, uint32_t len,
                           NvmeTxDirection dir, NvmeRequest *req);
@@ -583,5 +591,22 @@ uint16_t nvme_bounce_mdata(NvmeCtrl *n, void *ptr, uint32_t len,
 void nvme_rw_complete_cb(void *opaque, int ret);
 uint16_t nvme_map_dptr(NvmeCtrl *n, NvmeSg *sg, size_t len, NvmeCmd *cmd);
+char *ns_create_image_name(NvmeCtrl *n, uint32_t nsid, Error **errp);
+int ns_storage_path_check(NvmeCtrl *n, Error **errp);
+int ns_auto_check(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid);
+int ns_cfg_save(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid);
+int ns_cfg_load(NvmeCtrl *n, NvmeNamespace *ns, uint32_t nsid);
+int64_t qdict_get_int_chkd(const QDict *qdict, const char *key, Error **errp);
+bool qdict_get_bool_chkd(const QDict *qdict, const char *key, Error **errp);
+void ns_cfg_clear(NvmeNamespace *ns);
+int nvme_cfg_save(NvmeCtrl *n);
+int nvme_cfg_load(NvmeCtrl *n);
+
+typedef enum NvmeNsAllocAction {
+    NVME_NS_ALLOC_CHK,
+    NVME_NS_ALLOC,
+    NVME_NS_DEALLOC,
+} NvmeNsAllocAction;
+int nvme_cfg_update(NvmeCtrl *n, uint64_t amount, NvmeNsAllocAction action);
 
 #endif /* HW_NVME_NVME_H */
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index fccb79f489..28b025ac42 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -77,6 +77,8 @@ pci_nvme_aer(uint16_t cid) "cid %"PRIu16""
 pci_nvme_aer_aerl_exceeded(void) "aerl exceeded"
 pci_nvme_aer_masked(uint8_t type, uint8_t mask) "type 0x%"PRIx8" mask 0x%"PRIx8""
 pci_nvme_aer_post_cqe(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
+pci_nvme_ns_mgmt(uint16_t cid, uint32_t nsid, uint8_t sel, uint8_t csi, uint8_t psdt) "cid %"PRIu16", nsid=%"PRIu32", sel=0x%"PRIx8", csi=0x%"PRIx8", psdt=0x%"PRIx8""
+pci_nvme_ns_create(uint16_t nsid, uint64_t nsze, uint64_t ncap, uint8_t flbas) "nsid %"PRIu16", nsze=%"PRIu64", ncap=%"PRIu64", flbas=%"PRIu8""
 pci_nvme_ns_attachment(uint16_t cid, uint8_t sel) "cid %"PRIu16", sel=0x%"PRIx8""
 pci_nvme_ns_attachment_attach(uint16_t cntlid, uint32_t nsid) "cntlid=0x%"PRIx16", nsid=0x%"PRIx32""
 pci_nvme_enqueue_event(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 8027b7126b..9d2e121f1a 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -592,6 +592,7 @@ enum NvmeAdminCommands {
     NVME_ADM_CMD_SET_FEATURES   = 0x09,
     NVME_ADM_CMD_GET_FEATURES   = 0x0a,
     NVME_ADM_CMD_ASYNC_EV_REQ   = 0x0c,
+    NVME_ADM_CMD_NS_MGMT        = 0x0d,
     NVME_ADM_CMD_ACTIVATE_FW    = 0x10,
     NVME_ADM_CMD_DOWNLOAD_FW    = 0x11,
     NVME_ADM_CMD_NS_ATTACHMENT  = 0x15,
@@ -897,14 +898,18 @@ enum NvmeStatusCodes {
     NVME_FEAT_NOT_CHANGEABLE    = 0x010e,
     NVME_FEAT_NOT_NS_SPEC       = 0x010f,
     NVME_FW_REQ_SUSYSTEM_RESET  = 0x0110,
+    NVME_NS_INSUFFICIENT_CAPAC  = 0x0115,
+    NVME_NS_IDNTIFIER_UNAVAIL   = 0x0116,
     NVME_NS_ALREADY_ATTACHED    = 0x0118,
     NVME_NS_PRIVATE             = 0x0119,
     NVME_NS_NOT_ATTACHED        = 0x011a,
+    NVME_THIN_PROVISION_NOTSPRD = 0x011b,
     NVME_NS_CTRL_LIST_INVALID   = 0x011c,
     NVME_INVALID_CTRL_ID        = 0x011f,
     NVME_INVALID_SEC_CTRL_STATE = 0x0120,
     NVME_INVALID_NUM_RESOURCES  = 0x0121,
     NVME_INVALID_RESOURCE_ID    = 0x0122,
+    NVME_NS_ATTACH_MGMT_NOTSPRD = 0x0129,
     NVME_CONFLICTING_ATTRS      = 0x0180,
     NVME_INVALID_PROT_INFO      = 0x0181,
     NVME_WRITE_TO_RO            = 0x0182,
@@ -1184,6 +1189,10 @@ enum NvmeIdCtrlCmic {
     NVME_CMIC_MULTI_CTRL    = 1 << 1,
 };
 
+enum NvmeNsManagementOperation {
+    NVME_NS_MANAGEMENT_CREATE = 0x0,
+};
+
 enum NvmeNsAttachmentOperation {
     NVME_NS_ATTACHMENT_ATTACH = 0x0,
     NVME_NS_ATTACHMENT_DETACH = 0x1,
@@ -1345,6 +1354,26 @@ typedef struct QEMU_PACKED NvmeIdNs {
     uint8_t     vs[3712];
 } NvmeIdNs;
 
+typedef struct QEMU_PACKED NvmeIdNsMgmt {
+    uint64_t    nsze;
+    uint64_t    ncap;
+    uint8_t     rsvd16[10];
+    uint8_t     flbas;
+    uint8_t     rsvd27[2];
+    uint8_t     dps;
+    uint8_t     nmic;
+    uint8_t     rsvd31[61];
+    uint32_t    anagrpid;
+    uint8_t     rsvd96[4];
+    uint16_t    nvmsetid;
+    uint16_t    endgid;
+    uint8_t     rsvd104[280];
+    uint64_t    lbstm;
+    uint8_t     rsvd392[120];
+    uint8_t     rsvd512[512];
+    uint8_t     vs[3072];
+} NvmeIdNsMgmt;
+
 #define NVME_ID_NS_NVM_ELBAF_PIF(elbaf) (((elbaf) >> 7) & 0x3)
 
 typedef struct QEMU_PACKED NvmeIdNsNvm {
@@ -1646,6 +1675,7 @@ static inline void _nvme_check_size(void)
     QEMU_BUILD_BUG_ON(sizeof(NvmeLBAF) != 4);
     QEMU_BUILD_BUG_ON(sizeof(NvmeLBAFE) != 16);
     QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) != 4096);
+    QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsMgmt) != 4096);
     QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsNvm) != 4096);
     QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsZoned) != 4096);
     QEMU_BUILD_BUG_ON(sizeof(NvmeSglDescriptor) != 16);
diff --git a/include/hw/nvme/ctrl-cfg.h b/include/hw/nvme/ctrl-cfg.h
new file mode 100644
index 0000000000..1be44cb8df
--- /dev/null
+++ b/include/hw/nvme/ctrl-cfg.h
@@ -0,0 +1,24 @@
+/*
+ * QEMU NVM Express Virtual Dynamic Namespace Management
+ * Common configuration handling for the qemu-img tool and qemu-system-xx
+ *
+ *
+ * Copyright (c) 2022 Solidigm
+ *
+ * Authors:
+ *   Michael Kropaczek
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef CTRL_CFG_DEF
+#define NVME_STR_(s) #s
+#define NVME_STRINGIFY(s) NVME_STR_(s)
+#define NVME_MAX_NAMESPACES  256
+#define NVME_MAX_CONTROLLERS 256
+#else
+CTRL_CFG_DEF(int, "tnvmcap", int128_get64(tnvmcap128), tnvmcap64)
+CTRL_CFG_DEF(int, "unvmcap", int128_get64(unvmcap128), unvmcap64)
+#endif
diff --git a/include/hw/nvme/ns-cfg.h b/include/hw/nvme/ns-cfg.h
new file mode 100644
index 0000000000..782a843fb5
--- /dev/null
+++ b/include/hw/nvme/ns-cfg.h
@@ -0,0 +1,28 @@
+/*
+ * QEMU NVM Express Virtual Dynamic Namespace Management
+ * Common configuration handling for the qemu-img tool and qemu-system-xx
+ *
+ *
+ * Copyright (c) 2022 Solidigm
+ *
+ * Authors:
+ *   Michael Kropaczek
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#ifdef NS_CFG_DEF
+NS_CFG_DEF(int, "params.nsid", (int64_t)ns->params.nsid, nsid)
+NS_CFG_DEF(bool, "params.detached", ns->params.detached, true)
+NS_CFG_DEF(int, "params.pi", (int64_t)ns->params.pi, 0)
+NS_CFG_DEF(int, "lbasz", (int64_t)ns->lbasz, 0)
+NS_CFG_DEF(int, "id_ns.nsze", le64_to_cpu(ns->id_ns.nsze), 0)
+NS_CFG_DEF(int, "id_ns.ncap", le64_to_cpu(ns->id_ns.ncap), 0)
+NS_CFG_DEF(int, "id_ns.nuse", le64_to_cpu(ns->id_ns.nuse), 0)
+NS_CFG_DEF(int, "id_ns.nsfeat", (int64_t)ns->id_ns.nsfeat, 0)
+NS_CFG_DEF(int, "id_ns.flbas", (int64_t)ns->id_ns.flbas, 0)
+NS_CFG_DEF(int, "id_ns.nmic", (int64_t)ns->id_ns.nmic, 0)
+NS_CFG_DEF(int, "ns_size", ns->size, 0)
+#endif
diff --git a/include/hw/nvme/nvme-cfg.h b/include/hw/nvme/nvme-cfg.h
new file mode 100644
index 0000000000..6b1faf5945
--- /dev/null
+++ b/include/hw/nvme/nvme-cfg.h
@@ -0,0 +1,201 @@
+/*
+ * QEMU NVM Express Virtual Dynamic Namespace Management
+ * Common configuration handling for the qemu-img tool and qemu-system-xx
+ *
+ *
+ * Copyright (c) 2022 Solidigm
+ *
+ * Authors:
+ *   Michael Kropaczek
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See the
+ * COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/nvme/ctrl-cfg.h"
+
+#define NS_CFG_MAXSIZE 1024
+#define NS_FILE_FMT "%s/nvme_%s_ns_%03d"
+#define NS_IMG_EXT ".img"
+#define NS_CFG_EXT ".cfg"
+
+#define NVME_FILE_FMT "%s/nvme_%s_ctrl"
+#define NVME_CFG_EXT ".cfg"
+
+#define NVME_CFG_MAXSIZE 512
+
+static inline int storage_path_check(char *ns_directory, char *serial, Error **errp)
+{
+    int ret = 0;
+    Error *local_err = NULL;
+
+    ret = access(ns_directory, F_OK);
+    if (ret < 0) {
+        error_setg(&local_err,
+                   "Path '%s' to nvme controller's storage area with serial no: '%s' must exist",
+                   ns_directory, serial);
+    }
+
+    if (local_err) {
+        error_propagate(errp, local_err);
+        ret = -1;
+    }
+
+    return ret;
+}
+
+static inline char *c_create_cfg_name(char *ns_directory, char *serial, Error **errp)
+{
+    char *file_name = NULL;
+    Error *local_err = NULL;
+
+    storage_path_check(ns_directory, serial, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+    } else {
+        file_name = g_strdup_printf(NVME_FILE_FMT NVME_CFG_EXT,
+                                    ns_directory, serial);
+    }
+
+    return file_name;
+}
+
+static inline char *create_fmt_name(const char *fmt, char *ns_directory, char *serial, uint32_t nsid, Error **errp)
+{
+    char *file_name = NULL;
+    Error *local_err = NULL;
+
+    storage_path_check(ns_directory, serial, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+    } else {
+        file_name = g_strdup_printf(fmt, ns_directory, serial, nsid);
+    }
+
+    return file_name;
+}
+
+static inline char *create_cfg_name(char *ns_directory, char *serial, uint32_t nsid, Error **errp)
+{
+    return create_fmt_name(NS_FILE_FMT NS_CFG_EXT, ns_directory, serial, nsid, errp);
+}
+
+static inline char *create_image_name(char *ns_directory, char *serial, uint32_t nsid, Error **errp)
+{
+    return create_fmt_name(NS_FILE_FMT NS_IMG_EXT, ns_directory, serial, nsid, errp);
+}
+
+static inline int nsid_cfg_save(char *ns_directory, char *serial, QDict *ns_cfg, uint32_t nsid)
+{
+    GString *json = NULL;
+    char *filename;
+    FILE *fp;
+    int ret = 0;
+    Error *local_err = NULL;
+
+    json = qobject_to_json_pretty(QOBJECT(ns_cfg), false);
+
+    if (strlen(json->str) + 2 /* '\n'+'\0' */ > NS_CFG_MAXSIZE) {
+        error_setg(&local_err, "ns-cfg allowed max size %d exceeded", NS_CFG_MAXSIZE);
+    }
+
+    filename = create_cfg_name(ns_directory, serial, nsid, &local_err);
+    if (!local_err) {
+        fp = fopen(filename, "w");
+        if (fp == NULL) {
+            error_setg(&local_err, "open %s: %s", filename,
+                       strerror(errno));
+        } else {
+            chmod(filename, 0644);
+            if (fprintf(fp, "%s\n", json->str) < 0) {
+                error_setg(&local_err, "could not write ns-cfg %s: %s", filename,
+                           strerror(errno));
+            }
+            fclose(fp);
+        }
+    }
+
+    if (local_err) {
+        error_report_err(local_err);
+        ret = -1;
+    }
+
+    g_string_free(json, true);
+    g_free(filename);
+    qobject_unref(ns_cfg);
+
+    return ret;
+}
+
+static inline int ns_cfg_default_save(char *ns_directory, char *serial, uint32_t nsid)
+{
+    QDict *ns_cfg = NULL;
+
+    ns_cfg = qdict_new();
+
+#define NS_CFG_DEF(type, key, value, default) \
+    qdict_put_##type(ns_cfg, key, default);
+#include "hw/nvme/ns-cfg.h"
+#undef NS_CFG_DEF
+
+    return nsid_cfg_save(ns_directory, serial, ns_cfg, nsid);
+}
+
+static inline int c_cfg_save(char *ns_directory, char *serial, QDict *nvme_cfg)
+{
+    GString *json = NULL;
+    char *filename;
+    FILE *fp;
+    int ret = 0;
+    Error *local_err = NULL;
+
+    json = qobject_to_json_pretty(QOBJECT(nvme_cfg), false);
+
+    if (strlen(json->str) + 2 /* '\n'+'\0' */ > NVME_CFG_MAXSIZE) {
+        error_setg(&local_err, "ctrl-cfg allowed max size %d exceeded",
+                   NVME_CFG_MAXSIZE);
+    }
+
+    filename = c_create_cfg_name(ns_directory, serial, &local_err);
+    if (!local_err) {
+        fp = fopen(filename, "w");
+        if (fp == NULL) {
+            error_setg(&local_err, "open %s: %s", filename,
+                       strerror(errno));
+        } else {
+            chmod(filename, 0644);
+            if (fprintf(fp, "%s\n", json->str) < 0) {
+                error_setg(&local_err, "could not write ctrl-cfg %s: %s",
+                           filename, strerror(errno));
+            }
+            fclose(fp);
+        }
+    }
+
+    if (local_err) {
+        error_report_err(local_err);
+        ret = -1;
+    }
+
+    g_string_free(json, true);
+    g_free(filename);
+    qobject_unref(nvme_cfg);
+
+    return ret;
+}
+
+static inline int c_cfg_default_save(char *ns_directory, char *serial, uint64_t tnvmcap64, uint64_t unvmcap64)
+{
+    QDict *nvme_cfg = NULL;
+
+    nvme_cfg = qdict_new();
+
+#define CTRL_CFG_DEF(type, key, value, default) \
+    qdict_put_##type(nvme_cfg, key, default);
+#include "hw/nvme/ctrl-cfg.h"
+#undef CTRL_CFG_DEF
+
+    return c_cfg_save(ns_directory, serial, nvme_cfg);
+}
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 1b1dab5b17..9aacb88fc9 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -57,6 +57,12 @@ SRST
 .. option:: create [--object OBJECTDEF] [-q] [-f FMT] [-b BACKING_FILE [-F BACKING_FMT]] [-u] [-o OPTIONS] FILENAME [SIZE]
 ERST
 
+DEF("createns", nsimgs_create,
+    "createns -S nvme_ctrl_serial_number -C nvme_ctrl_total_capacity [-N <NsId_max>] pathname")
+SRST
+.. option:: createns -S SERIAL_NUMBER -C TOTAL_CAPACITY [-N NSID_MAX] PATHNAME
+ERST
+
 DEF("dd", img_dd,
     "dd [--image-opts] [-U] [-f fmt] [-O output_fmt] [bs=block_size] [count=blocks] [skip=blocks] if=input of=output")
 SRST
diff --git a/qemu-img.c b/qemu-img.c
index ace3adf8ae..6d8072ade2 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -49,10 +49,12 @@
 #include "block/block_int.h"
 #include "block/blockjob.h"
 #include "block/qapi.h"
+#include "block/qdict.h"
 #include "crypto/init.h"
 #include "trace/control.h"
 #include "qemu/throttle.h"
 #include "block/throttle-groups.h"
+#include "hw/nvme/nvme-cfg.h"
 
 #define QEMU_IMG_VERSION "qemu-img version " QEMU_FULL_VERSION \
                           "\n" QEMU_COPYRIGHT "\n"
@@ -219,6 +221,14 @@ void help(void)
            "  '-F' second image format\n"
            "  '-s' run in Strict mode - fail on different image size or sector allocation\n"
            "\n"
+           "Parameters to createns subcommand:\n"
+           "  'pathname' points to the storage area for namespace backend images; it must exist\n"
+           "    and must match the -device nvme 'auto-ns-path=...' of the qemu-system-xx command\n"
+           "  '-S' indicates the NVMe serial number and must match the -device nvme 'serial=...' of the qemu-system-xx command\n"
+           "  '-C' indicates the NVMe total capacity\n"
+           "  '-N' limits the number of NVMe namespaces associated with the NVMe controller;\n"
+           "       the default and maximum value is " NVME_STRINGIFY(NVME_MAX_NAMESPACES) " and cannot be exceeded\n"
+           "\n"
            "Parameters to dd subcommand:\n"
            "  'bs=BYTES' read and write up to BYTES bytes at a time "
            "(default: 512)\n"
@@ -603,6 +613,130 @@ fail:
     return 1;
 }
 
+static int nsimgs_create(int argc, char **argv)
+{
+    int c;
+    char *auto_ns_path = NULL;
+    char *serial = NULL;
+    char *nsidMax = NULL;
+    char *tnvmcap = NULL;
+    uint64_t tnvmcap64 = 0L;
+    unsigned int nsidMaxi = NVME_MAX_NAMESPACES;
+    char *filename = NULL;
+    uint32_t i;
+    Error *local_err = NULL;
+
+    for (;;) {
+        static const struct option long_options[] = {
+            {"help", no_argument, 0, 'h'},
+            {"serial", required_argument, 0, 'S'},
+            {"tnvmcap", required_argument, 0, 'C'},
+            {"nsidmax", required_argument, 0, 'N'},
+            {0, 0, 0, 0}
+        };
+        c = getopt_long(argc, argv, "S:C:N:",
+                        long_options, NULL);
+        if (c == -1) {
+            break;
+        }
+        switch (c) {
+        case ':':
+            missing_argument(argv[optind - 1]);
+            break;
+        case '?':
+            unrecognized_option(argv[optind - 1]);
+            break;
+        case 'h':
+            help();
+            break;
+        case 'S':
+            serial = optarg;
+            break;
+        case 'N':
+            nsidMax = optarg;
+            break;
+        case 'C':
+            tnvmcap = optarg;
+            break;
+        }
+    }
+
+    if (optind >= argc) {
+        error_exit("Expecting path name");
+    }
+
+    if (!serial || !tnvmcap) {
+        error_exit("Both -S and -C must be specified");
+    }
+
+    tnvmcap64 = cvtnum_full("tnvmcap", tnvmcap, 0, INT64_MAX);
+
+    if (nsidMax && (qemu_strtoui(nsidMax, NULL, 0, &nsidMaxi) < 0 ||
+        nsidMaxi > NVME_MAX_NAMESPACES)) {
+        error_exit("-N 'NsIdMax' must be numeric and cannot exceed %d",
+                   NVME_MAX_NAMESPACES);
+    }
+
+    auto_ns_path = (optind < argc) ?
argv[optind] : NULL; + + /* create backend images and config flles for namespaces */ + for (i =3D 1; !local_err && i <=3D NVME_MAX_NAMESPACES; i++) { + filename =3D create_image_name(auto_ns_path, serial, i, &local_err= ); + if (local_err) { + break; + } + + /* calling bdrv_img_create() in both cases if i <=3D nsidMaxi and = othewise, + * it checks shared resize permission, if likely locked by qemu-sy= stem-xx + * it will abort */ + bdrv_img_create(filename, "raw", NULL, NULL, NULL, + 0, BDRV_O_RDWR | BDRV_O_RESIZE, + true, &local_err); + if (local_err) { + break; + } + + if (i <=3D nsidMaxi) { /* backend image file was created */ + if (ns_cfg_default_save(auto_ns_path, serial, i)) { /* create + * namespa= ce + * config = file */ + break; + } + } else if (!access(filename, F_OK)) { /* reducing the number of fi= les + * if i > nsidMaxi */ + unlink(filename); + g_free(filename); + filename =3D create_cfg_name(auto_ns_path, serial, i, &local_e= rr); + if (local_err) { + break; + } + unlink(filename); + } + g_free(filename); + filename =3D NULL; + } + + if (local_err && filename) { + error_reportf_err(local_err, "Could not create ns-image [%s] ", + filename); + g_free(filename); + goto fail; + } else if (c_cfg_default_save(auto_ns_path, serial, + tnvmcap64, tnvmcap64)) { /* create contr= oller + * config file = */ + error_reportf_err(local_err, "Could not create nvme-cfg "); + goto fail; + } else if (local_err) { + error_report_err(local_err); + goto fail; + } + + return 0; + +fail: + return 1; +} + static void dump_json_image_check(ImageCheck *check, bool quiet) { GString *str; --=20 2.37.3 From nobody Fri Nov 1 01:03:57 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=linux.dev ARC-Seal: i=1; a=rsa-sha256; t=1666903623; cv=none; 
d=zohomail.com; s=zohoarc; b=m/GOA5D70OK9Zruzd0qRQe52qVtb2UhiVU2OYYMJ8z0yOb5nK47+Yp83u+hGsnY8YcbIoAdB3bIn3Xr/2mJNb1uTMXNJ3I2osGBkdUhPHlLozyLhtufwz2dsLt/IUou4T7oWZ+06PGwobPLZ7LwoFDwTVYRlE5FNRWQWCaUDt+E= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1666903623; h=Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=rziYTYinIhVEwE/s4LSd6GoYFKM6qWjGhNOnddqSKk4=; b=fV3iaHwtV8VzI2JqDJdD8Nc1qocC/t2Z7F589bcgnwFw44hmUBX3pqA5+Zt+Tmg2uboD9IzZKI9ODrFBkh6G1K4kXYZo4JDDqRGDJ/ILL5VOq3y0sNQfrzSDBj44sSEKnHdl+f8gs/ip/Jrhey8mqjhMYxhZQfG8ApJi4594EaY= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1666903623902749.2596390208242; Thu, 27 Oct 2022 13:47:03 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1oo7D2-0002dH-47; Thu, 27 Oct 2022 14:02:28 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oo7Cy-0002Rr-Ul for qemu-devel@nongnu.org; Thu, 27 Oct 2022 14:02:24 -0400 Received: from resqmta-c1p-023465.sys.comcast.net ([2001:558:fd00:56::5]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oo7Cv-0005sX-NA for qemu-devel@nongnu.org; Thu, 27 Oct 2022 14:02:24 -0400 Received: from resomta-c1p-023266.sys.comcast.net ([96.102.18.234]) by resqmta-c1p-023465.sys.comcast.net with ESMTP id o3goos7Lcq2C4o7CvoHwG8; Thu, 27 Oct 2022 18:02:21 +0000 Received: from 
jderrick-mobl4.Home ([75.163.75.236]) by resomta-c1p-023266.sys.comcast.net with ESMTPA id o7BQo587p62udo7CEodDFQ; Thu, 27 Oct 2022 18:01:57 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcastmailservice.net; s=20211018a; t=1666893741; bh=rziYTYinIhVEwE/s4LSd6GoYFKM6qWjGhNOnddqSKk4=; h=Received:Received:From:To:Subject:Date:Message-Id:MIME-Version: Xfinity-Spam-Result; b=kk5hSGVy7TMhfxdq53Qv7uUsk03Tet857Xh/talW4hkXymNpO/nJvQASsqgnObrD7 qxQvKCFav0wlrL6WlBY6CWlhybsypfiqkaH1VUwaXwz/TCFl3HfKcPz6IK3Wl8itTn goVrVC2cn1hS4gb6v2cyVWaPLHcU8KWZJ993fvFaMQiu014pAKUPPR6Qx8FQoqpb61 ddnNVE1fzF5XrA5lD1LEuWj+imAQCRhw525a3/iKmcYjsTN/+ZOlyTVck5q1zIaKG9 u0B8KC8ka0eRepLSkJ7oveA2HGzkf5GFPyV4DNdwysox7bbRnwH+AfP9JxM8par9vJ myTH36vHNS0qg== X-Xfinity-VAAS: gggruggvucftvghtrhhoucdtuddrgedvgedrtdeggdduudejucetufdoteggodetrfdotffvucfrrhhofhhilhgvmecuvehomhgtrghsthdqtfgvshhipdfqfgfvpdfpqffurfetoffkrfenuceurghilhhouhhtmecufedtudenucesvcftvggtihhpihgvnhhtshculddquddttddmnecujfgurhephffvvefufffkofgjfhgggfestdekredtredttdenucfhrhhomheplfhonhgrthhhrghnucffvghrrhhitghkuceojhhonhgrthhhrghnrdguvghrrhhitghksehlihhnuhigrdguvghvqeenucggtffrrghtthgvrhhnpedtteeljeffgfffveehhfetveefuedvheevffffhedtjeeuvdevgfeftddtheeftdenucfkphepjeehrdduieefrdejhedrvdefieenucevlhhushhtvghrufhiiigvpedunecurfgrrhgrmhephhgvlhhopehjuggvrhhrihgtkhdqmhhosghlgedrjfhomhgvpdhinhgvthepjeehrdduieefrdejhedrvdefiedpmhgrihhlfhhrohhmpehjohhnrghthhgrnhdruggvrhhrihgtkheslhhinhhugidruggvvhdpnhgspghrtghpthhtohepkedprhgtphhtthhopehqvghmuhdquggvvhgvlhesnhhonhhgnhhurdhorhhgpdhrtghpthhtohepjhhonhgrthhhrghnrdguvghrrhhitghksehlihhnuhigrdguvghvpdhrtghpthhtohepmhhitghhrggvlhdrkhhrohhprggtiigvkhesshholhhiughighhmrdgtohhmpdhrtghpthhtohepqhgvmhhuqdgslhhotghksehnohhnghhnuhdrohhrghdprhgtphhtthhopehksghushgthheskhgvrhhnvghlrdhorhhgpdhrtghpthhtohepihhtshesihhrrhgvlhgvvhgrnhhtrdgukhdprhgtphhtthhopehkfiholhhfsehrvgguhhgrthdrtghomhdprhgtphhtthhopehhrhgvihhtiiesrhgvughhrghtrdgtohhm X-Xfinity-VMeta: sc=-100.00;st=legit From: Jonathan Derrick To: 
qemu-devel@nongnu.org
Cc: Jonathan Derrick, Michael Kropaczek, qemu-block@nongnu.org, Keith Busch, Klaus Jensen, Kevin Wolf, Hanna Reitz
Subject: [PATCH v3 2/2] hw/nvme: Support for Namespaces Management from guest OS - delete-ns
Date: Thu, 27 Oct 2022 13:00:46 -0500
Message-Id: <20221027180046.250-3-jonathan.derrick@linux.dev>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20221027180046.250-1-jonathan.derrick@linux.dev>
References: <20221027180046.250-1-jonathan.derrick@linux.dev>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Michael Kropaczek

Added support for NVMe Namespaces Management, allowing the guest OS to
delete namespaces by issuing the nvme delete-ns command. This is an
extension to the currently implemented QEMU nvme virtual device. Virtual
devices representing namespaces will be created and/or deleted during
QEMU's running session, at any time.
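For reference, the guest-side flow this patch enables can be exercised with nvme-cli roughly as follows; the device node and namespace sizes are illustrative, not taken from the patch:

```console
# Inside the guest. /dev/nvme0 and the size values are examples only.
# Create and attach a namespace (existing create-ns support) ...
nvme create-ns /dev/nvme0 --nsze=2097152 --ncap=2097152 --flbas=0
nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=0

# ... then detach and delete it (new delete-ns support):
nvme detach-ns /dev/nvme0 --namespace-id=1 --controllers=0
nvme delete-ns /dev/nvme0 --namespace-id=1

# The broadcast NSID deletes all dynamically created namespaces:
nvme delete-ns /dev/nvme0 --namespace-id=0xffffffff
```

Per the patch, namespaces configured statically via `-device nvme-ns` are rejected by delete-ns with an error rather than deleted.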
Signed-off-by: Michael Kropaczek
---
 docs/system/devices/nvme.rst | 9 ++--
 hw/nvme/ctrl.c               | 86 ++++++++++++++++++++++++++++++++++--
 hw/nvme/ns-backend.c         | 5 +++
 hw/nvme/ns.c                 | 74 +++++++++++++++++++++++++++++++
 hw/nvme/nvme.h               | 2 +
 hw/nvme/trace-events         | 1 +
 include/block/nvme.h         | 1 +
 7 files changed, 170 insertions(+), 8 deletions(-)

diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
index 13e2fbc0d6..97b2453a00 100644
--- a/docs/system/devices/nvme.rst
+++ b/docs/system/devices/nvme.rst
@@ -103,12 +103,12 @@ Parameters:
 
 ``auto-ns-path=``
   If specified indicates a support for dynamic management of nvme namespaces
-  by means of nvme create-ns command. This path points
+  by means of nvme create-ns and nvme delete-ns commands. This path points
   to the storage area for backend images must exist. Additionally it requires
   that parameter `ns-subsys` must be specified whereas parameter `drive`
   must not. The legacy namespace backend is disabled, instead, a pair of
   files 'nvme__ns_.cfg' and 'nvme__ns_.img'
-  will refer to respective namespaces. The create-ns, attach-ns
+  will refer to respective namespaces. The create-ns, delete-ns, attach-ns
   and detach-ns commands, issued at the guest side, will make changes to
   those files accordingly.
   For each namespace exists an image file in raw format and a config file
@@ -140,8 +140,9 @@ Please note that ``nvme-ns`` device is not required to
 support of dynamic namespaces management feature. It is not prohibited to
 assign a such device to ``nvme`` device specified to support dynamic
 namespace management if one has an use case to do so, however, it will
 only coexist and be out of the scope of
-Namespaces Management. NsIds will be consistently managed, creation (create-ns)
-of a namespace will not allocate the NsId already being taken. If ``nvme-ns``
+Namespaces Management. Deletion (delete-ns) will render an error for this
+namespace. NsIds will be consistently managed, creation (create-ns) of
+a namespace will not allocate the NsId already being taken. If ``nvme-ns``
 device conflicts with previously created one by create-ns (the same NsId),
 it will break QEMU's start up.
 
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index d2b9d65eb9..87eb88486a 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -144,12 +144,12 @@
  *
  * - `auto-ns-path`
  *   If specified indicates a support for dynamic management of nvme namespaces
- *   by means of nvme create-ns command. This path pointing
+ *   by means of nvme create-ns and nvme delete-ns commands. This path pointing
 *   to a storage area for backend images must exist. Additionally it requires
 *   that parameter `ns-subsys` must be specified whereas parameter `drive`
 *   must not. The legacy namespace backend is disabled, instead, a pair of
 *   files 'nvme__ns_.cfg' and 'nvme__ns_.img'
- *   will refer to respective namespaces. The create-ns, attach-ns
+ *   will refer to respective namespaces. The create-ns, delete-ns, attach-ns
 *   and detach-ns commands, issued at the guest side, will make changes to
 *   those files accordingly.
 *   For each namespace exists an image file in raw format and a config file
@@ -5738,6 +5738,23 @@ static NvmeNamespace *nvme_ns_mgmt_create(NvmeCtrl *n, uint32_t nsid, NvmeIdNsMg
     return ns;
 }
 
+static void nvme_ns_mgmt_delete(NvmeCtrl *n, uint32_t nsid, Error **errp)
+{
+    Error *local_err = NULL;
+
+    if (!n->params.ns_directory) {
+        error_setg(&local_err, "delete-ns not supported if 'auto-ns-path' is not specified");
+    } else if (n->namespace.blkconf.blk) {
+        error_setg(&local_err, "delete-ns not supported if 'drive' is specified");
+    } else {
+        nvme_ns_delete(n, nsid, &local_err);
+    }
+
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+}
+
 static uint16_t nvme_ns_mgmt(NvmeCtrl *n, NvmeRequest *req)
 {
     NvmeIdCtrl *id = &n->id_ctrl;
@@ -5750,6 +5767,7 @@ static uint16_t nvme_ns_mgmt(NvmeCtrl *n, NvmeRequest *req)
     uint8_t sel = dw10 & 0xf;
     uint8_t csi = (dw11 >> 24) & 0xf;
     uint16_t i;
+    uint64_t image_size;
     uint16_t ret;
     Error *local_err = NULL;
 
@@ -5807,14 +5825,15 @@ static uint16_t nvme_ns_mgmt(NvmeCtrl *n, NvmeRequest *req)
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 
+    /* ns->size is the real image size after creation */
     if (nvme_cfg_update(n, ns->size, NVME_NS_ALLOC_CHK)) {
-        /* place for delete-ns */
+        nvme_ns_mgmt_delete(n, nsid, NULL);
         return NVME_NS_INSUFFICIENT_CAPAC | NVME_DNR;
     }
     (void)nvme_cfg_update(n, ns->size, NVME_NS_ALLOC);
     if (nvme_cfg_save(n)) {
         (void)nvme_cfg_update(n, ns->size, NVME_NS_DEALLOC);
-        /* place for delete-ns */
+        nvme_ns_mgmt_delete(n, nsid, NULL);
         return NVME_INVALID_FIELD | NVME_DNR;
     }
     req->cqe.result = cpu_to_le32(nsid);
@@ -5825,6 +5844,65 @@ static uint16_t nvme_ns_mgmt(NvmeCtrl *n, NvmeRequest *req)
             return NVME_INVALID_FIELD | NVME_DNR;
         }
         break;
+    case NVME_NS_MANAGEMENT_DELETE:
+        switch (csi) {
+        case NVME_CSI_NVM:
+            if (!nsid) {
+                return NVME_INVALID_FIELD | NVME_DNR;
+            }
+
+            if (nsid != NVME_NSID_BROADCAST) {
+                ns = nvme_subsys_ns(n->subsys, nsid);
+                if (n->params.ns_directory && ns && ns_auto_check(n, ns, nsid)) {
+                    error_setg(&local_err, "ns[%"PRIu32"] cannot be deleted, configured via '-device nvme-ns...'", nsid);
+                } else if (ns) {
+                    image_size = ns->size;
+                    nvme_ns_mgmt_delete(n, nsid, &local_err);
+                    if (!local_err) {
+                        (void)nvme_cfg_update(n, image_size, NVME_NS_DEALLOC);
+                        if (nvme_cfg_save(n)) {
+                            error_setg(&local_err, "Could not save nvme-cfg");
+                        }
+                    }
+                } else {
+                    return NVME_INVALID_FIELD | NVME_DNR;
+                }
+            } else {
+                for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
+                    ns = nvme_subsys_ns(n->subsys, (uint32_t)i);
+                    if (n->params.ns_directory && ns && ns_auto_check(n, ns, (uint32_t)i)) {
+                        error_setg(&local_err, "ns[%"PRIu32"] cannot be deleted, configured via '-device nvme-ns...'", i);
+                        error_report_err(local_err);
+                        local_err = NULL; /* we are skipping */
+                    } else if (ns) {
+                        image_size = ns->size;
+                        nvme_ns_mgmt_delete(n, (uint16_t)i, &local_err);
+                        if (!local_err) {
+                            (void)nvme_cfg_update(n, image_size, NVME_NS_DEALLOC);
+                            if (nvme_cfg_save(n)) {
+                                error_setg(&local_err, "Could not save nvme-cfg");
+                            }
+                        }
+                    }
+                    if (local_err) {
+                        break;
+                    }
+                }
+            }
+
+            if (local_err) {
+                error_report_err(local_err);
+                return NVME_INVALID_FIELD | NVME_DNR;
+            }
+
+            nvme_update_dmrsl(n);
+            break;
+        case NVME_CSI_ZONED:
+            /* fall through for now */
+        default:
+            return NVME_INVALID_FIELD | NVME_DNR;
+        }
+        break;
     default:
         return NVME_INVALID_FIELD | NVME_DNR;
     }
diff --git a/hw/nvme/ns-backend.c b/hw/nvme/ns-backend.c
index 82f9fcd5d9..8b9c1e5a3d 100644
--- a/hw/nvme/ns-backend.c
+++ b/hw/nvme/ns-backend.c
@@ -76,6 +76,11 @@ void ns_blockdev_activate(BlockBackend *blk, uint64_t image_size, Error **errp)
                          errp);
 }
 
+void ns_blockdev_deactivate(BlockBackend *blk, Error **errp)
+{
+    ns_blockdev_activate(blk, 0, errp);
+}
+
 int ns_storage_path_check(NvmeCtrl *n, Error **errp)
 {
     return storage_path_check(n->params.ns_directory, n->params.serial, errp);
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 2aa7b01c3d..73630c27c3 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -592,6 +592,8 @@ NvmeNamespace * nvme_ns_create(NvmeCtrl *n, uint32_t nsid, NvmeIdNsMgmt *id_ns,
             blk = NULL;
         }
         object_unref(OBJECT(dev));
+    } else if (ns) { /* in a very rare case when ns_cfg_save() failed */
+        nvme_ns_delete(n, nsid, NULL);
     }
     error_propagate(errp, local_err);
     ns = NULL;
@@ -600,6 +602,78 @@ NvmeNamespace * nvme_ns_create(NvmeCtrl *n, uint32_t nsid, NvmeIdNsMgmt *id_ns,
     return ns;
 }
 
+static void nvme_ns_unrealize(DeviceState *dev);
+
+void nvme_ns_delete(NvmeCtrl *n, uint32_t nsid, Error **errp)
+{
+    NvmeNamespace *ns = NULL;
+    NvmeSubsystem *subsys = n->subsys;
+    int i;
+    int ret = 0;
+    Error *local_err = NULL;
+
+    trace_pci_nvme_ns_delete(nsid);
+
+    if (subsys) {
+        ns = nvme_subsys_ns(subsys, (uint32_t)nsid);
+        if (ns) {
+            if (ns->params.shared) {
+                for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
+                    NvmeCtrl *ctrl = subsys->ctrls[i];
+
+                    if (ctrl && ctrl->namespaces[nsid]) {
+                        ctrl->namespaces[nsid] = NULL;
+                        ns->attached--;
+                    }
+                }
+            }
+            subsys->namespaces[nsid] = NULL;
+        }
+    }
+
+    if (!ns) {
+        ns = nvme_ns(n, (uint32_t)nsid);
+    }
+
+    if (!ns) {
+        error_setg(&local_err, "Namespace %d does not exist", nsid);
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    n->namespaces[nsid] = NULL;
+    if (ns->attached > 0) {
+        error_setg(&local_err, "Could not detach all ns references for ns[%d], still %d left", nsid, ns->attached);
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /* here is actual deletion */
+    nvme_ns_unrealize(&ns->parent_obj);
+    qdev_unrealize(&ns->parent_obj);
+    ns_blockdev_deactivate(ns->blkconf.blk, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    ns->params.detached = true;
+    ns_cfg_clear(ns);
+    ret = ns_cfg_save(n, ns, nsid);
+    if (ret == -1) {
+        error_setg(&local_err, "Unable to save ns-cfg");
+        error_propagate(errp, local_err);
+        return;
+    } else if (ret == 1) { /* should not occur here, check and error
+                              message prior to call to nvme_ns_delete() */
+        return;
+    }
+
+    /* disassociating references to the back-end and keeping it as preloaded */
+    n->preloaded_blk[nsid] = ns->blkconf.blk;
+    ns->blkconf.blk = NULL;
+
+}
+
 int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
 {
     if (nvme_ns_check_constraints(ns, errp)) {
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index c6194773e6..56cfb99b39 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -280,6 +280,7 @@ void nvme_ns_shutdown(NvmeNamespace *ns);
 void nvme_ns_cleanup(NvmeNamespace *ns);
 void nvme_validate_flbas(uint8_t flbas, Error **errp);
 NvmeNamespace * nvme_ns_create(NvmeCtrl *n, uint32_t nsid, NvmeIdNsMgmt *id_ns, Error **errp);
+void nvme_ns_delete(NvmeCtrl *n, uint32_t nsid, Error **errp);
 
 typedef struct NvmeAsyncEvent {
     QTAILQ_ENTRY(NvmeAsyncEvent) entry;
@@ -582,6 +583,7 @@ static inline NvmeSecCtrlEntry *nvme_sctrl_for_cntlid(NvmeCtrl *n,
 
 BlockBackend *ns_blockdev_init(const char *file, Error **errp);
 void ns_blockdev_activate(BlockBackend *blk, uint64_t image_size, Error **errp);
+void ns_blockdev_deactivate(BlockBackend *blk, Error **errp);
 int nvme_ns_backend_setup(NvmeCtrl *n, Error **errp);
 void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns);
 uint16_t nvme_bounce_data(NvmeCtrl *n, void *ptr, uint32_t len,
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index 28b025ac42..0dd0c23208 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -79,6 +79,7 @@ pci_nvme_aer_masked(uint8_t type, uint8_t mask) "type 0x%"PRIx8" mask 0x%"PRIx8"
 pci_nvme_aer_post_cqe(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
 pci_nvme_ns_mgmt(uint16_t cid, uint32_t nsid, uint8_t sel, uint8_t csi, uint8_t psdt) "cid %"PRIu16", nsid=%"PRIu32", sel=0x%"PRIx8", csi=0x%"PRIx8", psdt=0x%"PRIx8""
 pci_nvme_ns_create(uint16_t nsid, uint64_t nsze, uint64_t ncap, uint8_t flbas) "nsid %"PRIu16", nsze=%"PRIu64", ncap=%"PRIu64", flbas=%"PRIu8""
+pci_nvme_ns_delete(uint16_t nsid) "nsid %"PRIu16""
 pci_nvme_ns_attachment(uint16_t cid, uint8_t sel) "cid %"PRIu16", sel=0x%"PRIx8""
 pci_nvme_ns_attachment_attach(uint16_t cntlid, uint32_t nsid) "cntlid=0x%"PRIx16", nsid=0x%"PRIx32""
 pci_nvme_enqueue_event(uint8_t typ, uint8_t info, uint8_t log_page) "type 0x%"PRIx8" info 0x%"PRIx8" lid 0x%"PRIx8""
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 9d2e121f1a..0fe7fe9bb1 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1191,6 +1191,7 @@ enum NvmeIdCtrlCmic {
 
 enum NvmeNsManagementOperation {
     NVME_NS_MANAGEMENT_CREATE = 0x0,
+    NVME_NS_MANAGEMENT_DELETE = 0x1,
 };
 
 enum NvmeNsAttachmentOperation {
-- 
2.37.3