From nobody Sat Feb 7 04:40:33 2026 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1678116563292864.6282444278803; Mon, 6 Mar 2023 07:29:23 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1pZBvY-0006Zj-Uo; Mon, 06 Mar 2023 09:35:00 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pZBvU-0006Wj-8M; Mon, 06 Mar 2023 09:34:59 -0500 Received: from out3-smtp.messagingengine.com ([66.111.4.27]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pZBvQ-0005WG-Co; Mon, 06 Mar 2023 09:34:56 -0500 Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id 8DEC45C0311; Mon, 6 Mar 2023 09:34:51 -0500 (EST) Received: from mailfrontend2 ([10.202.2.163]) by compute5.internal (MEProxy); Mon, 06 Mar 2023 09:34:51 -0500 Received: by mail.messagingengine.com (Postfix) with ESMTPA; Mon, 6 Mar 2023 09:34:49 -0500 (EST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=irrelevant.dk; h=cc:cc:content-transfer-encoding:content-type:date:date:from :from:in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to; s=fm3; t=1678113291; x= 1678199691; bh=QtpKsY2NLBlbVd8VXFpUiKSxX2+OGscmypDo0ljlgP8=; b=B XzpJ5oPxoXu4hfb3bfrBdSRIE5lhYmDS7D5XPWv9DlnxQKCrEbbOuIsv6fNuIGpS T0rR8vq1Nv4b70i1AO4TsehpG/kI1yHBgXIPATJJnppZe/Fa0lYIKGgDLEX7AR7I TZLeOkI8bnWltKFzxc0LXUZrWD1zGZ3nf9lqVnAT2aB7Ue9FugMtCMtnqVtgLYz4 oqfkun3jmz69ygmH8HRi21n/6/3apBq/3+2QVpQqEp7qHyEGPRQS36TciylvOSuZ LvcyIlbeGSvlPhc37kS9BR6EPN4EE4IglSxdfhJfl6J8RRKTdzZlPv8evbZkGgf5 PzoPnjWx3F6qZNdLhI4HA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:date:date:feedback-id:feedback-id:from:from :in-reply-to:in-reply-to:message-id:mime-version:references :reply-to:sender:subject:subject:to:to:x-me-proxy:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm1; t=1678113291; x= 1678199691; bh=QtpKsY2NLBlbVd8VXFpUiKSxX2+OGscmypDo0ljlgP8=; b=V hdSFvRYBZyJkwjW3EmNh+ot0Rl4EmWLXE+qhwkG4GmzsiFTbPuFSBb+fbMZ/fupl hE9DZwmGrxUO1HThvv7oon8D2s1Utl/h2NHsEbVZUJLOmCFmGbQ0LgU+pbCA6mIf AkOwB0pGZOOilhOxhQau8jPUqlDoEp/3ajDFZOyJBztpvh5yAJCrIfA5nChM5IRS 1ZAEH1rB9Eot15zxOAjEsHjhGYy85WRl/9ylg770DKjfNV3YeRNGjXF0Wphk2w6j fRIPvtGk1xb+nTy+0Tqiov2C1hTvxCDILpFHKn6EWjIujv74xGUFXPsyDzX3mIrJ LuJlq1YFM3j6cYpUrmHQg== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvhedrvddtkedgfeekucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhephffvvefufffkofgjfhgggfestdekredtredttdenucfhrhhomhepmfhlrghu shculfgvnhhsvghnuceoihhtshesihhrrhgvlhgvvhgrnhhtrdgukheqnecuggftrfgrth htvghrnheptdevfffhueetueduheetieehffeuteeggfevkeduudeuieffieelieejfeev ueejnecuffhomhgrihhnpedvtdduledqtdekrdhorhhgnecuvehluhhsthgvrhfuihiivg eptdenucfrrghrrghmpehmrghilhhfrhhomhepihhtshesihhrrhgvlhgvvhgrnhhtrdgu kh X-ME-Proxy: Feedback-ID: idc91472f:Fastmail From: Klaus Jensen To: qemu-devel@nongnu.org, Peter Maydell Cc: Klaus Jensen , Keith Busch , Kevin Wolf , Hanna Reitz , Stefan Hajnoczi , =?UTF-8?q?Philippe=20Mathieu-Daud=C3=A9?= , qemu-block@nongnu.org, Fam Zheng , Jesper Devantier , Klaus Jensen Subject: [PULL 5/5] hw/nvme: flexible data placement emulation Date: Mon, 6 Mar 2023 15:34:33 +0100 Message-Id: <20230306143433.10351-6-its@irrelevant.dk> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20230306143433.10351-1-its@irrelevant.dk> References: <20230306143433.10351-1-its@irrelevant.dk> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=48530; i=k.jensen@samsung.com; h=from:subject; bh=jTB0qDhcUjBcVGm+hIYyuDPy/aRVOOgEIAvgO5SuRSg=; b=owJ4nAFtAZL+kA0DAAoBTeGvMW1PDekByyZiAGQF+fkp8GVfPJeF0uiDKHswokJnVISsJmsVw s0nOlbaFxtiNIkBMwQAAQoAHRYhBFIoM6p14tzmokdmwE3hrzFtTw3pBQJkBfn5AAoJEE3hrzFt Tw3pHQ0IAJim92kUG5MgY9QRv3E9F6owRASFlkMg5ReACi5Ttpw+Iow8LPuVlnlcX/PPmeZpXcT ycZ8p+kJ4E53wU8AQhla2C3J7A17HOHYrNcjuRbfytAo8/Eh8SsEp73TAmwvRxXUWESmG3N6sL7 31K2QLbhtoK7/g+2MMEU8W/6Th7fbrDn/ZtNfEf8aRyqcV3N95UdlEa9llp4Oh2H7ljrEIBALkX HF89K6sTrHxDudMBPsSI6iLgQgIXk3YESyvN1bH9VYXtp5I3iptcKBezGEMRv2yw0LCTooFujYZ Vst9nC+5WfVJvvZoS28/qmFA5TP5Y+g/YG4EmaomNsrhKvIVI08n+6QG X-Developer-Key: i=k.jensen@samsung.com; a=openpgp; fpr=DDCA4D9C9EF931CC3468427263D56FC5E55DA838 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=66.111.4.27; envelope-from=its@irrelevant.dk; helo=out3-smtp.messagingengine.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZM-MESSAGEID: 1678116564422100003 Content-Type: text/plain; charset="utf-8" From: Jesper Devantier Add emulation of TP4146 ("Flexible Data Placement"). Reviewed-by: Keith Busch Signed-off-by: Jesper Devantier Signed-off-by: Klaus Jensen --- hw/nvme/ctrl.c | 698 ++++++++++++++++++++++++++++++++++++++++++- hw/nvme/ns.c | 143 +++++++++ hw/nvme/nvme.h | 85 +++++- hw/nvme/subsys.c | 94 +++++- hw/nvme/trace-events | 1 + include/block/nvme.h | 173 ++++++++++- 6 files changed, 1179 insertions(+), 15 deletions(-) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 0ead0ee30815..49c1210fce2b 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -238,6 +238,8 @@ static const bool nvme_feature_support[NVME_FID_MAX] = =3D { [NVME_TIMESTAMP] =3D true, [NVME_HOST_BEHAVIOR_SUPPORT] =3D true, [NVME_COMMAND_SET_PROFILE] =3D true, + [NVME_FDP_MODE] =3D true, + [NVME_FDP_EVENTS] =3D true, }; =20 static const uint32_t nvme_feature_cap[NVME_FID_MAX] =3D { @@ -249,6 +251,8 @@ static const uint32_t nvme_feature_cap[NVME_FID_MAX] = =3D { [NVME_TIMESTAMP] =3D NVME_FEAT_CAP_CHANGE, [NVME_HOST_BEHAVIOR_SUPPORT] =3D NVME_FEAT_CAP_CHANGE, [NVME_COMMAND_SET_PROFILE] =3D NVME_FEAT_CAP_CHANGE, + [NVME_FDP_MODE] =3D NVME_FEAT_CAP_CHANGE, + [NVME_FDP_EVENTS] =3D NVME_FEAT_CAP_CHANGE | NVME_FEAT_C= AP_NS, }; =20 static const uint32_t nvme_cse_acs[256] =3D { @@ -281,6 +285,8 @@ static const uint32_t nvme_cse_iocs_nvm[256] =3D { [NVME_CMD_VERIFY] =3D NVME_CMD_EFF_CSUPP, [NVME_CMD_COPY] =3D NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_= LBCC, [NVME_CMD_COMPARE] =3D NVME_CMD_EFF_CSUPP, + [NVME_CMD_IO_MGMT_RECV] =3D NVME_CMD_EFF_CSUPP, + [NVME_CMD_IO_MGMT_SEND] =3D NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_= LBCC, }; =20 static const uint32_t nvme_cse_iocs_zoned[256] =3D { @@ -299,12 +305,66 @@ static const uint32_t nvme_cse_iocs_zoned[256] =3D { =20 static void nvme_process_sq(void *opaque); static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst); +static inline uint64_t nvme_get_timestamp(const NvmeCtrl *n); =20 static uint16_t nvme_sqid(NvmeRequest *req) { return le16_to_cpu(req->sq->sqid); } =20 +static inline uint16_t nvme_make_pid(NvmeNamespace *ns, uint16_t rg, + uint16_t ph) +{ + uint16_t rgif =3D ns->endgrp->fdp.rgif; + + if (!rgif) { + return ph; + } + + return (rg << (16 - rgif)) | ph; +} + +static inline bool nvme_ph_valid(NvmeNamespace *ns, uint16_t ph) +{ + return ph < ns->fdp.nphs; +} + +static inline bool nvme_rg_valid(NvmeEnduranceGroup *endgrp, uint16_t rg) +{ + return rg < endgrp->fdp.nrg; +} + +static inline uint16_t nvme_pid2ph(NvmeNamespace *ns, uint16_t pid) +{ + uint16_t rgif =3D ns->endgrp->fdp.rgif; + + if (!rgif) { + return pid; + } + + return pid & ((1 << (15 - rgif)) - 1); +} + +static inline uint16_t nvme_pid2rg(NvmeNamespace *ns, uint16_t pid) +{ + uint16_t rgif =3D ns->endgrp->fdp.rgif; + + if (!rgif) { + return 0; + } + + return pid >> (16 - rgif); +} + +static inline bool nvme_parse_pid(NvmeNamespace *ns, uint16_t pid, + uint16_t *ph, uint16_t *rg) +{ + *rg =3D nvme_pid2rg(ns, pid); + *ph =3D nvme_pid2ph(ns, pid); + + return nvme_ph_valid(ns, *ph) && nvme_rg_valid(ns->endgrp, *rg); +} + static void nvme_assign_zone_state(NvmeNamespace *ns, NvmeZone *zone, NvmeZoneState state) { @@ -378,6 +438,69 @@ static uint16_t nvme_aor_check(NvmeNamespace *ns, uint= 32_t act, uint32_t opn) return nvme_zns_check_resources(ns, act, opn, 0); } =20 +static NvmeFdpEvent *nvme_fdp_alloc_event(NvmeCtrl *n, NvmeFdpEventBuffer = *ebuf) +{ + NvmeFdpEvent *ret =3D NULL; + bool is_full =3D ebuf->next =3D=3D ebuf->start && ebuf->nelems; + + ret =3D &ebuf->events[ebuf->next++]; + if (unlikely(ebuf->next =3D=3D NVME_FDP_MAX_EVENTS)) { + ebuf->next =3D 0; + } + if (is_full) { + ebuf->start =3D ebuf->next; + } else { + ebuf->nelems++; + } + + memset(ret, 0, sizeof(NvmeFdpEvent)); + ret->timestamp =3D nvme_get_timestamp(n); + + return ret; +} + +static inline int log_event(NvmeRuHandle *ruh, uint8_t event_type) +{ + return (ruh->event_filter >> nvme_fdp_evf_shifts[event_type]) & 0x1; +} + +static bool nvme_update_ruh(NvmeCtrl *n, NvmeNamespace *ns, uint16_t pid) +{ + NvmeEnduranceGroup *endgrp =3D ns->endgrp; + NvmeRuHandle *ruh; + NvmeReclaimUnit *ru; + NvmeFdpEvent *e =3D NULL; + uint16_t ph, rg, ruhid; + + if (!nvme_parse_pid(ns, pid, &ph, &rg)) { + return false; + } + + ruhid =3D ns->fdp.phs[ph]; + + ruh =3D &endgrp->fdp.ruhs[ruhid]; + ru =3D &ruh->rus[rg]; + + if (ru->ruamw) { + if (log_event(ruh, FDP_EVT_RU_NOT_FULLY_WRITTEN)) { + e =3D nvme_fdp_alloc_event(n, &endgrp->fdp.host_events); + e->type =3D FDP_EVT_RU_NOT_FULLY_WRITTEN; + e->flags =3D FDPEF_PIV | FDPEF_NSIDV | FDPEF_LV; + e->pid =3D cpu_to_le16(pid); + e->nsid =3D cpu_to_le32(ns->params.nsid); + e->rgid =3D cpu_to_le16(rg); + e->ruhid =3D cpu_to_le16(ruhid); + } + + /* log (eventual) GC overhead of prematurely swapping the RU */ + nvme_fdp_stat_inc(&endgrp->fdp.mbmw, nvme_l2b(ns, ru->ruamw)); + } + + ru->ruamw =3D ruh->ruamw; + + return true; +} + static bool nvme_addr_is_cmb(NvmeCtrl *n, hwaddr addr) { hwaddr hi, lo; @@ -3322,6 +3445,41 @@ invalid: return status | NVME_DNR; } =20 +static void nvme_do_write_fdp(NvmeCtrl *n, NvmeRequest *req, uint64_t slba, + uint32_t nlb) +{ + NvmeNamespace *ns =3D req->ns; + NvmeRwCmd *rw =3D (NvmeRwCmd *)&req->cmd; + uint64_t data_size =3D nvme_l2b(ns, nlb); + uint32_t dw12 =3D le32_to_cpu(req->cmd.cdw12); + uint8_t dtype =3D (dw12 >> 20) & 0xf; + uint16_t pid =3D le16_to_cpu(rw->dspec); + uint16_t ph, rg, ruhid; + NvmeReclaimUnit *ru; + + if (dtype !=3D NVME_DIRECTIVE_DATA_PLACEMENT || + !nvme_parse_pid(ns, pid, &ph, &rg)) { + ph =3D 0; + rg =3D 0; + } + + ruhid =3D ns->fdp.phs[ph]; + ru =3D &ns->endgrp->fdp.ruhs[ruhid].rus[rg]; + + nvme_fdp_stat_inc(&ns->endgrp->fdp.hbmw, data_size); + nvme_fdp_stat_inc(&ns->endgrp->fdp.mbmw, data_size); + + while (nlb) { + if (nlb < ru->ruamw) { + ru->ruamw -=3D nlb; + break; + } + + nlb -=3D ru->ruamw; + nvme_update_ruh(n, ns, pid); + } +} + static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest *req, bool append, bool wrz) { @@ -3431,6 +3589,8 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeReques= t *req, bool append, if (!(zone->d.za & NVME_ZA_ZRWA_VALID)) { zone->w_ptr +=3D nlb; } + } else if (ns->endgrp && ns->endgrp->fdp.enabled) { + nvme_do_write_fdp(n, req, slba, nlb); } =20 data_offset =3D nvme_l2b(ns, slba); @@ -4088,6 +4248,126 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, Nv= meRequest *req) return status; } =20 +static uint16_t nvme_io_mgmt_recv_ruhs(NvmeCtrl *n, NvmeRequest *req, + size_t len) +{ + NvmeNamespace *ns =3D req->ns; + NvmeEnduranceGroup *endgrp; + NvmeRuhStatus *hdr; + NvmeRuhStatusDescr *ruhsd; + unsigned int nruhsd; + uint16_t rg, ph, *ruhid; + size_t trans_len; + g_autofree uint8_t *buf =3D NULL; + + if (!n->subsys) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + if (ns->params.nsid =3D=3D 0 || ns->params.nsid =3D=3D 0xffffffff) { + return NVME_INVALID_NSID | NVME_DNR; + } + + if (!n->subsys->endgrp.fdp.enabled) { + return NVME_FDP_DISABLED | NVME_DNR; + } + + endgrp =3D ns->endgrp; + + nruhsd =3D ns->fdp.nphs * endgrp->fdp.nrg; + trans_len =3D sizeof(NvmeRuhStatus) + nruhsd * sizeof(NvmeRuhStatusDes= cr); + buf =3D g_malloc(trans_len); + + trans_len =3D MIN(trans_len, len); + + hdr =3D (NvmeRuhStatus *)buf; + ruhsd =3D (NvmeRuhStatusDescr *)(buf + sizeof(NvmeRuhStatus)); + + hdr->nruhsd =3D cpu_to_le16(nruhsd); + + ruhid =3D ns->fdp.phs; + + for (ph =3D 0; ph < ns->fdp.nphs; ph++, ruhid++) { + NvmeRuHandle *ruh =3D &endgrp->fdp.ruhs[*ruhid]; + + for (rg =3D 0; rg < endgrp->fdp.nrg; rg++, ruhsd++) { + uint16_t pid =3D nvme_make_pid(ns, rg, ph); + + ruhsd->pid =3D cpu_to_le16(pid); + ruhsd->ruhid =3D *ruhid; + ruhsd->earutr =3D 0; + ruhsd->ruamw =3D cpu_to_le64(ruh->rus[rg].ruamw); + } + } + + return nvme_c2h(n, buf, trans_len, req); +} + +static uint16_t nvme_io_mgmt_recv(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeCmd *cmd =3D &req->cmd; + uint32_t cdw10 =3D le32_to_cpu(cmd->cdw10); + uint32_t numd =3D le32_to_cpu(cmd->cdw11); + uint8_t mo =3D (cdw10 & 0xff); + size_t len =3D (numd + 1) << 2; + + switch (mo) { + case NVME_IOMR_MO_NOP: + return 0; + case NVME_IOMR_MO_RUH_STATUS: + return nvme_io_mgmt_recv_ruhs(n, req, len); + default: + return NVME_INVALID_FIELD | NVME_DNR; + }; +} + +static uint16_t nvme_io_mgmt_send_ruh_update(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeCmd *cmd =3D &req->cmd; + NvmeNamespace *ns =3D req->ns; + uint32_t cdw10 =3D le32_to_cpu(cmd->cdw10); + uint16_t ret =3D NVME_SUCCESS; + uint32_t npid =3D (cdw10 >> 1) + 1; + unsigned int i =3D 0; + g_autofree uint16_t *pids =3D NULL; + uint32_t maxnpid =3D n->subsys->endgrp.fdp.nrg * n->subsys->endgrp.fdp= .nruh; + + if (unlikely(npid >=3D MIN(NVME_FDP_MAXPIDS, maxnpid))) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + pids =3D g_new(uint16_t, npid); + + ret =3D nvme_h2c(n, pids, npid * sizeof(uint16_t), req); + if (ret) { + return ret; + } + + for (; i < npid; i++) { + if (!nvme_update_ruh(n, ns, pids[i])) { + return NVME_INVALID_FIELD | NVME_DNR; + } + } + + return ret; +} + +static uint16_t nvme_io_mgmt_send(NvmeCtrl *n, NvmeRequest *req) +{ + NvmeCmd *cmd =3D &req->cmd; + uint32_t cdw10 =3D le32_to_cpu(cmd->cdw10); + uint8_t mo =3D (cdw10 & 0xff); + + switch (mo) { + case NVME_IOMS_MO_NOP: + return 0; + case NVME_IOMS_MO_RUH_UPDATE: + return nvme_io_mgmt_send_ruh_update(n, req); + default: + return NVME_INVALID_FIELD | NVME_DNR; + }; +} + static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req) { NvmeNamespace *ns; @@ -4164,6 +4444,10 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest= *req) return nvme_zone_mgmt_send(n, req); case NVME_CMD_ZONE_MGMT_RECV: return nvme_zone_mgmt_recv(n, req); + case NVME_CMD_IO_MGMT_RECV: + return nvme_io_mgmt_recv(n, req); + case NVME_CMD_IO_MGMT_SEND: + return nvme_io_mgmt_send(n, req); default: assert(false); } @@ -4623,6 +4907,207 @@ static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint8= _t csi, uint32_t buf_len, return nvme_c2h(n, ((uint8_t *)&log) + off, trans_len, req); } =20 +static size_t sizeof_fdp_conf_descr(size_t nruh, size_t vss) +{ + size_t entry_siz =3D sizeof(NvmeFdpDescrHdr) + nruh * sizeof(NvmeRuhDe= scr) + + vss; + return ROUND_UP(entry_siz, 8); +} + +static uint16_t nvme_fdp_confs(NvmeCtrl *n, uint32_t endgrpid, uint32_t bu= f_len, + uint64_t off, NvmeRequest *req) +{ + uint32_t log_size, trans_len; + g_autofree uint8_t *buf =3D NULL; + NvmeFdpDescrHdr *hdr; + NvmeRuhDescr *ruhd; + NvmeEnduranceGroup *endgrp; + NvmeFdpConfsHdr *log; + size_t nruh, fdp_descr_size; + int i; + + if (endgrpid !=3D 1 || !n->subsys) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + endgrp =3D &n->subsys->endgrp; + + if (endgrp->fdp.enabled) { + nruh =3D endgrp->fdp.nruh; + } else { + nruh =3D 1; + } + + fdp_descr_size =3D sizeof_fdp_conf_descr(nruh, FDPVSS); + log_size =3D sizeof(NvmeFdpConfsHdr) + fdp_descr_size; + + if (off >=3D log_size) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + trans_len =3D MIN(log_size - off, buf_len); + + buf =3D g_malloc0(log_size); + log =3D (NvmeFdpConfsHdr *)buf; + hdr =3D (NvmeFdpDescrHdr *)(log + 1); + ruhd =3D (NvmeRuhDescr *)(buf + sizeof(*log) + sizeof(*hdr)); + + log->num_confs =3D cpu_to_le16(0); + log->size =3D cpu_to_le32(log_size); + + hdr->descr_size =3D cpu_to_le16(fdp_descr_size); + if (endgrp->fdp.enabled) { + hdr->fdpa =3D FIELD_DP8(hdr->fdpa, FDPA, VALID, 1); + hdr->fdpa =3D FIELD_DP8(hdr->fdpa, FDPA, RGIF, endgrp->fdp.rgif); + hdr->nrg =3D cpu_to_le16(endgrp->fdp.nrg); + hdr->nruh =3D cpu_to_le16(endgrp->fdp.nruh); + hdr->maxpids =3D cpu_to_le16(NVME_FDP_MAXPIDS - 1); + hdr->nnss =3D cpu_to_le32(NVME_MAX_NAMESPACES); + hdr->runs =3D cpu_to_le64(endgrp->fdp.runs); + + for (i =3D 0; i < nruh; i++) { + ruhd->ruht =3D NVME_RUHT_INITIALLY_ISOLATED; + ruhd++; + } + } else { + /* 1 bit for RUH in PIF -> 2 RUHs max. */ + hdr->nrg =3D cpu_to_le16(1); + hdr->nruh =3D cpu_to_le16(1); + hdr->maxpids =3D cpu_to_le16(NVME_FDP_MAXPIDS - 1); + hdr->nnss =3D cpu_to_le32(1); + hdr->runs =3D cpu_to_le64(96 * MiB); + + ruhd->ruht =3D NVME_RUHT_INITIALLY_ISOLATED; + } + + return nvme_c2h(n, (uint8_t *)buf + off, trans_len, req); +} + +static uint16_t nvme_fdp_ruh_usage(NvmeCtrl *n, uint32_t endgrpid, + uint32_t dw10, uint32_t dw12, + uint32_t buf_len, uint64_t off, + NvmeRequest *req) +{ + NvmeRuHandle *ruh; + NvmeRuhuLog *hdr; + NvmeRuhuDescr *ruhud; + NvmeEnduranceGroup *endgrp; + g_autofree uint8_t *buf =3D NULL; + uint32_t log_size, trans_len; + uint16_t i; + + if (endgrpid !=3D 1 || !n->subsys) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + endgrp =3D &n->subsys->endgrp; + + if (!endgrp->fdp.enabled) { + return NVME_FDP_DISABLED | NVME_DNR; + } + + log_size =3D sizeof(NvmeRuhuLog) + endgrp->fdp.nruh * sizeof(NvmeRuhuD= escr); + + if (off >=3D log_size) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + trans_len =3D MIN(log_size - off, buf_len); + + buf =3D g_malloc0(log_size); + hdr =3D (NvmeRuhuLog *)buf; + ruhud =3D (NvmeRuhuDescr *)(hdr + 1); + + ruh =3D endgrp->fdp.ruhs; + hdr->nruh =3D cpu_to_le16(endgrp->fdp.nruh); + + for (i =3D 0; i < endgrp->fdp.nruh; i++, ruhud++, ruh++) { + ruhud->ruha =3D ruh->ruha; + } + + return nvme_c2h(n, (uint8_t *)buf + off, trans_len, req); +} + +static uint16_t nvme_fdp_stats(NvmeCtrl *n, uint32_t endgrpid, uint32_t bu= f_len, + uint64_t off, NvmeRequest *req) +{ + NvmeEnduranceGroup *endgrp; + NvmeFdpStatsLog log =3D {}; + uint32_t trans_len; + + if (off >=3D sizeof(NvmeFdpStatsLog)) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + if (endgrpid !=3D 1 || !n->subsys) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + if (!n->subsys->endgrp.fdp.enabled) { + return NVME_FDP_DISABLED | NVME_DNR; + } + + endgrp =3D &n->subsys->endgrp; + + trans_len =3D MIN(sizeof(log) - off, buf_len); + + /* spec value is 128 bit, we only use 64 bit */ + log.hbmw[0] =3D cpu_to_le64(endgrp->fdp.hbmw); + log.mbmw[0] =3D cpu_to_le64(endgrp->fdp.mbmw); + log.mbe[0] =3D cpu_to_le64(endgrp->fdp.mbe); + + return nvme_c2h(n, (uint8_t *)&log + off, trans_len, req); +} + +static uint16_t nvme_fdp_events(NvmeCtrl *n, uint32_t endgrpid, + uint32_t buf_len, uint64_t off, + NvmeRequest *req) +{ + NvmeEnduranceGroup *endgrp; + NvmeCmd *cmd =3D &req->cmd; + bool host_events =3D (cmd->cdw10 >> 8) & 0x1; + uint32_t log_size, trans_len; + NvmeFdpEventBuffer *ebuf; + g_autofree NvmeFdpEventsLog *elog =3D NULL; + NvmeFdpEvent *event; + + if (endgrpid !=3D 1 || !n->subsys) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + endgrp =3D &n->subsys->endgrp; + + if (!endgrp->fdp.enabled) { + return NVME_FDP_DISABLED | NVME_DNR; + } + + if (host_events) { + ebuf =3D &endgrp->fdp.host_events; + } else { + ebuf =3D &endgrp->fdp.ctrl_events; + } + + log_size =3D sizeof(NvmeFdpEventsLog) + ebuf->nelems * sizeof(NvmeFdpE= vent); + trans_len =3D MIN(log_size - off, buf_len); + elog =3D g_malloc0(log_size); + elog->num_events =3D cpu_to_le32(ebuf->nelems); + event =3D (NvmeFdpEvent *)(elog + 1); + + if (ebuf->nelems && ebuf->start =3D=3D ebuf->next) { + unsigned int nelems =3D (NVME_FDP_MAX_EVENTS - ebuf->start); + /* wrap over, copy [start;NVME_FDP_MAX_EVENTS[ and [0; next[ */ + memcpy(event, &ebuf->events[ebuf->start], + sizeof(NvmeFdpEvent) * nelems); + memcpy(event + nelems, ebuf->events, + sizeof(NvmeFdpEvent) * ebuf->next); + } else if (ebuf->start < ebuf->next) { + memcpy(event, &ebuf->events[ebuf->start], + sizeof(NvmeFdpEvent) * (ebuf->next - ebuf->start)); + } + + return nvme_c2h(n, (uint8_t *)elog + off, trans_len, req); +} + static uint16_t nvme_get_log(NvmeCtrl *n, NvmeRequest *req) { NvmeCmd *cmd =3D &req->cmd; @@ -4635,13 +5120,14 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeReque= st *req) uint8_t lsp =3D (dw10 >> 8) & 0xf; uint8_t rae =3D (dw10 >> 15) & 0x1; uint8_t csi =3D le32_to_cpu(cmd->cdw14) >> 24; - uint32_t numdl, numdu; + uint32_t numdl, numdu, lspi; uint64_t off, lpol, lpou; size_t len; uint16_t status; =20 numdl =3D (dw10 >> 16); numdu =3D (dw11 & 0xffff); + lspi =3D (dw11 >> 16); lpol =3D dw12; lpou =3D dw13; =20 @@ -4672,6 +5158,14 @@ static uint16_t nvme_get_log(NvmeCtrl *n, NvmeReques= t *req) return nvme_cmd_effects(n, csi, len, off, req); case NVME_LOG_ENDGRP: return nvme_endgrp_info(n, rae, len, off, req); + case NVME_LOG_FDP_CONFS: + return nvme_fdp_confs(n, lspi, len, off, req); + case NVME_LOG_FDP_RUH_USAGE: + return nvme_fdp_ruh_usage(n, lspi, dw10, dw12, len, off, req); + case NVME_LOG_FDP_STATS: + return nvme_fdp_stats(n, lspi, len, off, req); + case NVME_LOG_FDP_EVENTS: + return nvme_fdp_events(n, lspi, len, off, req); default: trace_pci_nvme_err_invalid_log_page(nvme_cid(req), lid); return NVME_INVALID_FIELD | NVME_DNR; @@ -5258,6 +5752,84 @@ static uint16_t nvme_get_feature_timestamp(NvmeCtrl = *n, NvmeRequest *req) return nvme_c2h(n, (uint8_t *)×tamp, sizeof(timestamp), req); } =20 +static int nvme_get_feature_fdp(NvmeCtrl *n, uint32_t endgrpid, + uint32_t *result) +{ + *result =3D 0; + + if (!n->subsys || !n->subsys->endgrp.fdp.enabled) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + *result =3D FIELD_DP16(0, FEAT_FDP, FDPE, 1); + *result =3D FIELD_DP16(*result, FEAT_FDP, CONF_NDX, 0); + + return NVME_SUCCESS; +} + +static uint16_t nvme_get_feature_fdp_events(NvmeCtrl *n, NvmeNamespace *ns, + NvmeRequest *req, uint32_t *re= sult) +{ + NvmeCmd *cmd =3D &req->cmd; + uint32_t cdw11 =3D le32_to_cpu(cmd->cdw11); + uint16_t ph =3D cdw11 & 0xffff; + uint8_t noet =3D (cdw11 >> 16) & 0xff; + uint16_t ruhid, ret; + uint32_t nentries =3D 0; + uint8_t s_events_ndx =3D 0; + size_t s_events_siz =3D sizeof(NvmeFdpEventDescr) * noet; + g_autofree NvmeFdpEventDescr *s_events =3D g_malloc0(s_events_siz); + NvmeRuHandle *ruh; + NvmeFdpEventDescr *s_event; + + if (!n->subsys || !n->subsys->endgrp.fdp.enabled) { + return NVME_FDP_DISABLED | NVME_DNR; + } + + if (!nvme_ph_valid(ns, ph)) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + ruhid =3D ns->fdp.phs[ph]; + ruh =3D &n->subsys->endgrp.fdp.ruhs[ruhid]; + + assert(ruh); + + if (unlikely(noet =3D=3D 0)) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + for (uint8_t event_type =3D 0; event_type < FDP_EVT_MAX; event_type++)= { + uint8_t shift =3D nvme_fdp_evf_shifts[event_type]; + if (!shift && event_type) { + /* + * only first entry (event_type =3D=3D 0) has a shift value of= 0 + * other entries are simply unpopulated. + */ + continue; + } + + nentries++; + + s_event =3D &s_events[s_events_ndx]; + s_event->evt =3D event_type; + s_event->evta =3D (ruh->event_filter >> shift) & 0x1; + + /* break if all `noet` entries are filled */ + if ((++s_events_ndx) =3D=3D noet) { + break; + } + } + + ret =3D nvme_c2h(n, s_events, s_events_siz, req); + if (ret) { + return ret; + } + + *result =3D nentries; + return NVME_SUCCESS; +} + static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeRequest *req) { NvmeCmd *cmd =3D &req->cmd; @@ -5270,6 +5842,7 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeReq= uest *req) uint16_t iv; NvmeNamespace *ns; int i; + uint16_t endgrpid =3D 0, ret =3D NVME_SUCCESS; =20 static const uint32_t nvme_feature_default[NVME_FID_MAX] =3D { [NVME_ARBITRATION] =3D NVME_ARB_AB_NOLIMIT, @@ -5367,6 +5940,33 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeRe= quest *req) case NVME_HOST_BEHAVIOR_SUPPORT: return nvme_c2h(n, (uint8_t *)&n->features.hbs, sizeof(n->features.hbs), req); + case NVME_FDP_MODE: + endgrpid =3D dw11 & 0xff; + + if (endgrpid !=3D 0x1) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + ret =3D nvme_get_feature_fdp(n, endgrpid, &result); + if (ret) { + return ret; + } + goto out; + case NVME_FDP_EVENTS: + if (!nvme_nsid_valid(n, nsid)) { + return NVME_INVALID_NSID | NVME_DNR; + } + + ns =3D nvme_ns(n, nsid); + if (unlikely(!ns)) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + ret =3D nvme_get_feature_fdp_events(n, ns, req, &result); + if (ret) { + return ret; + } + goto out; default: break; } @@ -5399,6 +5999,20 @@ defaults: if (iv =3D=3D n->admin_cq.vector) { result |=3D NVME_INTVC_NOCOALESCING; } + break; + case NVME_FDP_MODE: + endgrpid =3D dw11 & 0xff; + + if (endgrpid !=3D 0x1) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + ret =3D nvme_get_feature_fdp(n, endgrpid, &result); + if (ret) { + return ret; + } + goto out; + break; default: result =3D nvme_feature_default[fid]; @@ -5407,7 +6021,7 @@ defaults: =20 out: req->cqe.result =3D cpu_to_le32(result); - return NVME_SUCCESS; + return ret; } =20 static uint16_t nvme_set_feature_timestamp(NvmeCtrl *n, NvmeRequest *req) @@ -5425,6 +6039,51 @@ static uint16_t nvme_set_feature_timestamp(NvmeCtrl = *n, NvmeRequest *req) return NVME_SUCCESS; } =20 +static uint16_t nvme_set_feature_fdp_events(NvmeCtrl *n, NvmeNamespace *ns, + NvmeRequest *req) +{ + NvmeCmd *cmd =3D &req->cmd; + uint32_t cdw11 =3D le32_to_cpu(cmd->cdw11); + uint16_t ph =3D cdw11 & 0xffff; + uint8_t noet =3D (cdw11 >> 16) & 0xff; + uint16_t ret, ruhid; + uint8_t enable =3D le32_to_cpu(cmd->cdw12) & 0x1; + uint8_t event_mask =3D 0; + unsigned int i; + g_autofree uint8_t *events =3D g_malloc0(noet); + NvmeRuHandle *ruh =3D NULL; + + assert(ns); + + if (!n->subsys || !n->subsys->endgrp.fdp.enabled) { + return NVME_FDP_DISABLED | NVME_DNR; + } + + if (!nvme_ph_valid(ns, ph)) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + ruhid =3D ns->fdp.phs[ph]; + ruh =3D &n->subsys->endgrp.fdp.ruhs[ruhid]; + + ret =3D nvme_h2c(n, events, noet, req); + if (ret) { + return ret; + } + + for (i =3D 0; i < noet; i++) { + event_mask |=3D (1 << nvme_fdp_evf_shifts[events[i]]); + } + + if (enable) { + ruh->event_filter |=3D event_mask; + } else { + ruh->event_filter =3D ruh->event_filter & ~event_mask; + } + + return NVME_SUCCESS; +} + static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req) { NvmeNamespace *ns =3D NULL; @@ -5584,6 +6243,11 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRe= quest *req) return NVME_CMD_SET_CMB_REJECTED | NVME_DNR; } break; + case NVME_FDP_MODE: + /* spec: abort with cmd seq err if there's one or more NS' in endg= rp */ + return NVME_CMD_SEQ_ERROR | NVME_DNR; + case NVME_FDP_EVENTS: + return nvme_set_feature_fdp_events(n, ns, req); default: return NVME_FEAT_NOT_CHANGEABLE | NVME_DNR; } @@ -6159,6 +6823,7 @@ static uint16_t nvme_directive_send(NvmeCtrl *n, Nvme= Request *req) =20 static uint16_t nvme_directive_receive(NvmeCtrl *n, NvmeRequest *req) { + NvmeNamespace *ns; uint32_t dw10 =3D le32_to_cpu(req->cmd.cdw10); uint32_t dw11 =3D le32_to_cpu(req->cmd.cdw11); uint32_t nsid =3D le32_to_cpu(req->cmd.nsid); @@ -6180,7 +6845,30 @@ static uint16_t nvme_directive_receive(NvmeCtrl *n, = NvmeRequest *req) return NVME_INVALID_FIELD | NVME_DNR; } =20 - return nvme_c2h(n, (uint8_t *)&id, trans_len, req); + ns =3D nvme_ns(n, nsid); + if (!ns) { + return NVME_INVALID_FIELD | NVME_DNR; + } + + switch (dtype) { + case NVME_DIRECTIVE_IDENTIFY: + switch (doper) { + case NVME_DIRECTIVE_RETURN_PARAMS: + if (ns->endgrp->fdp.enabled) { + id.supported |=3D 1 << NVME_DIRECTIVE_DATA_PLACEMENT; + id.enabled |=3D 1 << NVME_DIRECTIVE_DATA_PLACEMENT; + id.persistent |=3D 1 << NVME_DIRECTIVE_DATA_PLACEMENT; + } + + return nvme_c2h(n, (uint8_t *)&id, trans_len, req); + + default: + return NVME_INVALID_FIELD | NVME_DNR; + } + + default: + return NVME_INVALID_FIELD; + } } =20 static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest *req) @@ -7545,6 +8233,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *p= ci_dev) ctratt |=3D NVME_CTRATT_ENDGRPS; =20 id->endgidmax =3D cpu_to_le16(0x1); + + if (n->subsys->endgrp.fdp.enabled) { + ctratt |=3D NVME_CTRATT_FDPS; + } } =20 id->ctratt =3D cpu_to_le32(ctratt); diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c index 23872a174c9a..cfac960dcf39 100644 --- a/hw/nvme/ns.c +++ b/hw/nvme/ns.c @@ -14,8 +14,10 @@ =20 #include "qemu/osdep.h" #include "qemu/units.h" +#include "qemu/cutils.h" #include "qemu/error-report.h" #include "qapi/error.h" +#include "qemu/bitops.h" #include "sysemu/sysemu.h" #include "sysemu/block-backend.h" =20 @@ -377,6 +379,130 @@ static void nvme_zoned_ns_shutdown(NvmeNamespace *ns) assert(ns->nr_open_zones =3D=3D 0); } =20 +static NvmeRuHandle *nvme_find_ruh_by_attr(NvmeEnduranceGroup *endgrp, + uint8_t ruha, uint16_t *ruhid) +{ + for (uint16_t i =3D 0; i < endgrp->fdp.nruh; i++) { + NvmeRuHandle *ruh =3D &endgrp->fdp.ruhs[i]; + + if (ruh->ruha =3D=3D ruha) { + *ruhid =3D i; + return ruh; + } + } + + return NULL; +} + +static bool nvme_ns_init_fdp(NvmeNamespace *ns, Error **errp) +{ + NvmeEnduranceGroup *endgrp =3D ns->endgrp; + NvmeRuHandle *ruh; + uint8_t lbafi =3D NVME_ID_NS_FLBAS_INDEX(ns->id_ns.flbas); + unsigned int *ruhid, *ruhids; + char *r, *p, *token; + uint16_t *ph; + + if (!ns->params.fdp.ruhs) { + ns->fdp.nphs =3D 1; + ph =3D ns->fdp.phs =3D g_new(uint16_t, 1); + + ruh =3D nvme_find_ruh_by_attr(endgrp, NVME_RUHA_CTRL, ph); + if (!ruh) { + ruh =3D nvme_find_ruh_by_attr(endgrp, NVME_RUHA_UNUSED, ph); + if (!ruh) { + error_setg(errp, "no unused reclaim unit handles left"); + return false; + } + + ruh->ruha =3D NVME_RUHA_CTRL; + ruh->lbafi =3D lbafi; + ruh->ruamw =3D endgrp->fdp.runs >> ns->lbaf.ds; + + for (uint16_t rg =3D 0; rg < endgrp->fdp.nrg; rg++) { + ruh->rus[rg].ruamw =3D ruh->ruamw; + } + } else if (ruh->lbafi !=3D lbafi) { + error_setg(errp, "lba format index of controller assigned " + "reclaim unit handle does not match namespace lba " + "format index"); + return false; + } + + return true; + } + + ruhid =3D ruhids =3D g_new0(unsigned int, endgrp->fdp.nruh); + r =3D p =3D strdup(ns->params.fdp.ruhs); + + /* parse the placement handle identifiers */ + while ((token =3D qemu_strsep(&p, ";")) !=3D NULL) { + ns->fdp.nphs +=3D 1; + if (ns->fdp.nphs > NVME_FDP_MAXPIDS || + ns->fdp.nphs =3D=3D endgrp->fdp.nruh) { + error_setg(errp, "too many placement handles"); + free(r); + return false; + } + + if (qemu_strtoui(token, NULL, 0, ruhid++) < 0) { + error_setg(errp, "cannot parse reclaim unit handle identifier"= ); + free(r); + return false; + } + } + + free(r); + + ph =3D ns->fdp.phs =3D g_new(uint16_t, ns->fdp.nphs); + + ruhid =3D ruhids; + + /* verify the identifiers */ + for (unsigned int i =3D 0; i < ns->fdp.nphs; i++, ruhid++, ph++) { + if (*ruhid >=3D endgrp->fdp.nruh) { + error_setg(errp, "invalid reclaim unit handle identifier"); + return false; + } + + ruh =3D &endgrp->fdp.ruhs[*ruhid]; + + switch (ruh->ruha) { + case NVME_RUHA_UNUSED: + ruh->ruha =3D NVME_RUHA_HOST; + ruh->lbafi =3D lbafi; + ruh->ruamw =3D endgrp->fdp.runs >> ns->lbaf.ds; + + for (uint16_t rg =3D 0; rg < endgrp->fdp.nrg; rg++) { + ruh->rus[rg].ruamw =3D ruh->ruamw; + } + + break; + + case NVME_RUHA_HOST: + if (ruh->lbafi !=3D lbafi) { + error_setg(errp, "lba format index of host assigned" + "reclaim unit handle does not match namespace " + "lba format index"); + return false; + } + + break; + + case NVME_RUHA_CTRL: + error_setg(errp, "reclaim unit handle is controller assigned"); + return false; + + default: + abort(); + } + + *ph =3D *ruhid; + } + + return true; +} + static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp) { unsigned int pi_size; @@ -417,6 +543,11 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns= , Error **errp) return -1; } =20 + if (ns->params.zoned && ns->endgrp && ns->endgrp->fdp.enabled) { + error_setg(errp, "cannot be a zoned- in an FDP configuration"); + return -1; + } + if (ns->params.zoned) { if (ns->params.max_active_zones) { if (ns->params.max_open_zones > ns->params.max_active_zones) { @@ -502,6 +633,12 @@ int nvme_ns_setup(NvmeNamespace *ns, Error **errp) nvme_ns_init_zoned(ns); } =20 + if (ns->endgrp && ns->endgrp->fdp.enabled) { + if (!nvme_ns_init_fdp(ns, errp)) { + return -1; + } + } + return 0; } =20 @@ -525,6 +662,10 @@ void nvme_ns_cleanup(NvmeNamespace *ns) g_free(ns->zone_array); g_free(ns->zd_extensions); } + + if (ns->endgrp && ns->endgrp->fdp.enabled) { + g_free(ns->fdp.phs); + } } =20 static void nvme_ns_unrealize(DeviceState *dev) @@ -562,6 +703,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **e= rrp) return; } ns->subsys =3D subsys; + ns->endgrp =3D &subsys->endgrp; } =20 if (nvme_ns_setup(ns, errp)) { @@ -648,6 +790,7 @@ static Property nvme_ns_props[] =3D { DEFINE_PROP_SIZE("zoned.zrwafg", NvmeNamespace, params.zrwafg, -1), DEFINE_PROP_BOOL("eui64-default", NvmeNamespace, params.eui64_default, false), + DEFINE_PROP_STRING("fdp.ruhs", NvmeNamespace, params.fdp.ruhs), DEFINE_PROP_END_OF_LIST(), }; =20 diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h index e489830478ee..209e8f5b4c08 100644 --- a/hw/nvme/nvme.h +++ b/hw/nvme/nvme.h @@ -27,6 +27,8 @@ #define NVME_MAX_CONTROLLERS 256 #define NVME_MAX_NAMESPACES 256 #define NVME_EUI64_DEFAULT ((uint64_t)0x5254000000000000) +#define NVME_FDP_MAX_EVENTS 63 +#define NVME_FDP_MAXPIDS 128 =20 QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_NSID_BROADCAST - 1); =20 @@ -45,8 +47,47 @@ typedef struct NvmeBus { OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS) #define SUBSYS_SLOT_RSVD (void *)0xFFFF =20 +typedef struct NvmeReclaimUnit { + uint64_t ruamw; +} NvmeReclaimUnit; + +typedef struct NvmeRuHandle { + uint8_t ruht; + uint8_t ruha; + uint64_t event_filter; + uint8_t lbafi; + uint64_t ruamw; + + /* reclaim units indexed by reclaim group */ + NvmeReclaimUnit *rus; +} NvmeRuHandle; + +typedef struct NvmeFdpEventBuffer { + NvmeFdpEvent events[NVME_FDP_MAX_EVENTS]; + unsigned int nelems; + unsigned int start; + unsigned int next; +} NvmeFdpEventBuffer; + typedef struct NvmeEnduranceGroup { uint8_t event_conf; + + struct { + NvmeFdpEventBuffer host_events, ctrl_events; + + uint16_t nruh; + uint16_t nrg; + uint8_t rgif; + uint64_t runs; + + uint64_t hbmw; + uint64_t mbmw; + uint64_t mbe; + + bool enabled; + + NvmeRuHandle *ruhs; + } fdp; } NvmeEnduranceGroup; =20 typedef struct NvmeSubsystem { @@ -55,11 +96,19 @@ typedef struct NvmeSubsystem { uint8_t subnqn[256]; char *serial; =20 - NvmeCtrl *ctrls[NVME_MAX_CONTROLLERS]; - NvmeNamespace *namespaces[NVME_MAX_NAMESPACES + 1]; + NvmeCtrl *ctrls[NVME_MAX_CONTROLLERS]; + NvmeNamespace *namespaces[NVME_MAX_NAMESPACES + 1]; + NvmeEnduranceGroup endgrp; =20 struct { char *nqn; + + struct { + bool enabled; + uint64_t runs; + uint16_t nruh; + uint32_t nrg; + } fdp; } params; } NvmeSubsystem; =20 @@ -100,6 +149,21 @@ typedef struct NvmeZone { QTAILQ_ENTRY(NvmeZone) entry; } NvmeZone; =20 +#define FDP_EVT_MAX 0xff +#define NVME_FDP_MAX_NS_RUHS 32u +#define FDPVSS 0 + +static const uint8_t nvme_fdp_evf_shifts[FDP_EVT_MAX] =3D { + /* Host events */ + [FDP_EVT_RU_NOT_FULLY_WRITTEN] =3D 0, + [FDP_EVT_RU_ATL_EXCEEDED] =3D 1, + [FDP_EVT_CTRL_RESET_RUH] =3D 2, + [FDP_EVT_INVALID_PID] =3D 3, + /* CTRL events */ + [FDP_EVT_MEDIA_REALLOC] =3D 32, + [FDP_EVT_RUH_IMPLICIT_RU_CHANGE] =3D 33, +}; + typedef struct NvmeNamespaceParams { bool detached; bool shared; @@ -129,6 +193,10 @@ typedef struct NvmeNamespaceParams { uint32_t numzrwa; uint64_t zrwas; uint64_t zrwafg; + + struct { + char *ruhs; + } fdp; } NvmeNamespaceParams; =20 typedef struct NvmeNamespace { @@ -172,10 +240,17 @@ typedef struct NvmeNamespace { =20 NvmeNamespaceParams params; NvmeSubsystem *subsys; + NvmeEnduranceGroup *endgrp; =20 struct { uint32_t err_rec; } features; + + struct { + uint16_t nphs; + /* reclaim unit handle identifiers indexed by placement handle */ + uint16_t *phs; + } fdp; } NvmeNamespace; =20 static inline uint32_t nvme_nsid(NvmeNamespace *ns) @@ -279,6 +354,12 @@ static inline void nvme_aor_dec_active(NvmeNamespace *= ns) assert(ns->nr_active_zones >=3D 0); } =20 +static inline void nvme_fdp_stat_inc(uint64_t *a, uint64_t b) +{ + uint64_t ret =3D *a + b; + *a =3D ret < *a ? UINT64_MAX : ret; +} + void nvme_ns_init_format(NvmeNamespace *ns); int nvme_ns_setup(NvmeNamespace *ns, Error **errp); void nvme_ns_drain(NvmeNamespace *ns); diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c index 9d2643678b53..24ddec860e45 100644 --- a/hw/nvme/subsys.c +++ b/hw/nvme/subsys.c @@ -7,10 +7,13 @@ */ =20 #include "qemu/osdep.h" +#include "qemu/units.h" #include "qapi/error.h" =20 #include "nvme.h" =20 +#define NVME_DEFAULT_RU_SIZE (96 * MiB) + static int nvme_subsys_reserve_cntlids(NvmeCtrl *n, int start, int num) { NvmeSubsystem *subsys =3D n->subsys; @@ -109,13 +112,95 @@ void nvme_subsys_unregister_ctrl(NvmeSubsystem *subsy= s, NvmeCtrl *n) n->cntlid =3D -1; } =20 -static void nvme_subsys_setup(NvmeSubsystem *subsys) +static bool nvme_calc_rgif(uint16_t nruh, uint16_t nrg, uint8_t *rgif) +{ + uint16_t val; + unsigned int i; + + if (unlikely(nrg =3D=3D 1)) { + /* PIDRG_NORGI scenario, all of pid is used for PHID */ + *rgif =3D 0; + return true; + } + + val =3D nrg; + i =3D 0; + while (val) { + val >>=3D 1; + i++; + } + *rgif =3D i; + + /* ensure remaining bits suffice to represent number of phids in a RG = */ + if (unlikely((UINT16_MAX >> i) < nruh)) { + *rgif =3D 0; + return false; + } + + return true; +} + +static bool nvme_subsys_setup_fdp(NvmeSubsystem *subsys, Error **errp) +{ + NvmeEnduranceGroup *endgrp =3D &subsys->endgrp; + + if (!subsys->params.fdp.runs) { + error_setg(errp, "fdp.runs must be non-zero"); + return false; + } + + endgrp->fdp.runs =3D subsys->params.fdp.runs; + + if (!subsys->params.fdp.nrg) { + error_setg(errp, "fdp.nrg must be non-zero"); + return false; + } + + endgrp->fdp.nrg =3D subsys->params.fdp.nrg; + + if (!subsys->params.fdp.nruh) { + error_setg(errp, "fdp.nruh must be non-zero"); + return false; + } + + endgrp->fdp.nruh =3D subsys->params.fdp.nruh; + + if (!nvme_calc_rgif(endgrp->fdp.nruh, endgrp->fdp.nrg, &endgrp->fdp.rg= if)) { + error_setg(errp, + "cannot derive a valid rgif (nruh %"PRIu16" nrg %"PRIu3= 2")", + endgrp->fdp.nruh, endgrp->fdp.nrg); + return false; + } + + endgrp->fdp.ruhs =3D g_new(NvmeRuHandle, endgrp->fdp.nruh); + + for (uint16_t ruhid =3D 0; ruhid < endgrp->fdp.nruh; ruhid++) { + endgrp->fdp.ruhs[ruhid] =3D (NvmeRuHandle) { + .ruht =3D NVME_RUHT_INITIALLY_ISOLATED, + .ruha =3D NVME_RUHA_UNUSED, + }; + + endgrp->fdp.ruhs[ruhid].rus =3D g_new(NvmeReclaimUnit, endgrp->fdp= .nrg); + } + + endgrp->fdp.enabled =3D true; + + return true; +} + +static bool nvme_subsys_setup(NvmeSubsystem *subsys, Error **errp) { const char *nqn =3D subsys->params.nqn ? subsys->params.nqn : subsys->parent_obj.id; =20 snprintf((char *)subsys->subnqn, sizeof(subsys->subnqn), "nqn.2019-08.org.qemu:%s", nqn); + + if (subsys->params.fdp.enabled && !nvme_subsys_setup_fdp(subsys, errp)= ) { + return false; + } + + return true; } =20 static void nvme_subsys_realize(DeviceState *dev, Error **errp) @@ -124,11 +209,16 @@ static void nvme_subsys_realize(DeviceState *dev, Err= or **errp) =20 qbus_init(&subsys->bus, sizeof(NvmeBus), TYPE_NVME_BUS, dev, dev->id); =20 - nvme_subsys_setup(subsys); + nvme_subsys_setup(subsys, errp); } =20 static Property nvme_subsystem_props[] =3D { DEFINE_PROP_STRING("nqn", NvmeSubsystem, params.nqn), + DEFINE_PROP_BOOL("fdp", NvmeSubsystem, params.fdp.enabled, false), + DEFINE_PROP_SIZE("fdp.runs", NvmeSubsystem, params.fdp.runs, + NVME_DEFAULT_RU_SIZE), + DEFINE_PROP_UINT32("fdp.nrg", NvmeSubsystem, params.fdp.nrg, 1), + DEFINE_PROP_UINT16("fdp.nruh", NvmeSubsystem, params.fdp.nruh, 0), DEFINE_PROP_END_OF_LIST(), }; =20 diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events index b16f2260b4fd..7f7837e1a281 100644 --- a/hw/nvme/trace-events +++ b/hw/nvme/trace-events @@ -117,6 +117,7 @@ pci_nvme_clear_ns_reset(uint32_t state, uint64_t slba) = "zone state=3D%"PRIu32", sl pci_nvme_zoned_zrwa_implicit_flush(uint64_t zslba, uint32_t nlb) "zslba 0x= %"PRIx64" nlb %"PRIu32"" pci_nvme_pci_reset(void) "PCI Function Level Reset" pci_nvme_virt_mngmt(uint16_t cid, uint16_t act, uint16_t cntlid, const cha= r* rt, uint16_t nr) "cid %"PRIu16", act=3D0x%"PRIx16", ctrlid=3D%"PRIu16" %= s nr=3D%"PRIu16"" +pci_nvme_fdp_ruh_change(uint16_t rgid, uint16_t ruhid) "change RU on RUH r= gid=3D%"PRIu16", ruhid=3D%"PRIu16"" =20 # error conditions pci_nvme_err_mdts(size_t len) "len %zu" diff --git a/include/block/nvme.h b/include/block/nvme.h index d463d0ef5fdc..bb231d0b9ad0 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -1,6 +1,8 @@ #ifndef BLOCK_NVME_H #define BLOCK_NVME_H =20 +#include "hw/registerfields.h" + typedef struct QEMU_PACKED NvmeBar { uint64_t cap; uint32_t vs; @@ -631,7 +633,9 @@ enum NvmeIoCommands { NVME_CMD_WRITE_ZEROES =3D 0x08, NVME_CMD_DSM =3D 0x09, NVME_CMD_VERIFY =3D 0x0c, + NVME_CMD_IO_MGMT_RECV =3D 0x12, NVME_CMD_COPY =3D 0x19, + NVME_CMD_IO_MGMT_SEND =3D 0x1d, NVME_CMD_ZONE_MGMT_SEND =3D 0x79, NVME_CMD_ZONE_MGMT_RECV =3D 0x7a, NVME_CMD_ZONE_APPEND =3D 0x7d, @@ -724,7 +728,9 @@ typedef struct QEMU_PACKED NvmeRwCmd { uint64_t slba; uint16_t nlb; uint16_t control; - uint32_t dsmgmt; + uint8_t dsmgmt; + uint8_t rsvd; + uint16_t dspec; uint32_t reftag; uint16_t apptag; uint16_t appmask; @@ -895,6 +901,8 @@ enum NvmeStatusCodes { NVME_INVALID_PRP_OFFSET =3D 0x0013, NVME_CMD_SET_CMB_REJECTED =3D 0x002b, NVME_INVALID_CMD_SET =3D 0x002c, + NVME_FDP_DISABLED =3D 0x0029, + NVME_INVALID_PHID_LIST =3D 0x002a, NVME_LBA_RANGE =3D 0x0080, NVME_CAP_EXCEEDED =3D 0x0081, NVME_NS_NOT_READY =3D 0x0082, @@ -1031,6 +1039,10 @@ enum NvmeLogIdentifier { NVME_LOG_CHANGED_NSLIST =3D 0x04, NVME_LOG_CMD_EFFECTS =3D 0x05, NVME_LOG_ENDGRP =3D 0x09, + NVME_LOG_FDP_CONFS =3D 0x20, + NVME_LOG_FDP_RUH_USAGE =3D 0x21, + NVME_LOG_FDP_STATS =3D 0x22, + NVME_LOG_FDP_EVENTS =3D 0x23, }; =20 typedef struct QEMU_PACKED NvmePSD { @@ -1160,6 +1172,7 @@ enum NvmeIdCtrlOaes { enum NvmeIdCtrlCtratt { NVME_CTRATT_ENDGRPS =3D 1 << 4, NVME_CTRATT_ELBAS =3D 1 << 15, + NVME_CTRATT_FDPS =3D 1 << 19, }; =20 enum NvmeIdCtrlOacs { @@ -1273,6 +1286,8 @@ enum NvmeFeatureIds { NVME_TIMESTAMP =3D 0xe, NVME_HOST_BEHAVIOR_SUPPORT =3D 0x16, NVME_COMMAND_SET_PROFILE =3D 0x19, + NVME_FDP_MODE =3D 0x1d, + NVME_FDP_EVENTS =3D 0x1e, NVME_SOFTWARE_PROGRESS_MARKER =3D 0x80, NVME_FID_MAX =3D 0x100, }; @@ -1652,22 +1667,164 @@ typedef struct NvmeDirectiveIdentify { uint8_t unused1[31]; uint8_t enabled; uint8_t unused33[31]; - uint8_t rsvd64[4032]; + uint8_t persistent; + uint8_t unused65[31]; + uint8_t rsvd64[4000]; } NvmeDirectiveIdentify; =20 -enum NvmeDirective { - NVME_DIRECTIVE_SUPPORTED =3D 0x0, - NVME_DIRECTIVE_ENABLED =3D 0x1, -}; - enum NvmeDirectiveTypes { - NVME_DIRECTIVE_IDENTIFY =3D 0x0, + NVME_DIRECTIVE_IDENTIFY =3D 0x0, + NVME_DIRECTIVE_DATA_PLACEMENT =3D 0x2, }; =20 enum NvmeDirectiveOperations { NVME_DIRECTIVE_RETURN_PARAMS =3D 0x1, }; =20 +typedef struct QEMU_PACKED NvmeFdpConfsHdr { + uint16_t num_confs; + uint8_t version; + uint8_t rsvd3; + uint32_t size; + uint8_t rsvd8[8]; +} NvmeFdpConfsHdr; + +REG8(FDPA, 0x0) + FIELD(FDPA, RGIF, 0, 4) + FIELD(FDPA, VWC, 4, 1) + FIELD(FDPA, VALID, 7, 1); + +typedef struct QEMU_PACKED NvmeFdpDescrHdr { + uint16_t descr_size; + uint8_t fdpa; + uint8_t vss; + uint32_t nrg; + uint16_t nruh; + uint16_t maxpids; + uint32_t nnss; + uint64_t runs; + uint32_t erutl; + uint8_t rsvd28[36]; +} NvmeFdpDescrHdr; + +enum NvmeRuhType { + NVME_RUHT_INITIALLY_ISOLATED =3D 1, + NVME_RUHT_PERSISTENTLY_ISOLATED =3D 2, +}; + +typedef struct QEMU_PACKED NvmeRuhDescr { + uint8_t ruht; + uint8_t rsvd1[3]; +} NvmeRuhDescr; + +typedef struct QEMU_PACKED NvmeRuhuLog { + uint16_t nruh; + uint8_t rsvd2[6]; +} NvmeRuhuLog; + +enum NvmeRuhAttributes { + NVME_RUHA_UNUSED =3D 0, + NVME_RUHA_HOST =3D 1, + NVME_RUHA_CTRL =3D 2, +}; + +typedef struct QEMU_PACKED NvmeRuhuDescr { + uint8_t ruha; + uint8_t rsvd1[7]; +} NvmeRuhuDescr; + +typedef struct QEMU_PACKED NvmeFdpStatsLog { + uint64_t hbmw[2]; + uint64_t mbmw[2]; + uint64_t mbe[2]; + uint8_t rsvd48[16]; +} NvmeFdpStatsLog; + +typedef struct QEMU_PACKED NvmeFdpEventsLog { + uint32_t num_events; + uint8_t rsvd4[60]; +} NvmeFdpEventsLog; + +enum NvmeFdpEventType { + FDP_EVT_RU_NOT_FULLY_WRITTEN =3D 0x0, + FDP_EVT_RU_ATL_EXCEEDED =3D 0x1, + FDP_EVT_CTRL_RESET_RUH =3D 0x2, + FDP_EVT_INVALID_PID =3D 0x3, + FDP_EVT_MEDIA_REALLOC =3D 0x80, + FDP_EVT_RUH_IMPLICIT_RU_CHANGE =3D 0x81, +}; + +enum NvmeFdpEventFlags { + FDPEF_PIV =3D 1 << 0, + FDPEF_NSIDV =3D 1 << 1, + FDPEF_LV =3D 1 << 2, +}; + +typedef struct QEMU_PACKED NvmeFdpEvent { + uint8_t type; + uint8_t flags; + uint16_t pid; + uint64_t timestamp; + uint32_t nsid; + uint64_t type_specific[2]; + uint16_t rgid; + uint8_t ruhid; + uint8_t rsvd35[5]; + uint64_t vendor[3]; +} NvmeFdpEvent; + +typedef struct QEMU_PACKED NvmePhidList { + uint16_t nnruhd; + uint8_t rsvd2[6]; +} NvmePhidList; + +typedef struct QEMU_PACKED NvmePhidDescr { + uint8_t ruht; + uint8_t rsvd1; + uint16_t ruhid; +} NvmePhidDescr; + +REG32(FEAT_FDP, 0x0) + FIELD(FEAT_FDP, FDPE, 0, 1) + FIELD(FEAT_FDP, CONF_NDX, 8, 8); + +typedef struct QEMU_PACKED NvmeFdpEventDescr { + uint8_t evt; + uint8_t evta; +} NvmeFdpEventDescr; + +REG32(NVME_IOMR, 0x0) + FIELD(NVME_IOMR, MO, 0, 8) + FIELD(NVME_IOMR, MOS, 16, 16); + +enum NvmeIomr2Mo { + NVME_IOMR_MO_NOP =3D 0x0, + NVME_IOMR_MO_RUH_STATUS =3D 0x1, + NVME_IOMR_MO_VENDOR_SPECIFIC =3D 0x255, +}; + +typedef struct QEMU_PACKED NvmeRuhStatus { + uint8_t rsvd0[14]; + uint16_t nruhsd; +} NvmeRuhStatus; + +typedef struct QEMU_PACKED NvmeRuhStatusDescr { + uint16_t pid; + uint16_t ruhid; + uint32_t earutr; + uint64_t ruamw; + uint8_t rsvd16[16]; +} NvmeRuhStatusDescr; + +REG32(NVME_IOMS, 0x0) + FIELD(NVME_IOMS, MO, 0, 8) + FIELD(NVME_IOMS, MOS, 16, 16); + +enum NvmeIoms2Mo { + NVME_IOMS_MO_NOP =3D 0x0, + NVME_IOMS_MO_RUH_UPDATE =3D 0x1, +}; + static inline void _nvme_check_size(void) { QEMU_BUILD_BUG_ON(sizeof(NvmeBar) !=3D 4096); --=20 2.39.2