From: Julian Stecklina <jsteckli@amazon.de>
To: seabios@seabios.org
Date: Mon, 13 Feb 2017 10:03:59 +0100
Message-Id: <1486976639-11613-1-git-send-email-jsteckli@amazon.de>
In-Reply-To: <20170211171139.GA30652@morn.lan>
References: <20170211171139.GA30652@morn.lan>
Subject: [SeaBIOS] [PATCH v3] block: add NVMe boot support

This patch enables SeaBIOS to boot from NVMe. Namespace discovery and basic
I/O work. Testing has been done in QEMU and so far it works with GRUB,
syslinux, and the FreeBSD loader. You need a recent QEMU (>= 2.7.0), because
older versions have buggy NVMe support.
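A minimal QEMU setup for testing looks something like the following sketch;
the image path, drive id, and serial string are arbitrary placeholders:

    qemu-system-x86_64 -M pc -m 512 \
        -bios out/bios.bin \
        -drive file=nvme-disk.img,if=none,format=raw,id=nvmedisk0 \
        -device nvme,drive=nvmedisk0,serial=seabios-test

QEMU's nvme device expects a serial property to be set; -bios points at the
freshly built ROM.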
The NVMe code is currently only enabled on QEMU due to lack of testing on
real hardware.

Signed-off-by: Julian Stecklina <jsteckli@amazon.de>
---
 Makefile          |   2 +-
 src/Kconfig       |   6 +
 src/block.c       |   4 +
 src/block.h       |   1 +
 src/hw/nvme-int.h | 199 +++++++++++++++++
 src/hw/nvme.c     | 655 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/hw/nvme.h     |  17 ++
 src/hw/pci_ids.h  |   1 +
 8 files changed, 884 insertions(+), 1 deletion(-)
 create mode 100644 src/hw/nvme-int.h
 create mode 100644 src/hw/nvme.c
 create mode 100644 src/hw/nvme.h

diff --git a/Makefile b/Makefile
index 3b94ee0..946df7e 100644
--- a/Makefile
+++ b/Makefile
@@ -43,7 +43,7 @@ SRC32FLAT=$(SRCBOTH) post.c e820map.c malloc.c romfile.c x86.c optionroms.c \
     fw/paravirt.c fw/shadow.c fw/pciinit.c fw/smm.c fw/smp.c fw/mtrr.c fw/xen.c \
     fw/acpi.c fw/mptable.c fw/pirtable.c fw/smbios.c fw/romfile_loader.c \
     hw/virtio-ring.c hw/virtio-pci.c hw/virtio-blk.c hw/virtio-scsi.c \
-    hw/tpm_drivers.c
+    hw/tpm_drivers.c hw/nvme.c
 SRC32SEG=string.c output.c pcibios.c apm.c stacks.c hw/pci.c hw/serialio.c
 DIRS=src src/hw src/fw vgasrc
 
diff --git a/src/Kconfig b/src/Kconfig
index 457d082..e1b83a4 100644
--- a/src/Kconfig
+++ b/src/Kconfig
@@ -227,6 +227,12 @@ menu "Hardware support"
         help
             Support floppy images stored in coreboot flash or from QEMU
            fw_cfg.
+    config NVME
+        depends on DRIVES && QEMU_HARDWARE
+        bool "NVMe controllers"
+        default y
+        help
+            Support for NVMe disk code.
 
     config PS2PORT
         depends on KEYBOARD || MOUSE
diff --git a/src/block.c b/src/block.c
index f7280cf..d104f6d 100644
--- a/src/block.c
+++ b/src/block.c
@@ -20,6 +20,7 @@
 #include "hw/usb-uas.h" // uas_process_op
 #include "hw/virtio-blk.h" // process_virtio_blk_op
 #include "hw/virtio-scsi.h" // virtio_scsi_process_op
+#include "hw/nvme.h" // nvme_process_op
 #include "malloc.h" // malloc_low
 #include "output.h" // dprintf
 #include "stacks.h" // call32
@@ -502,6 +503,7 @@ block_setup(void)
     megasas_setup();
     pvscsi_setup();
     mpt_scsi_setup();
+    nvme_setup();
 }
 
 // Fallback handler for command requests not implemented by drivers
@@ -571,6 +573,8 @@ process_op_32(struct disk_op_s *op)
         return virtio_scsi_process_op(op);
     case DTYPE_PVSCSI:
         return pvscsi_process_op(op);
+    case DTYPE_NVME:
+        return nvme_process_op(op);
     default:
         return process_op_both(op);
     }
diff --git a/src/block.h b/src/block.h
index 0f15ff9..f03ec38 100644
--- a/src/block.h
+++ b/src/block.h
@@ -82,6 +82,7 @@ struct drive_s {
 #define DTYPE_PVSCSI       0x83
 #define DTYPE_MPT_SCSI     0x84
 #define DTYPE_SDCARD       0x90
+#define DTYPE_NVME         0x91
 
 #define MAXDESCSIZE 80
 
diff --git a/src/hw/nvme-int.h b/src/hw/nvme-int.h
new file mode 100644
index 0000000..9f95dd8
--- /dev/null
+++ b/src/hw/nvme-int.h
@@ -0,0 +1,199 @@
+// NVMe datastructures and constants
+//
+// Copyright 2017 Amazon.com, Inc. or its affiliates.
+//
+// This file may be distributed under the terms of the GNU LGPLv3 license.
+
+#ifndef __NVME_INT_H
+#define __NVME_INT_H
+
+#include "types.h" // u32
+#include "pcidevice.h" // struct pci_device
+
+/* Data structures */
+
+/* The register file of an NVMe host controller. This struct follows the naming
+   scheme in the NVMe specification. */
+struct nvme_reg {
+    u64 cap;                    /* controller capabilities */
+    u32 vs;                     /* version */
+    u32 intms;                  /* interrupt mask set */
+    u32 intmc;                  /* interrupt mask clear */
+    u32 cc;                     /* controller configuration */
+    u32 _res0;
+    u32 csts;                   /* controller status */
+    u32 _res1;
+    u32 aqa;                    /* admin queue attributes */
+    u64 asq;                    /* admin submission queue base address */
+    u64 acq;                    /* admin completion queue base address */
+};
+
+/* Submission queue entry */
+struct nvme_sqe {
+    union {
+        u32 dword[16];
+        struct {
+            u32 cdw0;           /* Command DWORD 0 */
+            u32 nsid;           /* Namespace ID */
+            u64 _res0;
+            u64 mptr;           /* metadata ptr */
+
+            u64 dptr_prp1;
+            u64 dptr_prp2;
+        };
+    };
+};
+
+/* Completion queue entry */
+struct nvme_cqe {
+    union {
+        u32 dword[4];
+        struct {
+            u32 cdw0;
+            u32 _res0;
+            u16 sq_head;
+            u16 sq_id;
+            u16 cid;
+            u16 status;
+        };
+    };
+};
+
+/* The common part of every submission or completion queue. */
+struct nvme_queue {
+    u32 *dbl;                   /* doorbell */
+    u16 mask;                   /* length - 1 */
+};
+
+struct nvme_cq {
+    struct nvme_queue common;
+    struct nvme_cqe *cqe;
+
+    /* We have read up to (but not including) this entry in the queue. */
+    u16 head;
+
+    /* The current phase bit the controller uses to indicate that it has written
+       a new entry. This is inverted after each wrap. */
+    unsigned phase : 1;
+};
+
+struct nvme_sq {
+    struct nvme_queue common;
+    struct nvme_sqe *sqe;
+
+    /* Corresponding completion queue. We only support a single SQ per CQ. */
+    struct nvme_cq *cq;
+
+    /* The last entry the controller has fetched. */
+    u16 head;
+
+    /* The last value we have written to the tail doorbell. */
+    u16 tail;
+};
+
+struct nvme_ctrl {
+    struct pci_device *pci;
+    struct nvme_reg volatile *reg;
+
+    u32 doorbell_stride;        /* in bytes */
+
+    struct nvme_sq admin_sq;
+    struct nvme_cq admin_cq;
+
+    u32 ns_count;
+    struct nvme_namespace *ns;
+
+    struct nvme_sq io_sq;
+    struct nvme_cq io_cq;
+};
+
+struct nvme_namespace {
+    struct drive_s drive;
+    struct nvme_ctrl *ctrl;
+
+    u32 ns_id;
+
+    u64 lba_count;              /* The total number of sectors. */
+
+    u32 block_size;
+    u32 metadata_size;
+
+    /* Page aligned buffer of size NVME_PAGE_SIZE. */
+    char *dma_buffer;
+};
+
+/* Data structures for NVMe admin identify commands */
+
+struct nvme_identify_ctrl {
+    u16 vid;
+    u16 ssvid;
+    char sn[20];
+    char mn[40];
+    char fr[8];
+
+    char _boring[516 - 72];
+
+    u32 nn;                     /* number of namespaces */
+};
+
+struct nvme_identify_ns_list {
+    u32 ns_id[1024];
+};
+
+struct nvme_lba_format {
+    u16 ms;
+    u8  lbads;
+    u8  rp;
+    u8  res;
+};
+
+struct nvme_identify_ns {
+    u64 nsze;
+    u64 ncap;
+    u64 nuse;
+    u8  nsfeat;
+    u8  nlbaf;
+    u8  flbas;
+
+    char _boring[128 - 27];
+
+    struct nvme_lba_format lbaf[16];
+};
+
+union nvme_identify {
+    struct nvme_identify_ns      ns;
+    struct nvme_identify_ctrl    ctrl;
+    struct nvme_identify_ns_list ns_list;
+};
+
+/* NVMe constants */
+
+#define NVME_CAP_CSS_NVME (1ULL << 37)
+
+#define NVME_CSTS_FATAL   (1U <<  1)
+#define NVME_CSTS_RDY     (1U <<  0)
+
+#define NVME_CC_EN        (1U <<  0)
+
+#define NVME_SQE_OPC_ADMIN_CREATE_IO_SQ 1U
+#define NVME_SQE_OPC_ADMIN_CREATE_IO_CQ 5U
+#define NVME_SQE_OPC_ADMIN_IDENTIFY     6U
+
+#define NVME_SQE_OPC_IO_WRITE 1U
+#define NVME_SQE_OPC_IO_READ  2U
+
+#define NVME_ADMIN_IDENTIFY_CNS_ID_NS       0U
+#define NVME_ADMIN_IDENTIFY_CNS_ID_CTRL     1U
+#define NVME_ADMIN_IDENTIFY_CNS_GET_NS_LIST 2U
+
+#define NVME_CQE_DW3_P (1U << 16)
+
+#define NVME_PAGE_SIZE 4096
+
+/* Length for the queue entries. */
+#define NVME_SQE_SIZE_LOG 6
+#define NVME_CQE_SIZE_LOG 4
+
+#endif
+
+/* EOF */
diff --git a/src/hw/nvme.c b/src/hw/nvme.c
new file mode 100644
index 0000000..31edf29
--- /dev/null
+++ b/src/hw/nvme.c
@@ -0,0 +1,655 @@
+// Low level NVMe disk access
+//
+// Copyright 2017 Amazon.com, Inc. or its affiliates.
+//
+// This file may be distributed under the terms of the GNU LGPLv3 license.
+
+#include "blockcmd.h"
+#include "fw/paravirt.h" // runningOnQEMU
+#include "malloc.h" // malloc_high
+#include "output.h" // dprintf
+#include "pci.h"
+#include "pci_ids.h" // PCI_CLASS_STORAGE_NVME
+#include "pci_regs.h" // PCI_BASE_ADDRESS_0
+#include "pcidevice.h" // foreachpci
+#include "stacks.h" // yield
+#include "std/disk.h" // DISK_RET_
+#include "string.h" // memset
+#include "util.h" // boot_add_hd
+#include "x86.h" // readl
+
+#include "nvme.h"
+#include "nvme-int.h"
+
+static void *
+zalloc_page_aligned(struct zone_s *zone, u32 size)
+{
+    void *res = _malloc(zone, size, NVME_PAGE_SIZE);
+    if (res) memset(res, 0, size);
+    return res;
+}
+
+static void
+nvme_init_queue_common(struct nvme_ctrl *ctrl, struct nvme_queue *q, u16 q_idx,
+                       u16 length)
+{
+    memset(q, 0, sizeof(*q));
+    q->dbl = (u32 *)((char *)ctrl->reg + 0x1000 + q_idx * ctrl->doorbell_stride);
+    dprintf(3, " q %p q_idx %u dbl %p\n", q, q_idx, q->dbl);
+    q->mask = length - 1;
+}
+
+static void
+nvme_init_sq(struct nvme_ctrl *ctrl, struct nvme_sq *sq, u16 q_idx, u16 length,
+             struct nvme_cq *cq)
+{
+    nvme_init_queue_common(ctrl, &sq->common, q_idx, length);
+    sq->sqe = zalloc_page_aligned(&ZoneHigh, sizeof(*sq->sqe) * length);
+    dprintf(3, "sq %p q_idx %u sqe %p\n", sq, q_idx, sq->sqe);
+    sq->cq = cq;
+    sq->head = 0;
+    sq->tail = 0;
+}
+
+static void
+nvme_init_cq(struct nvme_ctrl *ctrl, struct nvme_cq *cq, u16 q_idx, u16 length)
+{
+    nvme_init_queue_common(ctrl, &cq->common, q_idx, length);
+    cq->cqe = zalloc_page_aligned(&ZoneHigh, sizeof(*cq->cqe) * length);
+
+    cq->head = 0;
+
+    /* All CQE phase bits are initialized to zero. This means initially we wait
+       for the host controller to set these to 1. */
+    cq->phase = 1;
+}
+
+static int
+nvme_poll_cq(struct nvme_cq *cq)
+{
+    u32 dw3 = readl(&cq->cqe[cq->head].dword[3]);
+    return (!!(dw3 & NVME_CQE_DW3_P) == cq->phase);
+}
+
+static int
+nvme_is_cqe_success(struct nvme_cqe const *cqe)
+{
+    return (cqe->status & 0xFF) >> 1 == 0;
+}
+
+
+static struct nvme_cqe
+nvme_error_cqe(void)
+{
+    struct nvme_cqe r;
+
+    /* 0xFF is a vendor specific status code != success. Should be okay for
+       indicating failure. */
+    memset(&r, 0xFF, sizeof(r));
+    return r;
+}
+
+static struct nvme_cqe
+nvme_consume_cqe(struct nvme_sq *sq)
+{
+    struct nvme_cq *cq = sq->cq;
+
+    if (!nvme_poll_cq(cq)) {
+        /* Cannot consume a completion queue entry, if there is none ready. */
+        return nvme_error_cqe();
+    }
+
+    struct nvme_cqe *cqe = &cq->cqe[cq->head];
+    u16 cq_next_head = (cq->head + 1) & cq->common.mask;
+    dprintf(4, "cq %p head %u -> %u\n", cq, cq->head, cq_next_head);
+    if (cq_next_head < cq->head) {
+        dprintf(3, "cq %p wrap\n", cq);
+        cq->phase = ~cq->phase;
+    }
+    cq->head = cq_next_head;
+
+    /* Update the submission queue head. */
+    if (cqe->sq_head != sq->head) {
+        sq->head = cqe->sq_head;
+        dprintf(4, "sq %p advanced to %u\n", sq, cqe->sq_head);
+    }
+
+    /* Tell the controller that we consumed the completion. */
+    writel(cq->common.dbl, cq->head);
+
+    return *cqe;
+}
+
+static struct nvme_cqe
+nvme_wait(struct nvme_sq *sq)
+{
+    static const unsigned nvme_timeout = 500 /* ms */;
+    u32 to = timer_calc(nvme_timeout);
+    while (!nvme_poll_cq(sq->cq)) {
+        yield();
+
+        if (timer_check(to)) {
+            warn_timeout();
+            return nvme_error_cqe();
+        }
+    }
+
+    return nvme_consume_cqe(sq);
+}
+
+/* Returns the next submission queue entry (or NULL if the queue is full). It
+   also fills out Command Dword 0 and clears the rest. */
+static struct nvme_sqe *
+nvme_get_next_sqe(struct nvme_sq *sq, u8 opc, void *metadata, void *data)
+{
+    if (((sq->head + 1) & sq->common.mask) == sq->tail) {
+        dprintf(3, "submission queue is full");
+        return NULL;
+    }
+
+    struct nvme_sqe *sqe = &sq->sqe[sq->tail];
+    dprintf(4, "sq %p next_sqe %u\n", sq, sq->tail);
+
+    memset(sqe, 0, sizeof(*sqe));
+    sqe->cdw0 = opc | (sq->tail << 16 /* CID */);
+    sqe->mptr = (u32)metadata;
+    sqe->dptr_prp1 = (u32)data;
+
+    if (sqe->dptr_prp1 & (NVME_PAGE_SIZE - 1)) {
+        /* Data buffer not page aligned. */
+        warn_internalerror();
+    }
+
+    return sqe;
+}
+
+/* Call this after you've filled out an sqe that you've got from nvme_get_next_sqe. */
+static void
+nvme_commit_sqe(struct nvme_sq *sq)
+{
+    dprintf(4, "sq %p commit_sqe %u\n", sq, sq->tail);
+    sq->tail = (sq->tail + 1) & sq->common.mask;
+    writel(sq->common.dbl, sq->tail);
+}
+
+/* Perform an identify command on the admin queue and return the resulting
+   buffer. This may be a NULL pointer, if something failed. This function
+   cannot be used after initialization, because it uses buffers in tmp zone. */
+static union nvme_identify *
+nvme_admin_identify(struct nvme_ctrl *ctrl, u8 cns, u32 nsid)
+{
+    union nvme_identify *identify_buf = zalloc_page_aligned(&ZoneTmpHigh, 4096);
+    if (!identify_buf) {
+        /* Could not allocate identify buffer. */
+        warn_internalerror();
+        return NULL;
+    }
+
+    struct nvme_sqe *cmd_identify;
+    cmd_identify = nvme_get_next_sqe(&ctrl->admin_sq,
+                                     NVME_SQE_OPC_ADMIN_IDENTIFY, NULL,
+                                     identify_buf);
+
+    if (!cmd_identify) {
+        warn_internalerror();
+        goto error;
+    }
+
+    cmd_identify->nsid = nsid;
+    cmd_identify->dword[10] = cns;
+
+    nvme_commit_sqe(&ctrl->admin_sq);
+
+    struct nvme_cqe cqe = nvme_wait(&ctrl->admin_sq);
+
+    if (!nvme_is_cqe_success(&cqe)) {
+        goto error;
+    }
+
+    return identify_buf;
+ error:
+    free(identify_buf);
+    return NULL;
+}
+
+static struct nvme_identify_ctrl *
+nvme_admin_identify_ctrl(struct nvme_ctrl *ctrl)
+{
+    return &nvme_admin_identify(ctrl, NVME_ADMIN_IDENTIFY_CNS_ID_CTRL, 0)->ctrl;
+}
+
+static struct nvme_identify_ns_list *
+nvme_admin_identify_get_ns_list(struct nvme_ctrl *ctrl)
+{
+    return &nvme_admin_identify(ctrl, NVME_ADMIN_IDENTIFY_CNS_GET_NS_LIST,
+                                0)->ns_list;
+}
+
+static struct nvme_identify_ns *
+nvme_admin_identify_ns(struct nvme_ctrl *ctrl, u32 ns_id)
+{
+    return &nvme_admin_identify(ctrl, NVME_ADMIN_IDENTIFY_CNS_ID_NS,
+                                ns_id)->ns;
+}
+
+static void
+nvme_probe_ns(struct nvme_ctrl *ctrl, struct nvme_namespace *ns, u32 ns_id)
+{
+    ns->ctrl = ctrl;
+    ns->ns_id = ns_id;
+
+    struct nvme_identify_ns *id = nvme_admin_identify_ns(ctrl, ns_id);
+    if (!id) {
+        dprintf(2, "NVMe couldn't identify namespace %u.\n", ns_id);
+        goto free_buffer;
+    }
+
+    u8 current_lba_format = id->flbas & 0xF;
+    if (current_lba_format > id->nlbaf) {
+        dprintf(2, "NVMe NS %u: current LBA format %u is beyond what the "
+                " namespace supports (%u)?\n",
+                ns_id, current_lba_format, id->nlbaf + 1);
+        goto free_buffer;
+    }
+
+    ns->lba_count = id->nsze;
+
+    struct nvme_lba_format *fmt = &id->lbaf[current_lba_format];
+
+    ns->block_size = 1U << fmt->lbads;
+    ns->metadata_size = fmt->ms;
+
+    if (ns->block_size > NVME_PAGE_SIZE) {
+        /* If we see devices that trigger this path, we need to increase our
+           buffer size. */
+        warn_internalerror();
+        goto free_buffer;
+    }
+
+    ns->drive.cntl_id = ns - ctrl->ns;
+    ns->drive.removable = 0;
+    ns->drive.type = DTYPE_NVME;
+    ns->drive.blksize = ns->block_size;
+    ns->drive.sectors = ns->lba_count;
+
+    ns->dma_buffer = zalloc_page_aligned(&ZoneHigh, NVME_PAGE_SIZE);
+
+    char *desc = znprintf(MAXDESCSIZE, "NVMe NS %u: %llu MiB (%llu %u-byte "
+                          "blocks + %u-byte metadata)\n",
+                          ns_id, (ns->lba_count * ns->block_size) >> 20,
+                          ns->lba_count, ns->block_size, ns->metadata_size);
+
+    dprintf(3, "%s", desc);
+    boot_add_hd(&ns->drive, desc, bootprio_find_pci_device(ctrl->pci));
+
+ free_buffer:
+    free (id);
+}
+
+/* Returns 0 on success. */
+static int
+nvme_create_io_cq(struct nvme_ctrl *ctrl, struct nvme_cq *cq, u16 q_idx)
+{
+    struct nvme_sqe *cmd_create_cq;
+
+    nvme_init_cq(ctrl, cq, q_idx, NVME_PAGE_SIZE / sizeof(struct nvme_cqe));
+    cmd_create_cq = nvme_get_next_sqe(&ctrl->admin_sq,
+                                      NVME_SQE_OPC_ADMIN_CREATE_IO_CQ, NULL,
+                                      cq->cqe);
+    if (!cmd_create_cq) {
+        return -1;
+    }
+
+    cmd_create_cq->dword[10] = (cq->common.mask << 16) | (q_idx >> 1);
+    cmd_create_cq->dword[11] = 1 /* physically contiguous */;
+
+    nvme_commit_sqe(&ctrl->admin_sq);
+
+    struct nvme_cqe cqe = nvme_wait(&ctrl->admin_sq);
+
+    if (!nvme_is_cqe_success(&cqe)) {
+        dprintf(2, "create io cq failed: %08x %08x %08x %08x\n",
+                cqe.dword[0], cqe.dword[1], cqe.dword[2], cqe.dword[3]);
+
+        return -1;
+    }
+
+    return 0;
+}
+
+/* Returns 0 on success. */
+static int
+nvme_create_io_sq(struct nvme_ctrl *ctrl, struct nvme_sq *sq, u16 q_idx, struct nvme_cq *cq)
+{
+    struct nvme_sqe *cmd_create_sq;
+
+    nvme_init_sq(ctrl, sq, q_idx, NVME_PAGE_SIZE / sizeof(struct nvme_cqe), cq);
+    cmd_create_sq = nvme_get_next_sqe(&ctrl->admin_sq,
+                                      NVME_SQE_OPC_ADMIN_CREATE_IO_SQ, NULL,
+                                      sq->sqe);
+    if (!cmd_create_sq) {
+        return -1;
+    }
+
+    cmd_create_sq->dword[10] = (sq->common.mask << 16) | (q_idx >> 1);
+    cmd_create_sq->dword[11] = (q_idx >> 1) << 16 | 1 /* contiguous */;
+    dprintf(3, "sq %p create dword10 %08x dword11 %08x\n", sq,
+            cmd_create_sq->dword[10], cmd_create_sq->dword[11]);
+
+    nvme_commit_sqe(&ctrl->admin_sq);
+
+    struct nvme_cqe cqe = nvme_wait(&ctrl->admin_sq);
+
+    if (!nvme_is_cqe_success(&cqe)) {
+        dprintf(2, "create io sq failed: %08x %08x %08x %08x\n",
+                cqe.dword[0], cqe.dword[1], cqe.dword[2], cqe.dword[3]);
+        return -1;
+    }
+
+    return 0;
+}
+
+/* Reads count sectors into buf. Returns DISK_RET_*. The buffer cannot cross
+   page boundaries. */
+static int
+nvme_io_readwrite(struct nvme_namespace *ns, u64 lba, char *buf, u16 count,
+                  int write)
+{
+    u32 buf_addr = (u32)buf;
+
+    if ((buf_addr & 0x3) ||
+        ((buf_addr & ~(NVME_PAGE_SIZE - 1)) !=
+         ((buf_addr + ns->block_size * count - 1) & ~(NVME_PAGE_SIZE - 1)))) {
+        /* Buffer is misaligned or crosses page boundary */
+        warn_internalerror();
+        return DISK_RET_EBADTRACK;
+    }
+
+    struct nvme_sqe *io_read = nvme_get_next_sqe(&ns->ctrl->io_sq,
+                                                 write ? NVME_SQE_OPC_IO_WRITE
+                                                       : NVME_SQE_OPC_IO_READ,
+                                                 NULL, buf);
+    io_read->nsid = ns->ns_id;
+    io_read->dword[10] = (u32)lba;
+    io_read->dword[11] = (u32)(lba >> 32);
+    io_read->dword[12] = (1U << 31 /* limited retry */) | (count - 1);
+
+    nvme_commit_sqe(&ns->ctrl->io_sq);
+
+    struct nvme_cqe cqe = nvme_wait(&ns->ctrl->io_sq);
+
+    if (!nvme_is_cqe_success(&cqe)) {
+        dprintf(2, "read io: %08x %08x %08x %08x\n",
+                cqe.dword[0], cqe.dword[1], cqe.dword[2], cqe.dword[3]);
+
+        return DISK_RET_EBADTRACK;
+    }
+
+    return DISK_RET_SUCCESS;
+}
+
+
+static int
+nvme_create_io_queues(struct nvme_ctrl *ctrl)
+{
+    if (nvme_create_io_cq(ctrl, &ctrl->io_cq, 3))
+        return -1;
+
+    if (nvme_create_io_sq(ctrl, &ctrl->io_sq, 2, &ctrl->io_cq))
+        return -1;
+
+    return 0;
+}
+
+/* Waits for CSTS.RDY to match rdy. Returns 0 on success. */
+static int
+nvme_wait_csts_rdy(struct nvme_ctrl *ctrl, unsigned rdy)
+{
+    u32 const max_to = 500 /* ms */ * ((ctrl->reg->cap >> 24) & 0xFFU);
+    u32 to = timer_calc(max_to);
+    u32 csts;
+
+    while (rdy != ((csts = ctrl->reg->csts) & NVME_CSTS_RDY)) {
+        yield();
+
+        if (csts & NVME_CSTS_FATAL) {
+            dprintf(3, "NVMe fatal error during controller shutdown\n");
+            return -1;
+        }
+
+        if (timer_check(to)) {
+            warn_timeout();
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
+/* Returns 0 on success. */
+static int
+nvme_controller_enable(struct nvme_ctrl *ctrl)
+{
+    pci_enable_busmaster(ctrl->pci);
+
+    /* Turn the controller off. */
+    ctrl->reg->cc = 0;
+    if (nvme_wait_csts_rdy(ctrl, 0)) {
+        dprintf(2, "NVMe fatal error during controller shutdown\n");
+        return -1;
+    }
+
+    ctrl->doorbell_stride = 4U << ((ctrl->reg->cap >> 32) & 0xF);
+
+    nvme_init_cq(ctrl, &ctrl->admin_cq, 1,
+                 NVME_PAGE_SIZE / sizeof(struct nvme_cqe));
+
+    nvme_init_sq(ctrl, &ctrl->admin_sq, 0,
+                 NVME_PAGE_SIZE / sizeof(struct nvme_sqe), &ctrl->admin_cq);
+
+    ctrl->reg->aqa = ctrl->admin_cq.common.mask << 16
+        | ctrl->admin_sq.common.mask;
+
+    /* Create the admin queue pair */
+    if (!ctrl->admin_sq.sqe || !ctrl->admin_cq.cqe) goto out_of_memory;
+
+    ctrl->reg->asq = (u32)ctrl->admin_sq.sqe;
+    ctrl->reg->acq = (u32)ctrl->admin_cq.cqe;
+
+    dprintf(3, " admin submission queue: %p\n", ctrl->admin_sq.sqe);
+    dprintf(3, " admin completion queue: %p\n", ctrl->admin_cq.cqe);
+
+    ctrl->reg->cc = NVME_CC_EN | (NVME_CQE_SIZE_LOG << 20)
+        | (NVME_SQE_SIZE_LOG << 16 /* IOSQES */);
+
+    if (nvme_wait_csts_rdy(ctrl, 1)) {
+        dprintf(2, "NVMe fatal error while enabling controller\n");
+        goto failed;
+    }
+    /* The admin queue is set up and the controller is ready. Let's figure out
+       what namespaces we have. */
+
+    struct nvme_identify_ctrl *identify = nvme_admin_identify_ctrl(ctrl);
+
+    if (!identify) {
+        dprintf(2, "NVMe couldn't identify controller.\n");
+        goto failed;
+    }
+
+    /* TODO Print model/serial info. */
+    dprintf(3, "NVMe has %u namespace%s.\n",
+            identify->nn, (identify->nn == 1) ? "" : "s");
+
+    ctrl->ns_count = identify->nn;
+    free(identify);
+
+    if ((ctrl->ns_count == 0) || nvme_create_io_queues(ctrl)) {
+        /* No point to continue, if the controller says it doesn't have
+           namespaces or we couldn't create I/O queues. */
+        goto failed;
+    }
+
+    ctrl->ns = malloc_fseg(sizeof(*ctrl->ns) * ctrl->ns_count);
+    if (!ctrl->ns) goto out_of_memory;
+    memset(ctrl->ns, 0, sizeof(*ctrl->ns) * ctrl->ns_count);
+
+    struct nvme_identify_ns_list *ns_list = nvme_admin_identify_get_ns_list(ctrl);
+    if (!ns_list) {
+        dprintf(2, "NVMe couldn't get namespace list.\n");
+        goto failed;
+    }
+
+    /* Populate namespace IDs */
+    int ns_idx;
+    for (ns_idx = 0;
+         ns_idx < ARRAY_SIZE(ns_list->ns_id)
+             && ns_idx < ctrl->ns_count
+             && ns_list->ns_id[ns_idx];
+         ns_idx++) {
+        nvme_probe_ns(ctrl, &ctrl->ns[ns_idx], ns_list->ns_id[ns_idx]);
+    }
+
+    free(ns_list);
+
+    /* If for some reason the namespace list gives us fewer namespaces, we just
+       go along. */
+    if (ns_idx != ctrl->ns_count) {
+        dprintf(2, "NVMe namespace list has only %u namespaces?\n", ns_idx);
+        ctrl->ns_count = ns_idx;
+    }
+
+    dprintf(3, "NVMe initialization complete!\n");
+    return 0;
+
+ out_of_memory:
+    warn_noalloc();
+ failed:
+    free(ctrl->admin_sq.sqe);
+    free(ctrl->admin_cq.cqe);
+    free(ctrl->ns);
+    return -1;
+}
+
+/* Initialize an NVMe controller and detect its drives. */
+static void
+nvme_controller_setup(void *opaque)
+{
+    struct pci_device *pci = opaque;
+
+    struct nvme_reg volatile *reg = pci_enable_membar(pci, PCI_BASE_ADDRESS_0);
+    if (!reg)
+        return;
+
+    u32 version = reg->vs;
+    dprintf(3, "Found NVMe controller with version %u.%u.%u.\n",
+            version >> 16, (version >> 8) & 0xFF, version & 0xFF);
+    dprintf(3, " Capabilities %016llx\n", reg->cap);
+
+    if (version < 0x00010100U) {
+        dprintf(3, "Need at least 1.1.0! Skipping.\n");
+        return;
+    }
+
+    if (~reg->cap & NVME_CAP_CSS_NVME) {
+        dprintf(3, "Controller doesn't speak NVMe command set. Skipping.\n");
+        return;
+    }
+
+    struct nvme_ctrl *ctrl = malloc_high(sizeof(*ctrl));
+    if (!ctrl) {
+        warn_noalloc();
+        return;
+    }
+
+    memset(ctrl, 0, sizeof(*ctrl));
+
+    ctrl->reg = reg;
+    ctrl->pci = pci;
+
+    if (nvme_controller_enable(ctrl)) {
+        /* Initialization failed */
+        free(ctrl);
+    }
+}
+
+// Locate and init NVMe controllers
+static void
+nvme_scan(void)
+{
+    // Scan PCI bus for NVMe adapters
+    struct pci_device *pci;
+
+    foreachpci(pci) {
+        if (pci->class != PCI_CLASS_STORAGE_NVME)
+            continue;
+        if (pci->prog_if != 2 /* as of NVM 1.0e */) {
+            dprintf(3, "Found incompatible NVMe: prog-if=%02x\n", pci->prog_if);
+            continue;
+        }
+
+        run_thread(nvme_controller_setup, pci);
+    }
+}
+
+static int
+nvme_cmd_readwrite(struct nvme_namespace *ns, struct disk_op_s *op, int write)
+{
+    int res = DISK_RET_SUCCESS;
+    u16 const max_blocks = NVME_PAGE_SIZE / ns->block_size;
+    u16 i;
+
+    for (i = 0; i < op->count && res == DISK_RET_SUCCESS;) {
+        u16 blocks_remaining = op->count - i;
+        u16 blocks = blocks_remaining < max_blocks ? blocks_remaining
+                                                   : max_blocks;
+        char *op_buf = op->buf_fl + i * ns->block_size;
+
+        if (write) {
+            memcpy(ns->dma_buffer, op_buf, blocks * ns->block_size);
+        }
+
+        res = nvme_io_readwrite(ns, op->lba + i, ns->dma_buffer, blocks, write);
+        dprintf(3, "ns %u %s lba %llu+%u: %d\n", ns->ns_id, write ? "write"
+                                                                  : "read",
+                op->lba + i, blocks, res);
+
+        if (!write && res == DISK_RET_SUCCESS) {
+            memcpy(op_buf, ns->dma_buffer, blocks * ns->block_size);
+        }
+
+        i += blocks;
+    }
+
+    return res;
+}
+
+int
+nvme_process_op(struct disk_op_s *op)
+{
+    if (!CONFIG_NVME || !runningOnQEMU())
+        return DISK_RET_SUCCESS;
+
+    struct nvme_namespace *ns = container_of(op->drive_gf, struct nvme_namespace,
+                                             drive);
+
+    switch (op->command) {
+    case CMD_READ:
+    case CMD_WRITE:
+        return nvme_cmd_readwrite(ns, op, op->command == CMD_WRITE);
+    default:
+        return default_process_op(op);
+    }
+}
+
+void
+nvme_setup(void)
+{
+    ASSERT32FLAT();
+    if (!CONFIG_NVME || !runningOnQEMU())
+        return;
+
+    dprintf(3, "init nvme\n");
+    nvme_scan();
+}
+
+/* EOF */
diff --git a/src/hw/nvme.h b/src/hw/nvme.h
new file mode 100644
index 0000000..4dbb70a
--- /dev/null
+++ b/src/hw/nvme.h
@@ -0,0 +1,17 @@
+// External interfaces for low level NVMe support
+//
+// Copyright 2017 Amazon.com, Inc. or its affiliates.
+//
+// This file may be distributed under the terms of the GNU LGPLv3 license.
+
+#ifndef __NVME_H
+#define __NVME_H
+
+#include "block.h" // struct disk_op_s
+
+void nvme_setup(void);
+int nvme_process_op(struct disk_op_s *op);
+
+#endif
+
+/* EOF */
diff --git a/src/hw/pci_ids.h b/src/hw/pci_ids.h
index cdf9b3c..4ac73b4 100644
--- a/src/hw/pci_ids.h
+++ b/src/hw/pci_ids.h
@@ -18,6 +18,7 @@
 #define PCI_CLASS_STORAGE_SATA          0x0106
 #define PCI_CLASS_STORAGE_SATA_AHCI     0x010601
 #define PCI_CLASS_STORAGE_SAS           0x0107
+#define PCI_CLASS_STORAGE_NVME          0x0108
 #define PCI_CLASS_STORAGE_OTHER         0x0180
 
 #define PCI_BASE_CLASS_NETWORK          0x02
-- 
2.7.4