From: Chao Shi
To: linux-nvme@lists.infradead.org, Keith Busch
Cc: Christoph Hellwig, Sagi Grimberg, Jens Axboe, Tatsuya Sasaki,
    Maurizio Lombardi, linux-kernel@vger.kernel.org, Chao Shi,
    Sungwoo Kim, Dave Tian, Weidong Zhu
Subject: [PATCH v4] nvme: reject keep-alive passthrough on non-fabrics
Date: Fri, 22 May 2026 12:26:39 -0400
Message-ID: <20260522162639.395802-1-coshi036@gmail.com>
X-Mailer: git-send-email 2.43.0

Since commit b58da2d270db ("nvme: update keep alive interval when kato is
modified"), userspace can start keep-alive on any transport via a Set
Features (KATO) passthrough command. nvme_keep_alive_work() then allocates
its request with BLK_MQ_REQ_RESERVED, but nvme_alloc_admin_tag_set() only
reserves admin tags for fabrics, so the allocation trips WARN_ON_ONCE() in
blk_mq_get_tag() and fails:

  nvme nvme0: keep-alive failed: -11

Keep Alive is optional on PCIe (NVMe 2.0a section 5.27.1.12) and the driver
only arms keep-alive for fabrics. On other transports no admin tag is
reserved for it, and an active keep-alive command only harms idle power
states.

Reject, in the userspace passthrough paths, Set Features commands that the
driver is not prepared to handle, starting with KATO on non-fabrics. The
check can be extended to other problematic features as they are identified.

This guards the userspace passthrough paths (ioctl and io_uring); the nvmet
target passthru path is out of scope and is not changed here.

Link: https://lore.kernel.org/linux-nvme/20260515071248.2689513-1-coshi036@gmail.com/
Fixes: b58da2d270db ("nvme: update keep alive interval when kato is modified")
Found by FuzzNvme (Syzkaller with FEMU fuzzing framework).
Acked-by: Sungwoo Kim
Acked-by: Dave Tian
Acked-by: Weidong Zhu
Signed-off-by: Chao Shi
---
Reproducer (run as root on a PCIe NVMe device):

  #include <fcntl.h>             /* open(), O_RDWR */
  #include <stdio.h>             /* perror() */
  #include <sys/ioctl.h>         /* ioctl() */
  #include <linux/nvme_ioctl.h>  /* struct nvme_admin_cmd, NVME_IOCTL_ADMIN_CMD */

  int main(void)
  {
          struct nvme_admin_cmd cmd = {0};
          int fd = open("/dev/nvme0", O_RDWR);

          if (fd < 0) {
                  perror("open");
                  return 1;
          }
          cmd.opcode = 0x09;      /* SET_FEATURES */
          cmd.cdw10  = 0x0f;      /* Feature ID: KATO */
          cmd.cdw11  = 5;         /* KATO = 5 seconds */
          if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) < 0) {
                  perror("ioctl");
                  return 1;
          }
          return 0;
  }

On an unpatched kernel, within ~kato/2 seconds after the program exits,
dmesg shows:

  nvme nvme0: keep alive interval updated from 0 ms to 5000 ms
  WARNING: CPU: 0 PID: ... at block/blk-mq-tag.c:148 blk_mq_get_tag+...
  nvme nvme0: keep-alive failed: -11

With this patch the ioctl fails with EOPNOTSUPP on non-fabrics and
keep-alive is never started.

Changes since v3:
 - Only inspect admin commands (ns == NULL). I/O commands share the opcode
   space with admin commands (Dataset Management is 0x09, same as Set
   Features), so the previous version could wrongly reject a DSM I/O
   command. Pass ns to the helper and bail out for I/O (Keith Busch).

Changes since v2:
 - Reject the KATO Set Features passthrough on non-fabrics instead of
   reserving an admin tag for all transports (Keith Busch, Christoph
   Hellwig). PCIe does not need keep-alive, and an active keep-alive
   command only harms idle power states.
 - Implement as an extensible passthrough filter for Set Features commands
   the driver cannot handle.
 - Drop the core.c reserved_tags change.

Changes since v1:
 - v2 added a spec citation and a quirk discussion; both are superseded by
   the filter approach above.

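Illustration only, not part of the patch: the decision the filter makes can
be modelled in userspace roughly as below. This is a minimal sketch under
stated assumptions. The sketch_* names and the two boolean parameters are
hypothetical stand-ins for "ns != NULL" and
"ctrl->ops->flags & NVME_F_FABRICS", and cdw10 is treated as host-endian
rather than going through le32_to_cpu().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Values taken from the patch text: Set Features (admin) and Dataset
 * Management (I/O) both use opcode 0x09; KATO is feature ID 0x0f.
 */
#define SKETCH_ADMIN_SET_FEATURES 0x09
#define SKETCH_FEAT_KATO          0x0f

/* Hypothetical, reduced view of a passthrough command. */
struct sketch_cmd {
	uint8_t  opcode;
	uint32_t cdw10;   /* host-endian in this model */
};

/*
 * Returns true if the command may be passed through.
 * is_io_cmd models "ns != NULL"; is_fabrics models the transport flag.
 */
static bool sketch_passthru_cmd_allowed(bool is_fabrics, bool is_io_cmd,
					const struct sketch_cmd *c)
{
	/* Only admin Set Features is filtered; I/O opcodes are not inspected. */
	if (is_io_cmd || c->opcode != SKETCH_ADMIN_SET_FEATURES)
		return true;

	/* The feature ID is the low byte of cdw10. */
	switch (c->cdw10 & 0xff) {
	case SKETCH_FEAT_KATO:
		return is_fabrics;   /* KATO is only accepted on fabrics */
	default:
		return true;
	}
}
```

With is_fabrics = false, is_io_cmd = false, opcode 0x09 and cdw10 0x0f the
model returns false, which is the case the patch turns into -EOPNOTSUPP. The
same opcode with is_io_cmd = true is let through, matching the Dataset
Management case raised in the v3 review.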
 drivers/nvme/host/ioctl.c | 42 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index a9c097dacad6..33caa3ae79e5 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -86,6 +86,39 @@ static bool nvme_cmd_allowed(struct nvme_ns *ns, struct nvme_command *c,
 	return capable(CAP_SYS_ADMIN);
 }
 
+/*
+ * Some Set Features commands change controller behaviour that the driver is
+ * not prepared to handle on every transport. Reject such commands from
+ * userspace passthrough rather than letting them put the controller into a
+ * state the driver cannot deal with. The list can be extended as other
+ * problematic features are identified.
+ */
+static bool nvme_passthru_cmd_allowed(struct nvme_ctrl *ctrl,
+				      struct nvme_ns *ns,
+				      struct nvme_command *c)
+{
+	/*
+	 * This only filters admin commands (ns == NULL). I/O commands share
+	 * the opcode space with admin commands - Dataset Management is 0x09,
+	 * the same value as Set Features - so they must not be inspected here.
+	 */
+	if (ns || c->common.opcode != nvme_admin_set_features)
+		return true;
+
+	switch (le32_to_cpu(c->common.cdw10) & 0xff) {
+	case NVME_FEAT_KATO:
+		/*
+		 * Keep Alive is optional on PCIe (NVMe 2.0a 5.27.1.12) and the
+		 * driver only arms keep-alive for fabrics. Enabling it on
+		 * other transports starts a keep-alive command the driver is
+		 * not set up for and harms idle power states, so reject it.
+		 */
+		return ctrl->ops->flags & NVME_F_FABRICS;
+	default:
+		return true;
+	}
+}
+
 /*
  * Convert integer values from ioctl structures to user pointers, silently
  * ignoring the upper bits in the compat case to match behaviour of 32-bit
@@ -311,6 +344,9 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	if (!nvme_cmd_allowed(ns, &c, 0, open_for_write))
 		return -EACCES;
 
+	if (!nvme_passthru_cmd_allowed(ctrl, ns, &c))
+		return -EOPNOTSUPP;
+
 	if (cmd.timeout_ms)
 		timeout = msecs_to_jiffies(cmd.timeout_ms);
 
@@ -358,6 +394,9 @@ static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	if (!nvme_cmd_allowed(ns, &c, flags, open_for_write))
 		return -EACCES;
 
+	if (!nvme_passthru_cmd_allowed(ctrl, ns, &c))
+		return -EOPNOTSUPP;
+
 	if (cmd.timeout_ms)
 		timeout = msecs_to_jiffies(cmd.timeout_ms);
 
@@ -475,6 +514,9 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	if (!nvme_cmd_allowed(ns, &c, 0, ioucmd->file->f_mode & FMODE_WRITE))
 		return -EACCES;
 
+	if (!nvme_passthru_cmd_allowed(ctrl, ns, &c))
+		return -EOPNOTSUPP;
+
 	d.metadata = READ_ONCE(cmd->metadata);
 	d.addr = READ_ONCE(cmd->addr);
 	d.data_len = READ_ONCE(cmd->data_len);
-- 
2.43.0