From: Chao Shi
To: linux-nvme@lists.infradead.org, Keith Busch
Cc: Christoph Hellwig, Sagi Grimberg, Jens Axboe, Tatsuya Sasaki,
    Maurizio Lombardi, linux-kernel@vger.kernel.org, Sungwoo Kim,
    Dave Tian, Weidong Zhu
Subject: [PATCH v5] nvme: reject passthrough of driver-managed Set Features
Date: Sat, 23 May 2026 18:56:29 -0400
Message-ID: <20260523225629.3964037-1-coshi036@gmail.com>
X-Mailer: git-send-email 2.43.0

Since commit b58da2d270db ("nvme: update keep alive interval when kato
is modified"), userspace can start keep-alive on any transport via a
Set Features (KATO) passthrough command. nvme_keep_alive_work() then
allocates with BLK_MQ_REQ_RESERVED, but nvme_alloc_admin_tag_set() only
reserves admin tags for fabrics, so the allocation trips WARN_ON_ONCE()
in blk_mq_get_tag() and fails:

  nvme nvme0: keep-alive failed: -11

More generally, several Set Features change controller state that the
driver manages itself and cannot react to correctly when set behind its
back from userspace. Reject these in nvme_cmd_allowed():

 - KATO on non-fabrics (keep-alive is only armed for fabrics; on PCIe
   it has no reserved tag and an active keep-alive harms idle power
   states)
 - Host Behavior Support, Host Memory Buffer, Number of Queues, and
   Autonomous Power State Transition (all driver-managed)

Keep Alive on fabrics is unchanged. I/O commands are unaffected as the
check is confined to the admin path (ns == NULL).

Link: https://lore.kernel.org/linux-nvme/20260522162639.395802-1-coshi036@gmail.com/
Fixes: b58da2d270db ("nvme: update keep alive interval when kato is modified")
Found by FuzzNvme (Syzkaller with FEMU fuzzing framework).
Acked-by: Sungwoo Kim
Acked-by: Dave Tian
Acked-by: Weidong Zhu
Signed-off-by: Chao Shi
---
Reproducer for the keep-alive case (run as root on a PCIe NVMe device):

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/nvme_ioctl.h>

  int main(void)
  {
          struct nvme_admin_cmd cmd = {0};
          int fd = open("/dev/nvme0", O_RDWR);

          if (fd < 0) {
                  perror("open");
                  return 1;
          }

          cmd.opcode = 0x09;      /* SET_FEATURES */
          cmd.cdw10 = 0x0f;       /* Feature ID: KATO */
          cmd.cdw11 = 5;          /* KATO = 5 seconds */

          if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) < 0) {
                  perror("ioctl");
                  return 1;
          }
          return 0;
  }

On an unpatched kernel, within ~kato/2 seconds after the program exits,
dmesg shows:

  nvme nvme0: keep alive interval updated from 0 ms to 5000 ms
  WARNING: CPU: 0 PID: ... at block/blk-mq-tag.c:148 blk_mq_get_tag+...
  nvme nvme0: keep-alive failed: -11

With this patch the ioctl fails with EACCES on non-fabrics.

Changes since v4:
 - Fold the check into the existing nvme_cmd_allowed() instead of a
   separate helper, and reject additional driver-managed Set Features
   (Host Behavior, Host Memory Buffer, Number of Queues, Autonomous
   Power State Transition) in the same switch (Keith Busch). The admin
   vs I/O distinction is now structural: the switch lives in the
   ns == NULL branch, so I/O commands (e.g. Dataset Management, which
   shares opcode 0x09 with Set Features) are never inspected.

Changes since v3:
 - Only inspect admin commands so a DSM I/O command is not wrongly
   rejected (Keith Busch).

Changes since v2:
 - Reject the KATO passthrough on non-fabrics instead of reserving an
   admin tag for all transports (Keith Busch, Christoph Hellwig).

Changes since v1:
 - v2 added a spec citation and quirk discussion, superseded by the
   reject approach.

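For reviewers who want to poke at the decision table outside the kernel,
the following is a minimal userspace sketch of the new admin-path check.
It is not the kernel code: rejects_set_features(), the FEAT_* names and
the is_fabrics parameter are hypothetical stand-ins for the switch in
nvme_cmd_allowed(), the NVME_FEAT_* constants and the NVME_F_FABRICS
test. The feature IDs are the values from the NVMe base specification.
A "false" result only means this particular check does not reject the
command; in the driver the command would still go through the usual
admin privilege check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Set Features admin opcode and feature IDs (NVMe base specification). */
#define ADMIN_SET_FEATURES 0x09

enum {
	FEAT_NUM_QUEUES    = 0x07,
	FEAT_AUTO_PST      = 0x0c,
	FEAT_HOST_MEM_BUF  = 0x0d,
	FEAT_KATO          = 0x0f,
	FEAT_HOST_BEHAVIOR = 0x16,
};

/*
 * Hypothetical model of the admin-path filter; an illustration only.
 * Returns true when the Set Features command would be rejected.
 */
static bool rejects_set_features(uint8_t opcode, uint32_t cdw10,
				 bool is_fabrics)
{
	if (opcode != ADMIN_SET_FEATURES)
		return false;	/* the filter only inspects Set Features */

	/* The FID is in CDW10 bits 7:0; bit 31 (Save) must not matter. */
	switch (cdw10 & 0xff) {
	case FEAT_KATO:
		if (is_fabrics)
			break;	/* keep-alive stays allowed on fabrics */
		/* fall through */
	case FEAT_HOST_BEHAVIOR:
	case FEAT_HOST_MEM_BUF:
	case FEAT_NUM_QUEUES:
	case FEAT_AUTO_PST:
		return true;
	}
	return false;
}

/* Small self-check covering the cases described in the commit message. */
void self_check(void)
{
	assert(rejects_set_features(ADMIN_SET_FEATURES, FEAT_KATO, false));
	assert(!rejects_set_features(ADMIN_SET_FEATURES, FEAT_KATO, true));
	assert(rejects_set_features(ADMIN_SET_FEATURES, FEAT_NUM_QUEUES, true));
}
```

Masking with 0xff is what keeps a command with the Save bit set (for
example cdw10 = 0x8000000f) from slipping past the KATO case.
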
 drivers/nvme/host/ioctl.c | 33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index a9c097dacad6..31784506e845 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -14,8 +14,9 @@ enum {
 	NVME_IOCTL_PARTITION	= (1 << 1),
 };
 
-static bool nvme_cmd_allowed(struct nvme_ns *ns, struct nvme_command *c,
-		unsigned int flags, bool open_for_write)
+static bool nvme_cmd_allowed(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
+		struct nvme_command *c, unsigned int flags,
+		bool open_for_write)
 {
 	u32 effects;
 
@@ -50,6 +51,26 @@ static bool nvme_cmd_allowed(struct nvme_ns *ns, struct nvme_command *c,
 			case NVME_ID_CNS_CTRL:
 				return true;
 			}
+		} else if (c->common.opcode == nvme_admin_set_features) {
+			/*
+			 * Reject Set Features that change controller state the
+			 * driver manages itself; setting them behind the driver's
+			 * back from userspace leaves it unable to react correctly.
+			 * Keep Alive is only armed for fabrics - on other
+			 * transports it has no reserved tag and harms idle power
+			 * states.
+			 */
+			switch (le32_to_cpu(c->features.fid) & 0xff) {
+			case NVME_FEAT_KATO:
+				if (ctrl->ops->flags & NVME_F_FABRICS)
+					break;
+				fallthrough;
+			case NVME_FEAT_HOST_BEHAVIOR:
+			case NVME_FEAT_HOST_MEM_BUF:
+			case NVME_FEAT_NUM_QUEUES:
+			case NVME_FEAT_AUTO_PST:
+				return false;
+			}
 		}
 		goto admin;
 	}
@@ -59,7 +80,7 @@ static bool nvme_cmd_allowed(struct nvme_ns *ns, struct nvme_command *c,
 	 * and marks this command as supported. If not reject unprivileged
 	 * passthrough.
 	 */
-	effects = nvme_command_effects(ns->ctrl, ns, c->common.opcode);
+	effects = nvme_command_effects(ctrl, ns, c->common.opcode);
 	if (!(effects & NVME_CMD_EFFECTS_CSUPP))
 		goto admin;
 
@@ -308,7 +329,7 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	c.common.cdw14 = cpu_to_le32(cmd.cdw14);
 	c.common.cdw15 = cpu_to_le32(cmd.cdw15);
 
-	if (!nvme_cmd_allowed(ns, &c, 0, open_for_write))
+	if (!nvme_cmd_allowed(ctrl, ns, &c, 0, open_for_write))
 		return -EACCES;
 
 	if (cmd.timeout_ms)
@@ -355,7 +376,7 @@ static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	c.common.cdw14 = cpu_to_le32(cmd.cdw14);
 	c.common.cdw15 = cpu_to_le32(cmd.cdw15);
 
-	if (!nvme_cmd_allowed(ns, &c, flags, open_for_write))
+	if (!nvme_cmd_allowed(ctrl, ns, &c, flags, open_for_write))
 		return -EACCES;
 
 	if (cmd.timeout_ms)
@@ -472,7 +493,7 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	c.common.cdw14 = cpu_to_le32(READ_ONCE(cmd->cdw14));
 	c.common.cdw15 = cpu_to_le32(READ_ONCE(cmd->cdw15));
 
-	if (!nvme_cmd_allowed(ns, &c, 0, ioucmd->file->f_mode & FMODE_WRITE))
+	if (!nvme_cmd_allowed(ctrl, ns, &c, 0, ioucmd->file->f_mode & FMODE_WRITE))
 		return -EACCES;
 
 	d.metadata = READ_ONCE(cmd->metadata);
-- 
2.43.0