[PATCH] nvme-core: remove head->effects to fix use-after-free
Posted by Yuanyuan Zhong 2 years, 1 month ago
head->effects stores a pointer to the controller's Commands Supported
and Effects log. When the namespace head is shared by multiple
controllers and the controller that supplied the log is removed,
dereferencing head->effects causes a use-after-free:

[  227.820066] ==================================================================
[  227.820069] BUG: KFENCE: use-after-free read in nvme_command_effects+0x1f/0x90 [nvme_core]

[  227.820091] Use-after-free read at 0x00000000c02dadcf (in kfence-#0):
[  227.820094]  nvme_command_effects+0x1f/0x90 [nvme_core]
[  227.820107]  nvme_passthru_start+0x19/0x80 [nvme_core]
[  227.820121]  nvme_submit_user_cmd+0xc7/0x170 [nvme_core]
[  227.820136]  nvme_user_cmd.constprop.16+0x152/0x1d0 [nvme_core]
[  227.820149]  nvme_ns_head_ioctl+0x92/0x140 [nvme_core]
[  227.820162]  blkdev_ioctl+0x1c5/0x280
[  227.820169]  __x64_sys_ioctl+0x93/0xd0
[  227.820174]  do_syscall_64+0x45/0xf0
[  227.820177]  entry_SYSCALL_64_after_hwframe+0x6e/0x76

[  227.820182] kfence-#0: 0x000000000fac1d5d-0x00000000a28a73c3, size=4096, cache=kmalloc-4k

[  227.820185] allocated by task 2559 on cpu 3 at 188.703118s:
[  227.820233]  __kmem_cache_alloc_node+0x3c9/0x410
[  227.820236]  kmalloc_trace+0x2a/0xa0
[  227.820238]  nvme_get_effects_log+0x68/0xd0 [nvme_core]
[  227.820251]  nvme_init_identify+0x5ef/0x640 [nvme_core]
[  227.820263]  nvme_init_ctrl_finish+0x8d/0x250 [nvme_core]
[  227.820275]  nvme_tcp_setup_ctrl+0x1ce/0x710 [nvme_tcp]
[  227.820281]  nvme_tcp_create_ctrl+0x359/0x450 [nvme_tcp]
[  227.820286]  nvmf_dev_write+0x203/0x3b0 [nvme_fabrics]
[  227.820292]  vfs_write+0xd2/0x3d0
[  227.820294]  ksys_write+0x5d/0xd0
[  227.820297]  do_syscall_64+0x45/0xf0
[  227.820298]  entry_SYSCALL_64_after_hwframe+0x6e/0x76

[  227.820302] freed by task 2521 on cpu 28 at 220.115945s:
[  227.820329]  nvme_free_ctrl+0xa6/0x260 [nvme_core]
[  227.820342]  device_release+0x37/0x90
[  227.820345]  kobject_release+0x57/0x120
[  227.820347]  nvme_sysfs_delete+0x39/0x50 [nvme_core]
[  227.820360]  kernfs_fop_write_iter+0x144/0x1e0
[  227.820362]  vfs_write+0x25f/0x3d0
[  227.820364]  ksys_write+0x5d/0xd0
[  227.820366]  do_syscall_64+0x45/0xf0
[  227.820368]  entry_SYSCALL_64_after_hwframe+0x6e/0x76

Fix this by removing head->effects. Use the Commands Supported and Effects log
in ctrl->cels directly.

Fixes: be93e87e7802 ("nvme: support for multiple Command Sets Supported and Effects log pages")
Signed-off-by: Yuanyuan Zhong <yzhong@purestorage.com>
Reviewed-by: Randy Jennings <randyj@purestorage.com>
Reviewed-by: Hamilton Coutinho <hcoutinho@purestorage.com>
---
 drivers/nvme/host/core.c | 17 ++++++++---------
 drivers/nvme/host/nvme.h |  1 -
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 88b54cdcbd68..14fdc2de3a75 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1085,7 +1085,9 @@ u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode)
 	u32 effects = 0;
 
 	if (ns) {
-		effects = le32_to_cpu(ns->head->effects->iocs[opcode]);
+		struct nvme_effects_log	*cel = xa_load(&ctrl->cels, ns->head->ids.csi);
+
+		effects = le32_to_cpu(cel->iocs[opcode]);
 		if (effects & ~(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC))
 			dev_warn_once(ctrl->device,
 				"IO command:%02x has unusual effects:%08x\n",
@@ -2872,7 +2874,8 @@ static int nvme_get_effects_log(struct nvme_ctrl *ctrl, u8 csi,
 
 	xa_store(&ctrl->cels, csi, cel, GFP_KERNEL);
 out:
-	*log = cel;
+	if (log)
+		*log = cel;
 	return 0;
 }
 
@@ -3388,13 +3391,6 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
 	head->shared = info->is_shared;
 	kref_init(&head->ref);
 
-	if (head->ids.csi) {
-		ret = nvme_get_effects_log(ctrl, head->ids.csi, &head->effects);
-		if (ret)
-			goto out_cleanup_srcu;
-	} else
-		head->effects = ctrl->effects;
-
 	ret = nvme_mpath_alloc_disk(ctrl, head);
 	if (ret)
 		goto out_cleanup_srcu;
@@ -3779,6 +3775,9 @@ static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	if (ret || !info.is_ready)
 		return;
 
+	if (info.ids.csi && nvme_get_effects_log(ctrl, info.ids.csi, NULL))
+		return;
+
 	ns = nvme_find_get_ns(ctrl, nsid);
 	if (ns) {
 		nvme_validate_ns(ns, &info);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 39a90b7cb125..38298c072ec3 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -445,7 +445,6 @@ struct nvme_ns_head {
 	struct kref		ref;
 	bool			shared;
 	int			instance;
-	struct nvme_effects_log *effects;
 
 	struct cdev		cdev;
 	struct device		cdev_device;
-- 
2.34.1
Re: [PATCH] nvme-core: remove head->effects to fix use-after-free
Posted by kernel test robot 2 years, 1 month ago
Hi Yuanyuan,

kernel test robot noticed the following build errors:

[auto build test ERROR on hch-configfs/for-next]
[also build test ERROR on linus/master v6.7-rc1 next-20231117]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yuanyuan-Zhong/nvme-core-remove-head-effects-to-fix-use-after-free/20231116-025616
base:   git://git.infradead.org/users/hch/configfs.git for-next
patch link:    https://lore.kernel.org/r/20231115185439.2616073-1-yzhong%40purestorage.com
patch subject: [PATCH] nvme-core: remove head->effects to fix use-after-free
config: powerpc-randconfig-r133-20231119 (https://download.01.org/0day-ci/archive/20231120/202311200305.oyZkQTJh-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce: (https://download.01.org/0day-ci/archive/20231120/202311200305.oyZkQTJh-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311200305.oyZkQTJh-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/nvme/host/zns.c:50:43: error: no member named 'effects' in 'struct nvme_ns_head'
      50 |         struct nvme_effects_log *log = ns->head->effects;
         |                                        ~~~~~~~~  ^
   1 error generated.


vim +50 drivers/nvme/host/zns.c

240e6ee272c07a Keith Busch       2020-06-29   47  
d525c3c0232216 Christoph Hellwig 2020-08-20   48  int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf)
240e6ee272c07a Keith Busch       2020-06-29   49  {
240e6ee272c07a Keith Busch       2020-06-29  @50  	struct nvme_effects_log *log = ns->head->effects;
d525c3c0232216 Christoph Hellwig 2020-08-20   51  	struct request_queue *q = ns->queue;
240e6ee272c07a Keith Busch       2020-06-29   52  	struct nvme_command c = { };
240e6ee272c07a Keith Busch       2020-06-29   53  	struct nvme_id_ns_zns *id;
240e6ee272c07a Keith Busch       2020-06-29   54  	int status;
240e6ee272c07a Keith Busch       2020-06-29   55  
240e6ee272c07a Keith Busch       2020-06-29   56  	/* Driver requires zone append support */
2f4c9ba23b887e Javier González   2020-12-01   57  	if ((le32_to_cpu(log->iocs[nvme_cmd_zone_append]) &
240e6ee272c07a Keith Busch       2020-06-29   58  			NVME_CMD_EFFECTS_CSUPP)) {
2f4c9ba23b887e Javier González   2020-12-01   59  		if (test_and_clear_bit(NVME_NS_FORCE_RO, &ns->flags))
240e6ee272c07a Keith Busch       2020-06-29   60  			dev_warn(ns->ctrl->device,
2f4c9ba23b887e Javier González   2020-12-01   61  				 "Zone Append supported for zoned namespace:%d. Remove read-only mode\n",
2f4c9ba23b887e Javier González   2020-12-01   62  				 ns->head->ns_id);
2f4c9ba23b887e Javier González   2020-12-01   63  	} else {
2f4c9ba23b887e Javier González   2020-12-01   64  		set_bit(NVME_NS_FORCE_RO, &ns->flags);
2f4c9ba23b887e Javier González   2020-12-01   65  		dev_warn(ns->ctrl->device,
2f4c9ba23b887e Javier González   2020-12-01   66  			 "Zone Append not supported for zoned namespace:%d. Forcing to read-only mode\n",
240e6ee272c07a Keith Busch       2020-06-29   67  			 ns->head->ns_id);
240e6ee272c07a Keith Busch       2020-06-29   68  	}
240e6ee272c07a Keith Busch       2020-06-29   69  
240e6ee272c07a Keith Busch       2020-06-29   70  	/* Lazily query controller append limit for the first zoned namespace */
240e6ee272c07a Keith Busch       2020-06-29   71  	if (!ns->ctrl->max_zone_append) {
240e6ee272c07a Keith Busch       2020-06-29   72  		status = nvme_set_max_append(ns->ctrl);
240e6ee272c07a Keith Busch       2020-06-29   73  		if (status)
240e6ee272c07a Keith Busch       2020-06-29   74  			return status;
240e6ee272c07a Keith Busch       2020-06-29   75  	}
240e6ee272c07a Keith Busch       2020-06-29   76  
240e6ee272c07a Keith Busch       2020-06-29   77  	id = kzalloc(sizeof(*id), GFP_KERNEL);
240e6ee272c07a Keith Busch       2020-06-29   78  	if (!id)
240e6ee272c07a Keith Busch       2020-06-29   79  		return -ENOMEM;
240e6ee272c07a Keith Busch       2020-06-29   80  
240e6ee272c07a Keith Busch       2020-06-29   81  	c.identify.opcode = nvme_admin_identify;
240e6ee272c07a Keith Busch       2020-06-29   82  	c.identify.nsid = cpu_to_le32(ns->head->ns_id);
240e6ee272c07a Keith Busch       2020-06-29   83  	c.identify.cns = NVME_ID_CNS_CS_NS;
240e6ee272c07a Keith Busch       2020-06-29   84  	c.identify.csi = NVME_CSI_ZNS;
240e6ee272c07a Keith Busch       2020-06-29   85  
240e6ee272c07a Keith Busch       2020-06-29   86  	status = nvme_submit_sync_cmd(ns->ctrl->admin_q, &c, id, sizeof(*id));
240e6ee272c07a Keith Busch       2020-06-29   87  	if (status)
240e6ee272c07a Keith Busch       2020-06-29   88  		goto free_data;
240e6ee272c07a Keith Busch       2020-06-29   89  
240e6ee272c07a Keith Busch       2020-06-29   90  	/*
240e6ee272c07a Keith Busch       2020-06-29   91  	 * We currently do not handle devices requiring any of the zoned
240e6ee272c07a Keith Busch       2020-06-29   92  	 * operation characteristics.
240e6ee272c07a Keith Busch       2020-06-29   93  	 */
240e6ee272c07a Keith Busch       2020-06-29   94  	if (id->zoc) {
240e6ee272c07a Keith Busch       2020-06-29   95  		dev_warn(ns->ctrl->device,
240e6ee272c07a Keith Busch       2020-06-29   96  			"zone operations:%x not supported for namespace:%u\n",
240e6ee272c07a Keith Busch       2020-06-29   97  			le16_to_cpu(id->zoc), ns->head->ns_id);
a9e0e6bc728ebc Christoph Hellwig 2021-04-07   98  		status = -ENODEV;
240e6ee272c07a Keith Busch       2020-06-29   99  		goto free_data;
240e6ee272c07a Keith Busch       2020-06-29  100  	}
240e6ee272c07a Keith Busch       2020-06-29  101  
240e6ee272c07a Keith Busch       2020-06-29  102  	ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze));
240e6ee272c07a Keith Busch       2020-06-29  103  	if (!is_power_of_2(ns->zsze)) {
240e6ee272c07a Keith Busch       2020-06-29  104  		dev_warn(ns->ctrl->device,
240e6ee272c07a Keith Busch       2020-06-29  105  			"invalid zone size:%llu for namespace:%u\n",
240e6ee272c07a Keith Busch       2020-06-29  106  			ns->zsze, ns->head->ns_id);
a9e0e6bc728ebc Christoph Hellwig 2021-04-07  107  		status = -ENODEV;
240e6ee272c07a Keith Busch       2020-06-29  108  		goto free_data;
240e6ee272c07a Keith Busch       2020-06-29  109  	}
240e6ee272c07a Keith Busch       2020-06-29  110  
6b2bd274744e64 Christoph Hellwig 2022-07-06  111  	disk_set_zoned(ns->disk, BLK_ZONED_HM);
240e6ee272c07a Keith Busch       2020-06-29  112  	blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
982977df48179c Christoph Hellwig 2022-07-06  113  	disk_set_max_open_zones(ns->disk, le32_to_cpu(id->mor) + 1);
982977df48179c Christoph Hellwig 2022-07-06  114  	disk_set_max_active_zones(ns->disk, le32_to_cpu(id->mar) + 1);
240e6ee272c07a Keith Busch       2020-06-29  115  free_data:
240e6ee272c07a Keith Busch       2020-06-29  116  	kfree(id);
240e6ee272c07a Keith Busch       2020-06-29  117  	return status;
240e6ee272c07a Keith Busch       2020-06-29  118  }
240e6ee272c07a Keith Busch       2020-06-29  119  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH] nvme-core: remove head->effects to fix use-after-free
Posted by kernel test robot 2 years, 1 month ago
Hi Yuanyuan,

kernel test robot noticed the following build errors:

[auto build test ERROR on hch-configfs/for-next]
[also build test ERROR on linus/master v6.7-rc1 next-20231115]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yuanyuan-Zhong/nvme-core-remove-head-effects-to-fix-use-after-free/20231116-025616
base:   git://git.infradead.org/users/hch/configfs.git for-next
patch link:    https://lore.kernel.org/r/20231115185439.2616073-1-yzhong%40purestorage.com
patch subject: [PATCH] nvme-core: remove head->effects to fix use-after-free
config: x86_64-buildonly-randconfig-001-20231116 (https://download.01.org/0day-ci/archive/20231116/202311160738.ndt0obP3-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231116/202311160738.ndt0obP3-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311160738.ndt0obP3-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/nvme/host/zns.c: In function 'nvme_update_zone_info':
>> drivers/nvme/host/zns.c:50:48: error: 'struct nvme_ns_head' has no member named 'effects'
      50 |         struct nvme_effects_log *log = ns->head->effects;
         |                                                ^~


vim +50 drivers/nvme/host/zns.c

240e6ee272c07a2 Keith Busch       2020-06-29   47  
d525c3c02322161 Christoph Hellwig 2020-08-20   48  int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf)
240e6ee272c07a2 Keith Busch       2020-06-29   49  {
240e6ee272c07a2 Keith Busch       2020-06-29  @50  	struct nvme_effects_log *log = ns->head->effects;
d525c3c02322161 Christoph Hellwig 2020-08-20   51  	struct request_queue *q = ns->queue;
240e6ee272c07a2 Keith Busch       2020-06-29   52  	struct nvme_command c = { };
240e6ee272c07a2 Keith Busch       2020-06-29   53  	struct nvme_id_ns_zns *id;
240e6ee272c07a2 Keith Busch       2020-06-29   54  	int status;
240e6ee272c07a2 Keith Busch       2020-06-29   55  
240e6ee272c07a2 Keith Busch       2020-06-29   56  	/* Driver requires zone append support */
2f4c9ba23b887e7 Javier González   2020-12-01   57  	if ((le32_to_cpu(log->iocs[nvme_cmd_zone_append]) &
240e6ee272c07a2 Keith Busch       2020-06-29   58  			NVME_CMD_EFFECTS_CSUPP)) {
2f4c9ba23b887e7 Javier González   2020-12-01   59  		if (test_and_clear_bit(NVME_NS_FORCE_RO, &ns->flags))
240e6ee272c07a2 Keith Busch       2020-06-29   60  			dev_warn(ns->ctrl->device,
2f4c9ba23b887e7 Javier González   2020-12-01   61  				 "Zone Append supported for zoned namespace:%d. Remove read-only mode\n",
2f4c9ba23b887e7 Javier González   2020-12-01   62  				 ns->head->ns_id);
2f4c9ba23b887e7 Javier González   2020-12-01   63  	} else {
2f4c9ba23b887e7 Javier González   2020-12-01   64  		set_bit(NVME_NS_FORCE_RO, &ns->flags);
2f4c9ba23b887e7 Javier González   2020-12-01   65  		dev_warn(ns->ctrl->device,
2f4c9ba23b887e7 Javier González   2020-12-01   66  			 "Zone Append not supported for zoned namespace:%d. Forcing to read-only mode\n",
240e6ee272c07a2 Keith Busch       2020-06-29   67  			 ns->head->ns_id);
240e6ee272c07a2 Keith Busch       2020-06-29   68  	}
240e6ee272c07a2 Keith Busch       2020-06-29   69  
240e6ee272c07a2 Keith Busch       2020-06-29   70  	/* Lazily query controller append limit for the first zoned namespace */
240e6ee272c07a2 Keith Busch       2020-06-29   71  	if (!ns->ctrl->max_zone_append) {
240e6ee272c07a2 Keith Busch       2020-06-29   72  		status = nvme_set_max_append(ns->ctrl);
240e6ee272c07a2 Keith Busch       2020-06-29   73  		if (status)
240e6ee272c07a2 Keith Busch       2020-06-29   74  			return status;
240e6ee272c07a2 Keith Busch       2020-06-29   75  	}
240e6ee272c07a2 Keith Busch       2020-06-29   76  
240e6ee272c07a2 Keith Busch       2020-06-29   77  	id = kzalloc(sizeof(*id), GFP_KERNEL);
240e6ee272c07a2 Keith Busch       2020-06-29   78  	if (!id)
240e6ee272c07a2 Keith Busch       2020-06-29   79  		return -ENOMEM;
240e6ee272c07a2 Keith Busch       2020-06-29   80  
240e6ee272c07a2 Keith Busch       2020-06-29   81  	c.identify.opcode = nvme_admin_identify;
240e6ee272c07a2 Keith Busch       2020-06-29   82  	c.identify.nsid = cpu_to_le32(ns->head->ns_id);
240e6ee272c07a2 Keith Busch       2020-06-29   83  	c.identify.cns = NVME_ID_CNS_CS_NS;
240e6ee272c07a2 Keith Busch       2020-06-29   84  	c.identify.csi = NVME_CSI_ZNS;
240e6ee272c07a2 Keith Busch       2020-06-29   85  
240e6ee272c07a2 Keith Busch       2020-06-29   86  	status = nvme_submit_sync_cmd(ns->ctrl->admin_q, &c, id, sizeof(*id));
240e6ee272c07a2 Keith Busch       2020-06-29   87  	if (status)
240e6ee272c07a2 Keith Busch       2020-06-29   88  		goto free_data;
240e6ee272c07a2 Keith Busch       2020-06-29   89  
240e6ee272c07a2 Keith Busch       2020-06-29   90  	/*
240e6ee272c07a2 Keith Busch       2020-06-29   91  	 * We currently do not handle devices requiring any of the zoned
240e6ee272c07a2 Keith Busch       2020-06-29   92  	 * operation characteristics.
240e6ee272c07a2 Keith Busch       2020-06-29   93  	 */
240e6ee272c07a2 Keith Busch       2020-06-29   94  	if (id->zoc) {
240e6ee272c07a2 Keith Busch       2020-06-29   95  		dev_warn(ns->ctrl->device,
240e6ee272c07a2 Keith Busch       2020-06-29   96  			"zone operations:%x not supported for namespace:%u\n",
240e6ee272c07a2 Keith Busch       2020-06-29   97  			le16_to_cpu(id->zoc), ns->head->ns_id);
a9e0e6bc728ebcf Christoph Hellwig 2021-04-07   98  		status = -ENODEV;
240e6ee272c07a2 Keith Busch       2020-06-29   99  		goto free_data;
240e6ee272c07a2 Keith Busch       2020-06-29  100  	}
240e6ee272c07a2 Keith Busch       2020-06-29  101  
240e6ee272c07a2 Keith Busch       2020-06-29  102  	ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze));
240e6ee272c07a2 Keith Busch       2020-06-29  103  	if (!is_power_of_2(ns->zsze)) {
240e6ee272c07a2 Keith Busch       2020-06-29  104  		dev_warn(ns->ctrl->device,
240e6ee272c07a2 Keith Busch       2020-06-29  105  			"invalid zone size:%llu for namespace:%u\n",
240e6ee272c07a2 Keith Busch       2020-06-29  106  			ns->zsze, ns->head->ns_id);
a9e0e6bc728ebcf Christoph Hellwig 2021-04-07  107  		status = -ENODEV;
240e6ee272c07a2 Keith Busch       2020-06-29  108  		goto free_data;
240e6ee272c07a2 Keith Busch       2020-06-29  109  	}
240e6ee272c07a2 Keith Busch       2020-06-29  110  
6b2bd274744e645 Christoph Hellwig 2022-07-06  111  	disk_set_zoned(ns->disk, BLK_ZONED_HM);
240e6ee272c07a2 Keith Busch       2020-06-29  112  	blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
982977df48179c8 Christoph Hellwig 2022-07-06  113  	disk_set_max_open_zones(ns->disk, le32_to_cpu(id->mor) + 1);
982977df48179c8 Christoph Hellwig 2022-07-06  114  	disk_set_max_active_zones(ns->disk, le32_to_cpu(id->mar) + 1);
240e6ee272c07a2 Keith Busch       2020-06-29  115  free_data:
240e6ee272c07a2 Keith Busch       2020-06-29  116  	kfree(id);
240e6ee272c07a2 Keith Busch       2020-06-29  117  	return status;
240e6ee272c07a2 Keith Busch       2020-06-29  118  }
240e6ee272c07a2 Keith Busch       2020-06-29  119  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH] nvme-core: remove head->effects to fix use-after-free
Posted by Jens Axboe 2 years, 1 month ago
On 11/15/23 11:54 AM, Yuanyuan Zhong wrote:
> head->effects stores a pointer to the controller's Commands Supported
> and Effects log. When the namespace head is shared by multiple
> controllers and the controller that supplied the log is removed,
> dereferencing head->effects causes a use-after-free:
> 
> [  227.820066] ==================================================================
> [  227.820069] BUG: KFENCE: use-after-free read in nvme_command_effects+0x1f/0x90 [nvme_core]
> 
> [  227.820091] Use-after-free read at 0x00000000c02dadcf (in kfence-#0):
> [  227.820094]  nvme_command_effects+0x1f/0x90 [nvme_core]
> [  227.820107]  nvme_passthru_start+0x19/0x80 [nvme_core]
> [  227.820121]  nvme_submit_user_cmd+0xc7/0x170 [nvme_core]
> [  227.820136]  nvme_user_cmd.constprop.16+0x152/0x1d0 [nvme_core]
> [  227.820149]  nvme_ns_head_ioctl+0x92/0x140 [nvme_core]
> [  227.820162]  blkdev_ioctl+0x1c5/0x280
> [  227.820169]  __x64_sys_ioctl+0x93/0xd0
> [  227.820174]  do_syscall_64+0x45/0xf0
> [  227.820177]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
> 
> [  227.820182] kfence-#0: 0x000000000fac1d5d-0x00000000a28a73c3, size=4096, cache=kmalloc-4k
> 
> [  227.820185] allocated by task 2559 on cpu 3 at 188.703118s:
> [  227.820233]  __kmem_cache_alloc_node+0x3c9/0x410
> [  227.820236]  kmalloc_trace+0x2a/0xa0
> [  227.820238]  nvme_get_effects_log+0x68/0xd0 [nvme_core]
> [  227.820251]  nvme_init_identify+0x5ef/0x640 [nvme_core]
> [  227.820263]  nvme_init_ctrl_finish+0x8d/0x250 [nvme_core]
> [  227.820275]  nvme_tcp_setup_ctrl+0x1ce/0x710 [nvme_tcp]
> [  227.820281]  nvme_tcp_create_ctrl+0x359/0x450 [nvme_tcp]
> [  227.820286]  nvmf_dev_write+0x203/0x3b0 [nvme_fabrics]
> [  227.820292]  vfs_write+0xd2/0x3d0
> [  227.820294]  ksys_write+0x5d/0xd0
> [  227.820297]  do_syscall_64+0x45/0xf0
> [  227.820298]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
> 
> [  227.820302] freed by task 2521 on cpu 28 at 220.115945s:
> [  227.820329]  nvme_free_ctrl+0xa6/0x260 [nvme_core]
> [  227.820342]  device_release+0x37/0x90
> [  227.820345]  kobject_release+0x57/0x120
> [  227.820347]  nvme_sysfs_delete+0x39/0x50 [nvme_core]
> [  227.820360]  kernfs_fop_write_iter+0x144/0x1e0
> [  227.820362]  vfs_write+0x25f/0x3d0
> [  227.820364]  ksys_write+0x5d/0xd0
> [  227.820366]  do_syscall_64+0x45/0xf0
> [  227.820368]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
> 
> Fix this by removing head->effects. Use the Commands Supported and Effects log
> in ctrl->cels directly.
> 
> Fixes: be93e87e7802 ("nvme: support for multiple Command Sets Supported and Effects log pages")
> Signed-off-by: Yuanyuan Zhong <yzhong@purestorage.com>
> Reviewed-by: Randy Jennings <randyj@purestorage.com>
> Reviewed-by: Hamilton Coutinho <hcoutinho@purestorage.com>
> ---
>  drivers/nvme/host/core.c | 17 ++++++++---------
>  drivers/nvme/host/nvme.h |  1 -
>  2 files changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 88b54cdcbd68..14fdc2de3a75 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1085,7 +1085,9 @@ u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode)
>  	u32 effects = 0;
>  
>  	if (ns) {
> -		effects = le32_to_cpu(ns->head->effects->iocs[opcode]);
> +		struct nvme_effects_log	*cel = xa_load(&ctrl->cels, ns->head->ids.csi);
> +
> +		effects = le32_to_cpu(cel->iocs[opcode]);
>  		if (effects & ~(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC))
>  			dev_warn_once(ctrl->device,
>  				"IO command:%02x has unusual effects:%08x\n",

This is in the hot path for passthrough, can't we simply reload
->effects when we need?

-- 
Jens Axboe
Re: [PATCH] nvme-core: remove head->effects to fix use-after-free
Posted by Yuanyuan Zhong 2 years, 1 month ago
On Wed, Nov 15, 2023 at 11:02 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 11/15/23 11:54 AM, Yuanyuan Zhong wrote:
> > head->effects stores a pointer to the controller's Commands Supported
> > and Effects log. When the namespace head is shared by multiple
> > controllers and the controller that supplied the log is removed,
> > dereferencing head->effects causes a use-after-free:
> >
> > [  227.820066] ==================================================================
> > [  227.820069] BUG: KFENCE: use-after-free read in nvme_command_effects+0x1f/0x90 [nvme_core]
> >
> > [  227.820091] Use-after-free read at 0x00000000c02dadcf (in kfence-#0):
> > [  227.820094]  nvme_command_effects+0x1f/0x90 [nvme_core]
> > [  227.820107]  nvme_passthru_start+0x19/0x80 [nvme_core]
> > [  227.820121]  nvme_submit_user_cmd+0xc7/0x170 [nvme_core]
> > [  227.820136]  nvme_user_cmd.constprop.16+0x152/0x1d0 [nvme_core]
> > [  227.820149]  nvme_ns_head_ioctl+0x92/0x140 [nvme_core]
> > [  227.820162]  blkdev_ioctl+0x1c5/0x280
> > [  227.820169]  __x64_sys_ioctl+0x93/0xd0
> > [  227.820174]  do_syscall_64+0x45/0xf0
> > [  227.820177]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
> >
> > [  227.820182] kfence-#0: 0x000000000fac1d5d-0x00000000a28a73c3, size=4096, cache=kmalloc-4k
> >
> > [  227.820185] allocated by task 2559 on cpu 3 at 188.703118s:
> > [  227.820233]  __kmem_cache_alloc_node+0x3c9/0x410
> > [  227.820236]  kmalloc_trace+0x2a/0xa0
> > [  227.820238]  nvme_get_effects_log+0x68/0xd0 [nvme_core]
> > [  227.820251]  nvme_init_identify+0x5ef/0x640 [nvme_core]
> > [  227.820263]  nvme_init_ctrl_finish+0x8d/0x250 [nvme_core]
> > [  227.820275]  nvme_tcp_setup_ctrl+0x1ce/0x710 [nvme_tcp]
> > [  227.820281]  nvme_tcp_create_ctrl+0x359/0x450 [nvme_tcp]
> > [  227.820286]  nvmf_dev_write+0x203/0x3b0 [nvme_fabrics]
> > [  227.820292]  vfs_write+0xd2/0x3d0
> > [  227.820294]  ksys_write+0x5d/0xd0
> > [  227.820297]  do_syscall_64+0x45/0xf0
> > [  227.820298]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
> >
> > [  227.820302] freed by task 2521 on cpu 28 at 220.115945s:
> > [  227.820329]  nvme_free_ctrl+0xa6/0x260 [nvme_core]
> > [  227.820342]  device_release+0x37/0x90
> > [  227.820345]  kobject_release+0x57/0x120
> > [  227.820347]  nvme_sysfs_delete+0x39/0x50 [nvme_core]
> > [  227.820360]  kernfs_fop_write_iter+0x144/0x1e0
> > [  227.820362]  vfs_write+0x25f/0x3d0
> > [  227.820364]  ksys_write+0x5d/0xd0
> > [  227.820366]  do_syscall_64+0x45/0xf0
> > [  227.820368]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
> >
> > Fix this by removing head->effects. Use the Commands Supported and Effects log
> > in ctrl->cels directly.
> >
> > Fixes: be93e87e7802 ("nvme: support for multiple Command Sets Supported and Effects log pages")
> > Signed-off-by: Yuanyuan Zhong <yzhong@purestorage.com>
> > Reviewed-by: Randy Jennings <randyj@purestorage.com>
> > Reviewed-by: Hamilton Coutinho <hcoutinho@purestorage.com>
> > ---
> >  drivers/nvme/host/core.c | 17 ++++++++---------
> >  drivers/nvme/host/nvme.h |  1 -
> >  2 files changed, 8 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 88b54cdcbd68..14fdc2de3a75 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -1085,7 +1085,9 @@ u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode)
> >       u32 effects = 0;
> >
> >       if (ns) {
> > -             effects = le32_to_cpu(ns->head->effects->iocs[opcode]);
> > +             struct nvme_effects_log *cel = xa_load(&ctrl->cels, ns->head->ids.csi);
> > +
> > +             effects = le32_to_cpu(cel->iocs[opcode]);
> >               if (effects & ~(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC))
> >                       dev_warn_once(ctrl->device,
> >                               "IO command:%02x has unusual effects:%08x\n",
>
> This is in the hot path for passthrough, can't we simply reload
> ->effects when we need?

Do you mean something like this? If not, can you please elaborate
"when we need"?
-               struct nvme_effects_log *cel = xa_load(&ctrl->cels, ns->head->ids.csi);
+               struct nvme_effects_log *cel = (ns->head->ids.csi == NVME_CSI_NVM) ?
+                       ctrl->effects : xa_load(&ctrl->cels, ns->head->ids.csi);
Will it be good to change ctrl->effects to ctrl->effects[3] for
already defined CSI?
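
Purely to illustrate that direction (untested sketch; the array bound
and its placement in struct nvme_ctrl are assumptions, not part of this
patch):

	/* in struct nvme_ctrl: one cached log per defined CSI (NVM, KV, ZNS) */
	struct nvme_effects_log	*effects[NVME_CSI_ZNS + 1];

	/* in nvme_command_effects(): array lookup instead of the xarray */
	struct nvme_effects_log *cel = NULL;

	if (ns->head->ids.csi <= NVME_CSI_ZNS)
		cel = ctrl->effects[ns->head->ids.csi];
	if (cel)
		effects = le32_to_cpu(cel->iocs[opcode]);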


>
> --
> Jens Axboe
>


--
Regards,
Yuanyuan Zhong
Re: [PATCH] nvme-core: remove head->effects to fix use-after-free
Posted by Keith Busch 2 years, 1 month ago
On Wed, Nov 15, 2023 at 11:21:53AM -0800, Yuanyuan Zhong wrote:
> On Wed, Nov 15, 2023 at 11:02 AM Jens Axboe <axboe@kernel.dk> wrote:
> 
> Do you mean something like this? If not, can you please elaborate
> "when we need"?
> -               struct nvme_effects_log *cel = xa_load(&ctrl->cels, ns->head->ids.csi);
> +               struct nvme_effects_log *cel = (ns->head->ids.csi == NVME_CSI_NVM) ?
> +                       ctrl->effects : xa_load(&ctrl->cels, ns->head->ids.csi);
> Will it be good to change ctrl->effects to ctrl->effects[3] for
> already defined CSI?

I suggest either re-assign the cached head->effects to one from a still
live controller when current path is removed, or move the saved effects
to the subsystem instead of the controller. All controllers in the
subsystem should be reporting the same effects log anyway, so
duplicating all that per-controller is kind of wasteful.
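
For the second option, a rough untested sketch (initialization and
teardown hand-waved; only the lookup changes):

	/* in struct nvme_subsystem instead of struct nvme_ctrl */
	struct xarray		cels;	/* CSI -> Commands Supported and Effects log */

	/* nvme_command_effects() then looks it up via the subsystem */
	struct nvme_effects_log *cel =
			xa_load(&ctrl->subsys->cels, ns->head->ids.csi);

The cached logs would then only be freed when the subsystem itself goes
away, which outlives every controller and namespace head in it.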
Re: [PATCH] nvme-core: remove head->effects to fix use-after-free
Posted by Yuanyuan Zhong 2 years, 1 month ago
On Wed, Nov 15, 2023 at 11:55 AM Keith Busch <kbusch@kernel.org> wrote:
>
> On Wed, Nov 15, 2023 at 11:21:53AM -0800, Yuanyuan Zhong wrote:
> > On Wed, Nov 15, 2023 at 11:02 AM Jens Axboe <axboe@kernel.dk> wrote:
> >
> > Do you mean something like this? If not, can you please elaborate
> > "when we need"?
> > -               struct nvme_effects_log *cel = xa_load(&ctrl->cels, ns->head->ids.csi);
> > +               struct nvme_effects_log *cel = (ns->head->ids.csi == NVME_CSI_NVM) ?
> > +                       ctrl->effects : xa_load(&ctrl->cels, ns->head->ids.csi);
> > Will it be good to change ctrl->effects to ctrl->effects[3] for
> > already defined CSI?
>
> I suggest either re-assign the cached head->effects to one from a still
> live controller when current path is removed, or move the saved effects
> to the subsystem instead of the controller. All controllers in the
> subsystem should be reporting the same effects log anyway, so
Is it specified in spec that all controllers in the subsystem
should be reporting the same effects log?
> duplicating all that per-controller is kind of wasteful.
Re: [PATCH] nvme-core: remove head->effects to fix use-after-free
Posted by Keith Busch 2 years, 1 month ago
On Wed, Nov 15, 2023 at 02:44:04PM -0800, Yuanyuan Zhong wrote:
> On Wed, Nov 15, 2023 at 11:55 AM Keith Busch <kbusch@kernel.org> wrote:
> >
> > On Wed, Nov 15, 2023 at 11:21:53AM -0800, Yuanyuan Zhong wrote:
> > > On Wed, Nov 15, 2023 at 11:02 AM Jens Axboe <axboe@kernel.dk> wrote:
> > >
> > > Do you mean something like this? If not, can you please elaborate
> > > "when we need"?
> > > -               struct nvme_effects_log *cel = xa_load(&ctrl->cels, ns->head->ids.csi);
> > > +               struct nvme_effects_log *cel = (ns->head->ids.csi == NVME_CSI_NVM) ?
> > > +                       ctrl->effects : xa_load(&ctrl->cels, ns->head->ids.csi);
> > > Will it be good to change ctrl->effects to ctrl->effects[3] for
> > > already defined CSI?
> >
> > I suggest either re-assign the cached head->effects to one from a still
> > live controller when current path is removed, or move the saved effects
> > to the subsystem instead of the controller. All controllers in the
> > subsystem should be reporting the same effects log anyway, so
> Is it specified in spec that all controllers in the subsystem
> should be reporting the same effects log?

Yes, in section 5.16.1.6, "Commands Supported and Effects":

  This log page is used to describe the commands that the controller
  supports and the effects of those commands on the state of the NVM
  subsystem.

Oddly enough, Figure 202 says the scope of the log page is "Controller"
rather than "Subsystem". Sounds like ECN potential. You can memcmp the
effects log from each controller for a sanity check if you think some
subsystem controllers messed that up.
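
Something like this untested check when reading a controller's log
("cached" and "cel" here are just whatever two copies are at hand):

	if (cached && memcmp(cached, cel, sizeof(*cel)))
		dev_warn(ctrl->device,
			 "effects log for CSI %u differs across controllers\n",
			 csi);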
Re: [PATCH] nvme-core: remove head->effects to fix use-after-free
Posted by Christoph Hellwig 2 years, 1 month ago
On Wed, Nov 15, 2023 at 10:52:01PM -0500, Keith Busch wrote:
> 
> Yes, in section 5.16.1.6, "Commands Supported and Effects":
> 
>   This log page is used to describe the commands that the controller
>   supports and the effects of those commands on the state of the NVM
>   subsystem.
> 
> Oddly enough, Figure 202 says the scope of the log page is "Controller"
> rather than "Subsystem". Sounds like ECN potential. You can memcmp the
> effects log from each controller for a sanity check if you think some
> subsystem controllers messed that up.

If we really want to be 111% sure we could read the effects for all
controllers and do a logical OR of them, but I think the reason for the
per-controller scope is that for odd subsystems where different
controllers don't actually access the same namespaces these flags
could be different, i.e. one that only does KV, one that does ZNS,
one that does NVM and one that is just an administrative controller.
Re: [PATCH] nvme-core: remove head->effects to fix use-after-free
Posted by Keith Busch 2 years, 1 month ago
On Fri, Nov 17, 2023 at 02:28:46PM +0100, Christoph Hellwig wrote:
> On Wed, Nov 15, 2023 at 10:52:01PM -0500, Keith Busch wrote:
> > 
> > Yes, in section 5.16.1.6, "Commands Supported and Effects":
> > 
> >   This log page is used to describe the commands that the controller
> >   supports and the effects of those commands on the state of the NVM
> >   subsystem.
> > 
> > Oddly enough, Figure 202 says the scope of the log page is "Controller"
> > rather than "Subsystem". Sounds like ECN potential. You can memcmp the
> > effects log from each controller for a sanity check if you think some
> > subsystem controllers messed that up.
> 
> If we really want to be 111% sure we could read the effects for all
> controllers and do a logical OR of them, but I think the reason for the
> per-controller scope is that for odd subsystems where different
> controllers don't actually access the same namespaces these flags
> could be different, i.e. one that only does KV, one that does ZNS,
> one that does NVM and one that is just an administrative controller.

The effects log is per-CSI so different command sets won't create
conflicts.

Namespaces that are not shared don't really matter here because this
problem is unique to multipath.

It doesn't make sense for effects logs to be different per-controller
for the same shared namespace. The spec doesn't seem to explicitly
prevent that, but hints that all hosts should be seeing the same thing
no matter which controller they're connected to:

"
  If the namespace is attached to multiple controllers, the host(s)
  associated with those controllers should coordinate their commands to
  meet the Command Submission and Execution requirements
"

That couldn't be a reliable suggestion if the hosts observe diverging
effects. For the controllers that a host does see, though, I agree we
should use the most cautious effects reported, i.e. the logical OR of
them.
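
Untested sketch of that merge (helper name made up; where the merged
copy lives is a separate question):

	static void nvme_merge_effects(struct nvme_effects_log *dst,
				       const struct nvme_effects_log *src)
	{
		int i;

		/* keep the most cautious view seen from any controller */
		for (i = 0; i < ARRAY_SIZE(dst->acs); i++)
			dst->acs[i] = cpu_to_le32(le32_to_cpu(dst->acs[i]) |
						  le32_to_cpu(src->acs[i]));
		for (i = 0; i < ARRAY_SIZE(dst->iocs); i++)
			dst->iocs[i] = cpu_to_le32(le32_to_cpu(dst->iocs[i]) |
						  le32_to_cpu(src->iocs[i]));
	}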
Re: [PATCH] nvme-core: remove head->effects to fix use-after-free
Posted by Christoph Hellwig 2 years, 1 month ago
On Fri, Nov 17, 2023 at 09:38:19AM -0700, Keith Busch wrote:
> The effects log is per-CSI so different command sets won't create
> conflicts.

True.  But that wasn't the point anyway.  It is that different
controllers might expose very different namespaces with different
capabilities.  Maybe a controller with HDD namespaces vs flash might
be a better example.

> Namespaces that are not shared don't really matter here because this
> problem is unique to multipath.

Indeed.

> It doesn't make sense for effects logs to be different per-controller
> for the same shared namespace. The spec doesn't seem to explicitly
> prevent that, but hints that all hosts should be seeing the same thing
> no matter which controller they're connected to:

Also agreed as already indicated in the past mail.
Re: [PATCH] nvme-core: remove head->effects to fix use-after-free
Posted by Sagi Grimberg 2 years, 1 month ago

On 11/20/23 10:23, Christoph Hellwig wrote:
> On Fri, Nov 17, 2023 at 09:38:19AM -0700, Keith Busch wrote:
>> The effects log is per-CSI so different command sets won't create
>> conflicts.
> 
> True.  But that wasn't the point anyway.  It is that different
> controllers might expose very different namespaces with different
> capabilities.  Maybe a controller with HDD namespaces vs flash might
> be a better example.
> 
>> Namespaces that are not shared don't really matter here because this
>> problem is unique to multipath.
> 
> Indeed.
> 
>> It doesn't make sense for effects logs to be different per-controller
>> for the same shared namespace. The spec doesn't seem to explicitly
>> prevent that, but hints that all hosts should be seeing the same thing
>> no matter which controller they're connected to:
> 
> Also agreed as already indicated in the past mail.

Having every ns get its own effects log cache is another 4k per nshead.
Even if we restrict it only to iocs, it's 1k per nshead.

Would it make sense to have nvme_free_cels fence passthru commands
with RCU instead?
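
Roughly this, untested, assuming nvme_command_effects() wraps the
xa_load() and the dereference in rcu_read_lock()/rcu_read_unlock():

	static void nvme_free_cels(struct nvme_ctrl *ctrl)
	{
		struct nvme_effects_log *cel;
		unsigned long i;

		xa_for_each(&ctrl->cels, i, cel) {
			xa_erase(&ctrl->cels, i);
			/* wait out any passthru still using this log */
			synchronize_rcu();
			kfree(cel);
		}
		xa_destroy(&ctrl->cels);
	}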
Re: [PATCH] nvme-core: remove head->effects to fix use-after-free
Posted by Randy Jennings 2 years, 1 month ago
>> It doesn't make sense for effects logs to be different per-controller
>> for the same shared namespace. The spec doesn't seem to explicitly
>> prevent that, but hints that all hosts should be seeing the same thing
>> no matter which controller they're connected to:
>
> Also agreed as already indicated in the past mail.

Yuanyuan Zhong already pointed out a situation where the commands
supported portion of the log page could be different.  When upgrading
the subsystem, some controllers may be upgraded sooner than others.
The upgrade could add support for new commands.

However, I would be surprised if the effects would be different for
currently supported commands, unless a controller was not reporting
effects before and starts reporting them.

Sincerely,
Randy Jennings
Re: [PATCH] nvme-core: remove head->effects to fix use-after-free
Posted by Yuanyuan Zhong 2 years, 1 month ago
On Wed, Nov 15, 2023 at 7:52 PM Keith Busch <kbusch@kernel.org> wrote:
>
> On Wed, Nov 15, 2023 at 02:44:04PM -0800, Yuanyuan Zhong wrote:
> > On Wed, Nov 15, 2023 at 11:55 AM Keith Busch <kbusch@kernel.org> wrote:
> > >
> > > On Wed, Nov 15, 2023 at 11:21:53AM -0800, Yuanyuan Zhong wrote:
> > > > On Wed, Nov 15, 2023 at 11:02 AM Jens Axboe <axboe@kernel.dk> wrote:
> > > >
> > > > Do you mean something like this? If not, can you please elaborate
> > > > "when we need"?
> > > > -               struct nvme_effects_log *cel = xa_load(&ctrl->cels, ns->head->ids.csi);
> > > > +               struct nvme_effects_log *cel = (ns->head->ids.csi == NVME_CSI_NVM) ?
> > > > +                       ctrl->effects : xa_load(&ctrl->cels, ns->head->ids.csi);
> > > > Will it be good to change ctrl->effects to ctrl->effects[3] for
> > > > already defined CSI?
> > >
> > > I suggest either re-assign the cached head->effects to one from a still
> > > live controller when current path is removed, or move the saved effects
> > > to the subsystem instead of the controller. All controllers in the
> > > subsystem should be reporting the same effects log anyway, so
> > Is it specified in spec that all controllers in the subsystem
> > should be reporting the same effects log?
>
> Yes, in section 5.16.1.6, "Commands Supported and Effects":
>
>   This log page is used to describe the commands that the controller
>   supports and the effects of those commands on the state of the NVM
>   subsystem.
>
> Oddly enough, Figure 202 says the scope of the log page is "Controller"
> rather than "Subsystem". Sounds like ECN potential. You can memcmp the
> effects log from each controller for a sanity check if you think some
> subsystem controllers messed that up.

Yeah, it says the scope is controller.
I think it's valid for one controller in the subsystem to be upgraded
first and start reporting a different effects log, e.g. adding or
revoking CSUPP for some opcode. If the saved effects log is kept in the
subsystem, how would we refresh it for the subsystem?