[PATCH] scsi: lpfc: Fix improper handling of refcount in lpfc_bsg_hba_get_event()

Qiu-ji Chen posted 1 patch 2 weeks, 4 days ago
drivers/scsi/lpfc/lpfc_bsg.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
[PATCH] scsi: lpfc: Fix improper handling of refcount in lpfc_bsg_hba_get_event()
Posted by Qiu-ji Chen 2 weeks, 4 days ago
This patch addresses a reference count handling issue in the 
lpfc_bsg_hba_get_event() function. In the branch 
if (evt->reg_id == event_req->ev_reg_id), the function calls 
lpfc_bsg_event_ref(), which increments the reference count of the relevant 
resources. However, in the branch if (evt_dat == NULL), a goto statement 
directly jumps to the function’s final goto block, skipping the release 
operations at the end of the function. This means that, if the condition 
if (evt_dat == NULL) is met, the function fails to correctly release the 
resources acquired by lpfc_bsg_event_ref(), leading to a reference count 
leak.

To fix this issue, we added a new block job_error_unref before the 
job_error block. When the condition if (evt_dat == NULL) is met, the 
function will enter the job_error_unref block, ensuring that the previously
allocated resources are properly released, thereby preventing the reference
count leak.

This bug was identified by an experimental static analysis tool developed
by our team. The tool specializes in analyzing reference count operations
and detecting potential issues where resources are not properly managed.
In this case, the tool flagged the missing release operation as a
potential problem, which led to the development of this patch.

Fixes: 4cc0e56e977f ("[SCSI] lpfc 8.3.8: (BSG3) Modify BSG commands to operate asynchronously")
Cc: stable@vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666@gmail.com>
---
 drivers/scsi/lpfc/lpfc_bsg.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_bsg.c b/drivers/scsi/lpfc/lpfc_bsg.c
index 85059b83ea6b..832a5a6dd85f 100644
--- a/drivers/scsi/lpfc/lpfc_bsg.c
+++ b/drivers/scsi/lpfc/lpfc_bsg.c
@@ -1294,7 +1294,7 @@ lpfc_bsg_hba_get_event(struct bsg_job *job)
 	if (evt_dat == NULL) {
 		bsg_reply->reply_payload_rcv_len = 0;
 		rc = -ENOENT;
-		goto job_error;
+		goto job_error_unref;
 	}
 
 	if (evt_dat->len > job->request_payload.payload_len) {
@@ -1329,6 +1329,10 @@ lpfc_bsg_hba_get_event(struct bsg_job *job)
 		       bsg_reply->reply_payload_rcv_len);
 	return 0;
 
+job_err_unref:
+	spin_lock_irqsave(&phba->ct_ev_lock, flags);
+	lpfc_bsg_event_unref(evt);
+	spin_unlock_irqrestore(&phba->ct_ev_lock, flags);
 job_error:
 	job->dd_data = NULL;
 	bsg_reply->result = rc;
-- 
2.34.1

Re: [PATCH] scsi: lpfc: Fix improper handling of refcount in lpfc_bsg_hba_get_event()
Posted by kernel test robot 2 weeks, 4 days ago
Hi Qiu-ji,

kernel test robot noticed the following build errors:

[auto build test ERROR on jejb-scsi/for-next]
[also build test ERROR on mkp-scsi/for-next linus/master v6.12-rc6 next-20241105]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Qiu-ji-Chen/scsi-lpfc-Fix-improper-handling-of-refcount-in-lpfc_bsg_hba_get_event/20241105-211110
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
patch link:    https://lore.kernel.org/r/20241105130902.4603-1-chenqiuji666%40gmail.com
patch subject: [PATCH] scsi: lpfc: Fix improper handling of refcount in lpfc_bsg_hba_get_event()
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20241106/202411060527.qPI24Q8a-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241106/202411060527.qPI24Q8a-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411060527.qPI24Q8a-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from drivers/scsi/lpfc/lpfc_bsg.c:25:
   In file included from include/linux/pci.h:1650:
   In file included from include/linux/dmapool.h:14:
   In file included from include/linux/scatterlist.h:8:
   In file included from include/linux/mm.h:2213:
   include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     504 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     505 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     511 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     512 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:524:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     524 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     525 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
>> drivers/scsi/lpfc/lpfc_bsg.c:1297:8: error: use of undeclared label 'job_error_unref'
    1297 |                 goto job_error_unref;
         |                      ^
>> drivers/scsi/lpfc/lpfc_bsg.c:1332:1: warning: unused label 'job_err_unref' [-Wunused-label]
    1332 | job_err_unref:
         | ^~~~~~~~~~~~~~
   drivers/scsi/lpfc/lpfc_bsg.c:2810:11: warning: variable 'offset' set but not used [-Wunused-but-set-variable]
    2810 |         int cnt, offset = 0, i = 0;
         |                  ^
   6 warnings and 1 error generated.

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for MODVERSIONS
   Depends on [n]: MODULES [=y] && !COMPILE_TEST [=y]
   Selected by [y]:
   - RANDSTRUCT_FULL [=y] && (CC_HAS_RANDSTRUCT [=y] || GCC_PLUGINS [=n]) && MODULES [=y]


vim +/job_error_unref +1297 drivers/scsi/lpfc/lpfc_bsg.c

  1243	
  1244	/**
  1245	 * lpfc_bsg_hba_get_event - process a GET_EVENT bsg vendor command
  1246	 * @job: GET_EVENT fc_bsg_job
  1247	 **/
  1248	static int
  1249	lpfc_bsg_hba_get_event(struct bsg_job *job)
  1250	{
  1251		struct lpfc_vport *vport = shost_priv(fc_bsg_to_shost(job));
  1252		struct lpfc_hba *phba = vport->phba;
  1253		struct fc_bsg_request *bsg_request = job->request;
  1254		struct fc_bsg_reply *bsg_reply = job->reply;
  1255		struct get_ct_event *event_req;
  1256		struct get_ct_event_reply *event_reply;
  1257		struct lpfc_bsg_event *evt, *evt_next;
  1258		struct event_data *evt_dat = NULL;
  1259		unsigned long flags;
  1260		uint32_t rc = 0;
  1261	
  1262		if (job->request_len <
  1263		    sizeof(struct fc_bsg_request) + sizeof(struct get_ct_event)) {
  1264			lpfc_printf_log(phba, KERN_WARNING, LOG_LIBDFC,
  1265					"2613 Received GET_CT_EVENT request below "
  1266					"minimum size\n");
  1267			rc = -EINVAL;
  1268			goto job_error;
  1269		}
  1270	
  1271		event_req = (struct get_ct_event *)
  1272			bsg_request->rqst_data.h_vendor.vendor_cmd;
  1273	
  1274		event_reply = (struct get_ct_event_reply *)
  1275			bsg_reply->reply_data.vendor_reply.vendor_rsp;
  1276		spin_lock_irqsave(&phba->ct_ev_lock, flags);
  1277		list_for_each_entry_safe(evt, evt_next, &phba->ct_ev_waiters, node) {
  1278			if (evt->reg_id == event_req->ev_reg_id) {
  1279				if (list_empty(&evt->events_to_get))
  1280					break;
  1281				lpfc_bsg_event_ref(evt);
  1282				evt->wait_time_stamp = jiffies;
  1283				evt_dat = list_entry(evt->events_to_get.prev,
  1284						     struct event_data, node);
  1285				list_del(&evt_dat->node);
  1286				break;
  1287			}
  1288		}
  1289		spin_unlock_irqrestore(&phba->ct_ev_lock, flags);
  1290	
  1291		/* The app may continue to ask for event data until it gets
  1292		 * an error indicating that there isn't anymore
  1293		 */
  1294		if (evt_dat == NULL) {
  1295			bsg_reply->reply_payload_rcv_len = 0;
  1296			rc = -ENOENT;
> 1297			goto job_error_unref;
  1298		}
  1299	
  1300		if (evt_dat->len > job->request_payload.payload_len) {
  1301			evt_dat->len = job->request_payload.payload_len;
  1302			lpfc_printf_log(phba, KERN_WARNING, LOG_LIBDFC,
  1303					"2618 Truncated event data at %d "
  1304					"bytes\n",
  1305					job->request_payload.payload_len);
  1306		}
  1307	
  1308		event_reply->type = evt_dat->type;
  1309		event_reply->immed_data = evt_dat->immed_dat;
  1310		if (evt_dat->len > 0)
  1311			bsg_reply->reply_payload_rcv_len =
  1312				sg_copy_from_buffer(job->request_payload.sg_list,
  1313						    job->request_payload.sg_cnt,
  1314						    evt_dat->data, evt_dat->len);
  1315		else
  1316			bsg_reply->reply_payload_rcv_len = 0;
  1317	
  1318		if (evt_dat) {
  1319			kfree(evt_dat->data);
  1320			kfree(evt_dat);
  1321		}
  1322	
  1323		spin_lock_irqsave(&phba->ct_ev_lock, flags);
  1324		lpfc_bsg_event_unref(evt);
  1325		spin_unlock_irqrestore(&phba->ct_ev_lock, flags);
  1326		job->dd_data = NULL;
  1327		bsg_reply->result = 0;
  1328		bsg_job_done(job, bsg_reply->result,
  1329			       bsg_reply->reply_payload_rcv_len);
  1330		return 0;
  1331	
> 1332	job_err_unref:
  1333		spin_lock_irqsave(&phba->ct_ev_lock, flags);
  1334		lpfc_bsg_event_unref(evt);
  1335		spin_unlock_irqrestore(&phba->ct_ev_lock, flags);
  1336	job_error:
  1337		job->dd_data = NULL;
  1338		bsg_reply->result = rc;
  1339		return rc;
  1340	}
  1341	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH] scsi: lpfc: Fix improper handling of refcount in lpfc_bsg_hba_get_event()
Posted by kernel test robot 2 weeks, 4 days ago
Hi Qiu-ji,

kernel test robot noticed the following build errors:

[auto build test ERROR on jejb-scsi/for-next]
[also build test ERROR on mkp-scsi/for-next linus/master v6.12-rc6 next-20241105]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Qiu-ji-Chen/scsi-lpfc-Fix-improper-handling-of-refcount-in-lpfc_bsg_hba_get_event/20241105-211110
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
patch link:    https://lore.kernel.org/r/20241105130902.4603-1-chenqiuji666%40gmail.com
patch subject: [PATCH] scsi: lpfc: Fix improper handling of refcount in lpfc_bsg_hba_get_event()
config: i386-allmodconfig (https://download.01.org/0day-ci/archive/20241106/202411060332.fmmxsBzv-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241106/202411060332.fmmxsBzv-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411060332.fmmxsBzv-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   drivers/scsi/lpfc/lpfc_bsg.c: In function 'lpfc_bsg_hba_get_event':
>> drivers/scsi/lpfc/lpfc_bsg.c:1332:1: warning: label 'job_err_unref' defined but not used [-Wunused-label]
    1332 | job_err_unref:
         | ^~~~~~~~~~~~~
>> drivers/scsi/lpfc/lpfc_bsg.c:1297:17: error: label 'job_error_unref' used but not defined
    1297 |                 goto job_error_unref;
         |                 ^~~~

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for MODVERSIONS
   Depends on [n]: MODULES [=y] && !COMPILE_TEST [=y]
   Selected by [y]:
   - RANDSTRUCT_FULL [=y] && (CC_HAS_RANDSTRUCT [=n] || GCC_PLUGINS [=y]) && MODULES [=y]
   WARNING: unmet direct dependencies detected for GET_FREE_REGION
   Depends on [n]: SPARSEMEM [=n]
   Selected by [m]:
   - RESOURCE_KUNIT_TEST [=m] && RUNTIME_TESTING_MENU [=y] && KUNIT [=m]


vim +/job_error_unref +1297 drivers/scsi/lpfc/lpfc_bsg.c

  1243	
  1244	/**
  1245	 * lpfc_bsg_hba_get_event - process a GET_EVENT bsg vendor command
  1246	 * @job: GET_EVENT fc_bsg_job
  1247	 **/
  1248	static int
  1249	lpfc_bsg_hba_get_event(struct bsg_job *job)
  1250	{
  1251		struct lpfc_vport *vport = shost_priv(fc_bsg_to_shost(job));
  1252		struct lpfc_hba *phba = vport->phba;
  1253		struct fc_bsg_request *bsg_request = job->request;
  1254		struct fc_bsg_reply *bsg_reply = job->reply;
  1255		struct get_ct_event *event_req;
  1256		struct get_ct_event_reply *event_reply;
  1257		struct lpfc_bsg_event *evt, *evt_next;
  1258		struct event_data *evt_dat = NULL;
  1259		unsigned long flags;
  1260		uint32_t rc = 0;
  1261	
  1262		if (job->request_len <
  1263		    sizeof(struct fc_bsg_request) + sizeof(struct get_ct_event)) {
  1264			lpfc_printf_log(phba, KERN_WARNING, LOG_LIBDFC,
  1265					"2613 Received GET_CT_EVENT request below "
  1266					"minimum size\n");
  1267			rc = -EINVAL;
  1268			goto job_error;
  1269		}
  1270	
  1271		event_req = (struct get_ct_event *)
  1272			bsg_request->rqst_data.h_vendor.vendor_cmd;
  1273	
  1274		event_reply = (struct get_ct_event_reply *)
  1275			bsg_reply->reply_data.vendor_reply.vendor_rsp;
  1276		spin_lock_irqsave(&phba->ct_ev_lock, flags);
  1277		list_for_each_entry_safe(evt, evt_next, &phba->ct_ev_waiters, node) {
  1278			if (evt->reg_id == event_req->ev_reg_id) {
  1279				if (list_empty(&evt->events_to_get))
  1280					break;
  1281				lpfc_bsg_event_ref(evt);
  1282				evt->wait_time_stamp = jiffies;
  1283				evt_dat = list_entry(evt->events_to_get.prev,
  1284						     struct event_data, node);
  1285				list_del(&evt_dat->node);
  1286				break;
  1287			}
  1288		}
  1289		spin_unlock_irqrestore(&phba->ct_ev_lock, flags);
  1290	
  1291		/* The app may continue to ask for event data until it gets
  1292		 * an error indicating that there isn't anymore
  1293		 */
  1294		if (evt_dat == NULL) {
  1295			bsg_reply->reply_payload_rcv_len = 0;
  1296			rc = -ENOENT;
> 1297			goto job_error_unref;
  1298		}
  1299	
  1300		if (evt_dat->len > job->request_payload.payload_len) {
  1301			evt_dat->len = job->request_payload.payload_len;
  1302			lpfc_printf_log(phba, KERN_WARNING, LOG_LIBDFC,
  1303					"2618 Truncated event data at %d "
  1304					"bytes\n",
  1305					job->request_payload.payload_len);
  1306		}
  1307	
  1308		event_reply->type = evt_dat->type;
  1309		event_reply->immed_data = evt_dat->immed_dat;
  1310		if (evt_dat->len > 0)
  1311			bsg_reply->reply_payload_rcv_len =
  1312				sg_copy_from_buffer(job->request_payload.sg_list,
  1313						    job->request_payload.sg_cnt,
  1314						    evt_dat->data, evt_dat->len);
  1315		else
  1316			bsg_reply->reply_payload_rcv_len = 0;
  1317	
  1318		if (evt_dat) {
  1319			kfree(evt_dat->data);
  1320			kfree(evt_dat);
  1321		}
  1322	
  1323		spin_lock_irqsave(&phba->ct_ev_lock, flags);
  1324		lpfc_bsg_event_unref(evt);
  1325		spin_unlock_irqrestore(&phba->ct_ev_lock, flags);
  1326		job->dd_data = NULL;
  1327		bsg_reply->result = 0;
  1328		bsg_job_done(job, bsg_reply->result,
  1329			       bsg_reply->reply_payload_rcv_len);
  1330		return 0;
  1331	
> 1332	job_err_unref:
  1333		spin_lock_irqsave(&phba->ct_ev_lock, flags);
  1334		lpfc_bsg_event_unref(evt);
  1335		spin_unlock_irqrestore(&phba->ct_ev_lock, flags);
  1336	job_error:
  1337		job->dd_data = NULL;
  1338		bsg_reply->result = rc;
  1339		return rc;
  1340	}
  1341	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH] scsi: lpfc: Fix improper handling of refcount in lpfc_bsg_hba_get_event()
Posted by Justin Tee 2 weeks, 4 days ago
Hi Qiu-ji,

Similar to the other suggested patch, this does not look logically
correct.  if (evt_dat == NULL) evaluates to true, then that means the
list_for_each_entry_safe(evt, evt_next, &phba->ct_ev_waiters, node)
loop did not find an evt lpfc_bsg_event object of interest or that the
phba->ct_ev_waiters list is empty.

Why would this patch want to call lpfc_bsg_event_unref on an evt
object that is not of specified interest indicated by the bsg
event_req object?

Even worse, as mentioned in the other email, this patch could kref_put
on the phba->ct_ev_waiters head which is not a preallocated
lpfc_bsg_event object leading to references on an uninitialized memory
region.

Sorry, but I cannot acknowledge this patch as well.

Regards,
Justin