scsi: 3w-sas: validate request_id reported by controller

[PATCH v2] scsi: 3w-sas: validate request_id reported by controller

Posted by Shipei Qu 1 month, 3 weeks ago

Hi,

resending with a correctly generated diff (the previous email had a malformed
patch header). The content is unchanged: while testing 3w-sas with an emulated
3ware 9750-compatible controller we noticed that the driver trusts the
request_id field reported by the controller in the completion path and uses it
as an index into several TW_Q_LENGTH-sized arrays without validating the range.
A bogus value can lead to out-of-bounds accesses and crashes.

For example, a malicious PCIe/Thunderbolt-attached device that emulates a
3ware controller can return crafted completions and drive the driver out of
bounds in this way. We have a small QEMU-based PoC that reproduces the problem
and can share it if that would be helpful.

This driver is enabled by default in common distributions (e.g. Ubuntu), so a
misbehaving or emulated controller can trigger this on a stock installation.
The same pattern is present in current mainline kernels.

This issue was first reported via security@kernel.org. The kernel security team
replied that, under the usual upstream threat model (only trusted
PCIe/Thunderbolt devices are allowed to bind to such drivers), it should be
treated as a normal robustness bug rather than a security issue, and asked us
to send fixes to the relevant development lists. This email follows that
guidance.

The fix is to reject out-of-range request_id values before indexing the
per-request arrays in the interrupt handler.

Reported-by: DARKNAVY (@DarkNavyOrg) <vr@darknavy.com>
Signed-off-by: Shipei Qu <qu@darknavy.com>
---
 drivers/scsi/3w-sas.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/scsi/3w-sas.c b/drivers/scsi/3w-sas.c
index e319be7d3..4d1c31c04 100644
--- a/drivers/scsi/3w-sas.c
+++ b/drivers/scsi/3w-sas.c
@@ -1184,6 +1184,15 @@ static irqreturn_t twl_interrupt(int irq, void *dev_instance)
 		} else
 			request_id = TW_RESID_OUT(response);
 
+		if (request_id >= TW_Q_LENGTH) {
+			TW_PRINTK(tw_dev->host, TW_DRIVER, 0x10,
+				  "Received out-of-range request id %u",
+				  request_id);
+			TWL_MASK_INTERRUPTS(tw_dev);
+			/* let the reset / error handling path deal with it */
+			goto twl_interrupt_bail;
+		}
+
 		full_command_packet = tw_dev->command_packet_virt[request_id];
 
 		/* Check for correct state */
-- 
2.45.1

Re: [PATCH v2] scsi: 3w-sas: validate request_id reported by controller

Posted by kernel test robot 1 month, 2 weeks ago

Hi Shipei,

kernel test robot noticed the following build errors:

[auto build test ERROR on jejb-scsi/for-next]
[also build test ERROR on mkp-scsi/for-next linus/master v6.19-rc1 next-20251219]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shipei-Qu/scsi-3w-sas-validate-request_id-reported-by-controller/20251216-140928
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
patch link:    https://lore.kernel.org/r/20251216060156.41320-1-qu%40darknavy.com
patch subject: [PATCH v2] scsi: 3w-sas: validate request_id reported by controller
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20251220/202512202253.30DG2Afm-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251220/202512202253.30DG2Afm-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512202253.30DG2Afm-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/scsi/3w-sas.c:1190:7: error: too many arguments provided to function-like macro invocation
    1190 |                                   request_id);
         |                                   ^
   drivers/scsi/3w-sas.h:207:9: note: macro 'TW_PRINTK' defined here
     207 | #define TW_PRINTK(h,a,b,c) { \
         |         ^
>> drivers/scsi/3w-sas.c:1188:4: error: use of undeclared identifier 'TW_PRINTK'
    1188 |                         TW_PRINTK(tw_dev->host, TW_DRIVER, 0x10,
         |                         ^
   drivers/scsi/3w-sas.c:1579:49: warning: shift count >= width of type [-Wshift-count-overflow]
    1579 |         retval = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
         |                                                        ^~~~~~~~~~~~~~~~
   include/linux/dma-mapping.h:93:54: note: expanded from macro 'DMA_BIT_MASK'
      93 | #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
         |                                                      ^ ~~~
   drivers/scsi/3w-sas.c:1796:49: warning: shift count >= width of type [-Wshift-count-overflow]
    1796 |         retval = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
         |                                                        ^~~~~~~~~~~~~~~~
   include/linux/dma-mapping.h:93:54: note: expanded from macro 'DMA_BIT_MASK'
      93 | #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
         |                                                      ^ ~~~
   2 warnings and 2 errors generated.


vim +1190 drivers/scsi/3w-sas.c

  1117	
  1118	/* Interrupt service routine */
  1119	static irqreturn_t twl_interrupt(int irq, void *dev_instance)
  1120	{
  1121		TW_Device_Extension *tw_dev = (TW_Device_Extension *)dev_instance;
  1122		int i, handled = 0, error = 0;
  1123		dma_addr_t mfa = 0;
  1124		u32 reg, regl, regh, response, request_id = 0;
  1125		struct scsi_cmnd *cmd;
  1126		TW_Command_Full *full_command_packet;
  1127	
  1128		spin_lock(tw_dev->host->host_lock);
  1129	
  1130		/* Read host interrupt status */
  1131		reg = readl(TWL_HISTAT_REG_ADDR(tw_dev));
  1132	
  1133		/* Check if this is our interrupt, otherwise bail */
  1134		if (!(reg & TWL_HISTATUS_VALID_INTERRUPT))
  1135			goto twl_interrupt_bail;
  1136	
  1137		handled = 1;
  1138	
  1139		/* If we are resetting, bail */
  1140		if (test_bit(TW_IN_RESET, &tw_dev->flags))
  1141			goto twl_interrupt_bail;
  1142	
  1143		/* Attention interrupt */
  1144		if (reg & TWL_HISTATUS_ATTENTION_INTERRUPT) {
  1145			if (twl_handle_attention_interrupt(tw_dev)) {
  1146				TWL_MASK_INTERRUPTS(tw_dev);
  1147				goto twl_interrupt_bail;
  1148			}
  1149		}
  1150	
  1151		/* Response interrupt */
  1152		while (reg & TWL_HISTATUS_RESPONSE_INTERRUPT) {
  1153			if (sizeof(dma_addr_t) > 4) {
  1154				regh = readl(TWL_HOBQPH_REG_ADDR(tw_dev));
  1155				regl = readl(TWL_HOBQPL_REG_ADDR(tw_dev));
  1156				mfa = ((u64)regh << 32) | regl;
  1157			} else
  1158				mfa = readl(TWL_HOBQPL_REG_ADDR(tw_dev));
  1159	
  1160			error = 0;
  1161			response = (u32)mfa;
  1162	
  1163			/* Check for command packet error */
  1164			if (!TW_NOTMFA_OUT(response)) {
  1165				for (i=0;i<TW_Q_LENGTH;i++) {
  1166					if (tw_dev->sense_buffer_phys[i] == mfa) {
  1167						request_id = le16_to_cpu(tw_dev->sense_buffer_virt[i]->header_desc.request_id);
  1168						if (tw_dev->srb[request_id] != NULL)
  1169							error = twl_fill_sense(tw_dev, i, request_id, 1, 1);
  1170						else {
  1171							/* Skip ioctl error prints */
  1172							if (request_id != tw_dev->chrdev_request_id)
  1173								error = twl_fill_sense(tw_dev, i, request_id, 0, 1);
  1174							else
  1175								memcpy(tw_dev->command_packet_virt[request_id], tw_dev->sense_buffer_virt[i], sizeof(TW_Command_Apache_Header));
  1176						}
  1177	
  1178						/* Now re-post the sense buffer */
  1179						writel((u32)((u64)tw_dev->sense_buffer_phys[i] >> 32), TWL_HOBQPH_REG_ADDR(tw_dev));
  1180						writel((u32)tw_dev->sense_buffer_phys[i], TWL_HOBQPL_REG_ADDR(tw_dev));
  1181						break;
  1182					}
  1183				}
  1184			} else
  1185				request_id = TW_RESID_OUT(response);
  1186	
  1187			if (request_id >= TW_Q_LENGTH) {
> 1188				TW_PRINTK(tw_dev->host, TW_DRIVER, 0x10,
  1189					  "Received out-of-range request id %u",
> 1190					  request_id);
  1191				TWL_MASK_INTERRUPTS(tw_dev);
  1192				/* let the reset / error handling path deal with it */
  1193				goto twl_interrupt_bail;
  1194			}
  1195	
  1196			full_command_packet = tw_dev->command_packet_virt[request_id];
  1197	
  1198			/* Check for correct state */
  1199			if (tw_dev->state[request_id] != TW_S_POSTED) {
  1200				if (tw_dev->srb[request_id] != NULL) {
  1201					TW_PRINTK(tw_dev->host, TW_DRIVER, 0xe, "Received a request id that wasn't posted");
  1202					TWL_MASK_INTERRUPTS(tw_dev);
  1203					goto twl_interrupt_bail;
  1204				}
  1205			}
  1206	
  1207			/* Check for internal command completion */
  1208			if (tw_dev->srb[request_id] == NULL) {
  1209				if (request_id != tw_dev->chrdev_request_id) {
  1210					if (twl_aen_complete(tw_dev, request_id))
  1211						TW_PRINTK(tw_dev->host, TW_DRIVER, 0xf, "Error completing AEN during attention interrupt");
  1212				} else {
  1213					tw_dev->chrdev_request_id = TW_IOCTL_CHRDEV_FREE;
  1214					wake_up(&tw_dev->ioctl_wqueue);
  1215				}
  1216			} else {
  1217				cmd = tw_dev->srb[request_id];
  1218	
  1219				if (!error)
  1220					cmd->result = (DID_OK << 16);
  1221	
  1222				/* Report residual bytes for single sgl */
  1223				if ((scsi_sg_count(cmd) <= 1) && (full_command_packet->command.newcommand.status == 0)) {
  1224					if (full_command_packet->command.newcommand.sg_list[0].length < scsi_bufflen(tw_dev->srb[request_id]))
  1225						scsi_set_resid(cmd, scsi_bufflen(cmd) - full_command_packet->command.newcommand.sg_list[0].length);
  1226				}
  1227	
  1228				/* Now complete the io */
  1229				scsi_dma_unmap(cmd);
  1230				scsi_done(cmd);
  1231				tw_dev->state[request_id] = TW_S_COMPLETED;
  1232				twl_free_request_id(tw_dev, request_id);
  1233				tw_dev->posted_request_count--;
  1234			}
  1235	
  1236			/* Check for another response interrupt */
  1237			reg = readl(TWL_HISTAT_REG_ADDR(tw_dev));
  1238		}
  1239	
  1240	twl_interrupt_bail:
  1241		spin_unlock(tw_dev->host->host_lock);
  1242		return IRQ_RETVAL(handled);
  1243	} /* End twl_interrupt() */
  1244	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Re: [PATCH v2] scsi: 3w-sas: validate request_id reported by controller

Posted by James Bottomley 1 month, 3 weeks ago

On Tue, 2025-12-16 at 14:01 +0800, Shipei Qu wrote:
> This issue was first reported via security@kernel.org. The kernel
> security team replied that, under the usual upstream threat model
> (only trusted PCIe/Thunderbolt devices are allowed to bind to such
> drivers), it should be treated as a normal robustness bug rather than
> a security issue, and asked us to send fixes to the relevant
> development lists. This email follows that guidance.

Realistically the same rationale goes for us as well.  Absent the
observation in the field of a problem device, we usually trust the
hardware, so unless you can find an actual device that exhibits the
problem this isn't really a fix for anything.

If the security@ list is happy that the existing trust model for
Thunderbolt/PCIe would prevent the attachment of malicious devices,
then I think we're also happy to take their word for it.  Even if
thunderbolt were a security problem, the fix would likely be in the
trust model not in all possible drivers.

Regards,

James Bottomley

Re: [PATCH v2] scsi: 3w-sas: validate request_id reported by controller

Posted by Shipei Qu 1 month, 3 weeks ago

Hi James,

> Realistically the same rationale goes for us as well.  Absent the
> observation in the field of a problem device, we usually trust the
> hardware, so unless you can find an actual device that exhibits the
> problem this isn't really a fix for anything.
>
> If the security@ list is happy that the existing trust model for
> Thunderbolt/PCIe would prevent the attachment of malicious devices,
> then I think we're also happy to take their word for it.  Even if
> thunderbolt were a security problem, the fix would likely be in the
> trust model not in all possible drivers.

Thanks for the feedback. We understand and respect the current trust
model (drivers assume trusted hardware), so no objections if you prefer
not to take the patch under that model.

For context: some confidential-computing / virtualized deployments
include a hostile or compromised VMM or passthrough device in the threat
model, so bounds checks can reduce crash surface when the device isn't
fully trusted. But if that's outside upstream scope here, we're fine
either way. Thanks for the clarification.

Thanks,
Shipei

Re: [PATCH v2] scsi: 3w-sas: validate request_id reported by controller

Posted by James Bottomley 1 month, 3 weeks ago

On Tue, 2025-12-16 at 14:43 +0800, Shipei Qu wrote:
[...]
> For context: some confidential-computing / virtualized deployments
> include a hostile or compromised VMM or passthrough device in the
> threat model, so bounds checks can reduce crash surface when the
> device isn't fully trusted. But if that's outside upstream scope
> here, we're fine either way. Thanks for the clarification.

We already had a massive fight over hardening device drivers for
confidential computing:

https://lore.kernel.org/all/20230327141816.2648615-1-carlos.bilbao@amd.com/

The summary is that if you can't trust the hardware or its emulation,
policing it is too huge a burden to place on ordinary drivers.  So
while there's no objection to specifically hardened drivers for
confidential computing, generally we don't add hardening to real
hardware drivers because it tanks performance.

Regards,

James