drivers/scsi/3w-sas.c | 9 +++++++++ 1 file changed, 9 insertions(+)
Hi,
resending with a correctly generated diff (the previous email had a malformed
patch header). The content is unchanged: while testing 3w-sas with an emulated
3ware 9750-compatible controller we noticed that the driver trusts the
request_id field reported by the controller in the completion path and uses it
as an index into several TW_Q_LENGTH-sized arrays without validating the range.
A bogus value can lead to out-of-bounds accesses and crashes.
For example, a malicious PCIe/Thunderbolt-attached device that emulates a
3ware controller can return crafted completions and drive the driver out of
bounds in this way. We have a small QEMU-based PoC that reproduces the problem
and can share it if that would be helpful.
This driver is enabled by default in common distributions (e.g. Ubuntu), so a
misbehaving or emulated controller can trigger this on a stock installation.
The same pattern is present in current mainline kernels.
This issue was first reported via security@kernel.org. The kernel security team
replied that, under the usual upstream threat model (only trusted
PCIe/Thunderbolt devices are allowed to bind to such drivers), it should be
treated as a normal robustness bug rather than a security issue, and asked us
to send fixes to the relevant development lists. This email follows that
guidance.
The fix is to reject out-of-range request_id values before indexing the
per-request arrays in the interrupt handler.
Reported-by: DARKNAVY (@DarkNavyOrg) <vr@darknavy.com>
Signed-off-by: Shipei Qu <qu@darknavy.com>
---
drivers/scsi/3w-sas.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/scsi/3w-sas.c b/drivers/scsi/3w-sas.c
index e319be7d3..4d1c31c04 100644
--- a/drivers/scsi/3w-sas.c
+++ b/drivers/scsi/3w-sas.c
@@ -1184,6 +1184,15 @@ static irqreturn_t twl_interrupt(int irq, void *dev_instance)
} else
request_id = TW_RESID_OUT(response);
+ if (request_id >= TW_Q_LENGTH) {
+ TW_PRINTK(tw_dev->host, TW_DRIVER, 0x10,
+ "Received out-of-range request id %u",
+ request_id);
+ TWL_MASK_INTERRUPTS(tw_dev);
+ /* let the reset / error handling path deal with it */
+ goto twl_interrupt_bail;
+ }
+
full_command_packet = tw_dev->command_packet_virt[request_id];
/* Check for correct state */
--
2.45.1
Hi Shipei,
kernel test robot noticed the following build errors:
[auto build test ERROR on jejb-scsi/for-next]
[also build test ERROR on mkp-scsi/for-next linus/master v6.19-rc1 next-20251219]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Shipei-Qu/scsi-3w-sas-validate-request_id-reported-by-controller/20251216-140928
base: https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
patch link: https://lore.kernel.org/r/20251216060156.41320-1-qu%40darknavy.com
patch subject: [PATCH v2] scsi: 3w-sas: validate request_id reported by controller
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20251220/202512202253.30DG2Afm-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251220/202512202253.30DG2Afm-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512202253.30DG2Afm-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/scsi/3w-sas.c:1190:7: error: too many arguments provided to function-like macro invocation
1190 | request_id);
| ^
drivers/scsi/3w-sas.h:207:9: note: macro 'TW_PRINTK' defined here
207 | #define TW_PRINTK(h,a,b,c) { \
| ^
>> drivers/scsi/3w-sas.c:1188:4: error: use of undeclared identifier 'TW_PRINTK'
1188 | TW_PRINTK(tw_dev->host, TW_DRIVER, 0x10,
| ^
drivers/scsi/3w-sas.c:1579:49: warning: shift count >= width of type [-Wshift-count-overflow]
1579 | retval = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
| ^~~~~~~~~~~~~~~~
include/linux/dma-mapping.h:93:54: note: expanded from macro 'DMA_BIT_MASK'
93 | #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
| ^ ~~~
drivers/scsi/3w-sas.c:1796:49: warning: shift count >= width of type [-Wshift-count-overflow]
1796 | retval = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
| ^~~~~~~~~~~~~~~~
include/linux/dma-mapping.h:93:54: note: expanded from macro 'DMA_BIT_MASK'
93 | #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
| ^ ~~~
2 warnings and 2 errors generated.
vim +1190 drivers/scsi/3w-sas.c
1117
1118 /* Interrupt service routine */
1119 static irqreturn_t twl_interrupt(int irq, void *dev_instance)
1120 {
1121 TW_Device_Extension *tw_dev = (TW_Device_Extension *)dev_instance;
1122 int i, handled = 0, error = 0;
1123 dma_addr_t mfa = 0;
1124 u32 reg, regl, regh, response, request_id = 0;
1125 struct scsi_cmnd *cmd;
1126 TW_Command_Full *full_command_packet;
1127
1128 spin_lock(tw_dev->host->host_lock);
1129
1130 /* Read host interrupt status */
1131 reg = readl(TWL_HISTAT_REG_ADDR(tw_dev));
1132
1133 /* Check if this is our interrupt, otherwise bail */
1134 if (!(reg & TWL_HISTATUS_VALID_INTERRUPT))
1135 goto twl_interrupt_bail;
1136
1137 handled = 1;
1138
1139 /* If we are resetting, bail */
1140 if (test_bit(TW_IN_RESET, &tw_dev->flags))
1141 goto twl_interrupt_bail;
1142
1143 /* Attention interrupt */
1144 if (reg & TWL_HISTATUS_ATTENTION_INTERRUPT) {
1145 if (twl_handle_attention_interrupt(tw_dev)) {
1146 TWL_MASK_INTERRUPTS(tw_dev);
1147 goto twl_interrupt_bail;
1148 }
1149 }
1150
1151 /* Response interrupt */
1152 while (reg & TWL_HISTATUS_RESPONSE_INTERRUPT) {
1153 if (sizeof(dma_addr_t) > 4) {
1154 regh = readl(TWL_HOBQPH_REG_ADDR(tw_dev));
1155 regl = readl(TWL_HOBQPL_REG_ADDR(tw_dev));
1156 mfa = ((u64)regh << 32) | regl;
1157 } else
1158 mfa = readl(TWL_HOBQPL_REG_ADDR(tw_dev));
1159
1160 error = 0;
1161 response = (u32)mfa;
1162
1163 /* Check for command packet error */
1164 if (!TW_NOTMFA_OUT(response)) {
1165 for (i=0;i<TW_Q_LENGTH;i++) {
1166 if (tw_dev->sense_buffer_phys[i] == mfa) {
1167 request_id = le16_to_cpu(tw_dev->sense_buffer_virt[i]->header_desc.request_id);
1168 if (tw_dev->srb[request_id] != NULL)
1169 error = twl_fill_sense(tw_dev, i, request_id, 1, 1);
1170 else {
1171 /* Skip ioctl error prints */
1172 if (request_id != tw_dev->chrdev_request_id)
1173 error = twl_fill_sense(tw_dev, i, request_id, 0, 1);
1174 else
1175 memcpy(tw_dev->command_packet_virt[request_id], tw_dev->sense_buffer_virt[i], sizeof(TW_Command_Apache_Header));
1176 }
1177
1178 /* Now re-post the sense buffer */
1179 writel((u32)((u64)tw_dev->sense_buffer_phys[i] >> 32), TWL_HOBQPH_REG_ADDR(tw_dev));
1180 writel((u32)tw_dev->sense_buffer_phys[i], TWL_HOBQPL_REG_ADDR(tw_dev));
1181 break;
1182 }
1183 }
1184 } else
1185 request_id = TW_RESID_OUT(response);
1186
1187 if (request_id >= TW_Q_LENGTH) {
> 1188 TW_PRINTK(tw_dev->host, TW_DRIVER, 0x10,
1189 "Received out-of-range request id %u",
> 1190 request_id);
1191 TWL_MASK_INTERRUPTS(tw_dev);
1192 /* let the reset / error handling path deal with it */
1193 goto twl_interrupt_bail;
1194 }
1195
1196 full_command_packet = tw_dev->command_packet_virt[request_id];
1197
1198 /* Check for correct state */
1199 if (tw_dev->state[request_id] != TW_S_POSTED) {
1200 if (tw_dev->srb[request_id] != NULL) {
1201 TW_PRINTK(tw_dev->host, TW_DRIVER, 0xe, "Received a request id that wasn't posted");
1202 TWL_MASK_INTERRUPTS(tw_dev);
1203 goto twl_interrupt_bail;
1204 }
1205 }
1206
1207 /* Check for internal command completion */
1208 if (tw_dev->srb[request_id] == NULL) {
1209 if (request_id != tw_dev->chrdev_request_id) {
1210 if (twl_aen_complete(tw_dev, request_id))
1211 TW_PRINTK(tw_dev->host, TW_DRIVER, 0xf, "Error completing AEN during attention interrupt");
1212 } else {
1213 tw_dev->chrdev_request_id = TW_IOCTL_CHRDEV_FREE;
1214 wake_up(&tw_dev->ioctl_wqueue);
1215 }
1216 } else {
1217 cmd = tw_dev->srb[request_id];
1218
1219 if (!error)
1220 cmd->result = (DID_OK << 16);
1221
1222 /* Report residual bytes for single sgl */
1223 if ((scsi_sg_count(cmd) <= 1) && (full_command_packet->command.newcommand.status == 0)) {
1224 if (full_command_packet->command.newcommand.sg_list[0].length < scsi_bufflen(tw_dev->srb[request_id]))
1225 scsi_set_resid(cmd, scsi_bufflen(cmd) - full_command_packet->command.newcommand.sg_list[0].length);
1226 }
1227
1228 /* Now complete the io */
1229 scsi_dma_unmap(cmd);
1230 scsi_done(cmd);
1231 tw_dev->state[request_id] = TW_S_COMPLETED;
1232 twl_free_request_id(tw_dev, request_id);
1233 tw_dev->posted_request_count--;
1234 }
1235
1236 /* Check for another response interrupt */
1237 reg = readl(TWL_HISTAT_REG_ADDR(tw_dev));
1238 }
1239
1240 twl_interrupt_bail:
1241 spin_unlock(tw_dev->host->host_lock);
1242 return IRQ_RETVAL(handled);
1243 } /* End twl_interrupt() */
1244
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Tue, 2025-12-16 at 14:01 +0800, Shipei Qu wrote: > This issue was first reported via security@kernel.org. The kernel > security team replied that, under the usual upstream threat model > (only trusted PCIe/Thunderbolt devices are allowed to bind to such > drivers), it should be treated as a normal robustness bug rather than > a security issue, and asked us to send fixes to the relevant > development lists. This email follows that guidance. Realistically the same rationale goes for us as well. Absent the observation in the field of a problem device, we usually trust the hardware, so unless you can find an actual device that exhibits the problem this isn't really a fix for anything. If the security@ list is happy that the existing trust model for Thunderbolt/PCIe would prevent the attachment of malicious devices, then I think we're also happy to take their word for it. Even if thunderbolt were a security problem, the fix would likely be in the trust model not in all possible drivers. Regards, James Bottomley
Hi James, > Realistically the same rationale goes for us as well. Absent the > observation in the field of a problem device, we usually trust the > hardware, so unless you can find an actual device that exhibits the > problem this isn't really a fix for anything. > > If the security@ list is happy that the existing trust model for > Thunderbolt/PCIe would prevent the attachment of malicious devices, > then I think we're also happy to take their word for it. Even if > thunderbolt were a security problem, the fix would likely be in the > trust model not in all possible drivers. Thanks for the feedback. We understand and respect the current trust model (drivers assume trusted hardware), so no objections if you prefer not to take the patch under that model. For context: some confidential-computing / virtualized deployments include a hostile or compromised VMM or passthrough device in the threat model, so bounds checks can reduce crash surface when the device isn't fully trusted. But if that's outside upstream scope here, we're fine either way. Thanks for the clarification. Thanks, Shipei
On Tue, 2025-12-16 at 14:43 +0800, Shipei Qu wrote: [...] > For context: some confidential-computing / virtualized deployments > include a hostile or compromised VMM or passthrough device in the > threat model, so bounds checks can reduce crash surface when the > device isn't fully trusted. But if that's outside upstream scope > here, we're fine either way. Thanks for the clarification. We already had a massive fight over hardening device drivers for confidential computing: https://lore.kernel.org/all/20230327141816.2648615-1-carlos.bilbao@amd.com/ The summary is that if you can't trust the hardware or its emulation, policing it is too huge a burden to place on ordinary drivers. So while there's no objection to specifically hardened drivers for confidential computing, generally we don't add hardening to real hardware drivers because it tanks performance. Regards, James
© 2016 - 2026 Red Hat, Inc.