drivers/vhost/scsi.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
Since commit 3f8ca2e115e55 ("vhost scsi: alloc cmds per vq instead of session"), a bug can be triggered when the host sends a duplicate VHOST_SCSI_SET_ENDPOINT ioctl command.
In vhost_scsi_set_endpoint(), if the new `vhost_wwpn` matches the old tpg's tport_name but the tpg is still held by current vhost_scsi(i.e. it is busy), the active `tpg` will be unreferenced. Subsequently, if the owner releases vhost_scsi, the assertion `BUG_ON(sd->s_dependent_count < 1)` will be triggerred, terminating the target_undepend_item() procedure and leaving `configfs_dirent_lock` locked. If user enters configfs afterward, the CPU will become locked up.
This issue occurs because vhost_scsi_set_endpoint() allocates a new `vs_tpg` to hold the tpg array and copies all the old tpg entries into it before proceeding. When the new target is busy, the controw flow falls back to the `undepend` label, cause ing all the target `tpg` entries to be unreferenced, including the old one, which is still in use.
The backtrace is:
[ 60.085044] kernel BUG at fs/configfs/dir.c:1179!
[ 60.087729] RIP: 0010:configfs_undepend_item+0x76/0x80
[ 60.094735] Call Trace:
[ 60.094926] <TASK>
[ 60.098232] target_undepend_item+0x1a/0x30
[ 60.098745] vhost_scsi_clear_endpoint+0x363/0x3e0
[ 60.099342] vhost_scsi_release+0xea/0x1a0
[ 60.099860] ? __pfx_vhost_scsi_release+0x10/0x10
[ 60.100459] ? __pfx_locks_remove_file+0x10/0x10
[ 60.101025] ? __pfx_task_work_add+0x10/0x10
[ 60.101565] ? evm_file_release+0xc8/0xe0
[ 60.102074] ? __pfx_vhost_scsi_release+0x10/0x10
[ 60.102661] __fput+0x222/0x5a0
[ 60.102925] ____fput+0x1e/0x30
[ 60.103187] task_work_run+0x133/0x1c0
[ 60.103479] ? __pfx_task_work_run+0x10/0x10
[ 60.103813] ? pick_next_task_fair+0xe1/0x6f0
[ 60.104179] syscall_exit_to_user_mode+0x235/0x240
[ 60.104542] do_syscall_64+0x8a/0x170
[ 60.113301] </TASK>
[ 60.113931] ---[ end trace 0000000000000000 ]---
[ 60.121517] note: poc[2363] exited with preempt_count 1
To fix this issue, the controw flow should be redirected to the `free_vs_tpg` label to ensure proper cleanup.
Fixes: 3f8ca2e115e55 ("vhost scsi: alloc cmds per vq instead of session")
Signed-off-by: Haoran Zhang <wh1sper@zju.edu.cn>
---
drivers/vhost/scsi.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 718fa4e0b31e..b994138837f2 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1726,7 +1726,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
mutex_unlock(&tpg->tv_tpg_mutex);
mutex_unlock(&vhost_scsi_mutex);
ret = -EEXIST;
- goto undepend;
+ goto free_vs_tpg;
}
/*
* In order to ensure individual vhost-scsi configfs
@@ -1802,6 +1802,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
target_undepend_item(&tpg->se_tpg.tpg_group.cg_item);
}
}
+free_vs_tpg:
kfree(vs_tpg);
out:
mutex_unlock(&vs->dev.mutex);
--
2.43.0
On 1/10/25 9:34 PM, Haoran Zhang wrote:
> Since commit 3f8ca2e115e55 ("vhost scsi: alloc cmds per vq instead of session"), a bug can be triggered when the host sends a duplicate VHOST_SCSI_SET_ENDPOINT ioctl command.
I don't think that git commit is correct. It should be:
25b98b64e284 ("vhost scsi: alloc cmds per vq instead of session")
> diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> index 718fa4e0b31e..b994138837f2 100644
> --- a/drivers/vhost/scsi.c
> +++ b/drivers/vhost/scsi.c
> @@ -1726,7 +1726,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
> mutex_unlock(&tpg->tv_tpg_mutex);
> mutex_unlock(&vhost_scsi_mutex);
> ret = -EEXIST;
> - goto undepend;
> + goto free_vs_tpg;
I agree you found a bug, but I'm not sure how you hit it from here. A
couple lines before this, we do:
if (tpg->tv_tpg_vhost_count != 0) {
mutex_unlock(&tpg->tv_tpg_mutex);
continue;
}
If the tpg was already in the vs_tpg, then tv_tpg_vhost_count would be
non-zero and we would hit the check above.
Could you describe the target and tpg mapping for how you hit this?
> }
> /*
> * In order to ensure individual vhost-scsi configfs
However, I was able to replicate the bug by hitting the chunk below this
comment where we do:
ret = target_depend_item(&se_tpg->tpg_group.cg_item);
if (ret) {
pr_warn("target_depend_item() failed: %d\n", ret);
mutex_unlock(&tpg->tv_tpg_mutex);
mutex_unlock(&vhost_scsi_mutex);
goto undepend;
> @@ -1802,6 +1802,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
> target_undepend_item(&tpg->se_tpg.tpg_group.cg_item);
> }
> }
> +free_vs_tpg:
> kfree(vs_tpg);
To fix the bug, I don't think we can just free the vs_tpg. There's 2
cases we can hit the error path:
1. First time calling vhost_scsi_set_endpoint.
If we have a target with 2 tpgs, and for tpg1 we did a successful
target_depend_item call, but then we looped and for tpg2
target_depend_item failed, if we just did a kfree then we would leave a
refcount on tpg1.
So for this case, we need to do the "goto undepend".
2. N > 1 time calling vhost_scsi_set_endpoint.
This one is more complicated because let's say we started with 1 tpg and
on the first call to vhost_scsi_set_endpoint we successfully did
target_depend_item on it. Before the 2nd call to vhost_scsi_set_endpoint
we added tpg2 and tpg3. We then do vhost_scsi_set_endpoint for the 2nd
time, we successfully do target_depend_item on tpg2, but it fails for tpg3.
In this case, we want to unwind what we did on this 2nd call, so we want
to do target_undepend_item on tpg2. And, we don't want to call it for
tpg1 or we will hit the bug you found.
So I think to fix the issue, we would want to:
1. move the
memcpy(vs_tpg, vs->vs_tpg, len);
to the end of the function after we do the vhost_scsi_flush. This will
be more complicated than the current memcpy though. We will want to
merge the local vs_tpg and the vs->vs_tpg like:
for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) {
if (vs_tpg[i])
vs->vs_tpg[i] = vs_tpg[i])
}
2. We want to leave the "goto undepend" calls as is. For the the
undepend goto handling we also want to leave the code as is. We want to
continue to loop over the local vs_tpg because after we moved the memcpy
for #1 it now only contains the tpgs we updated on the current
vhost_scsi_set_endpoint call.
On 2025-01-13 01:35:20, Michael Christie wrote:
>
> On 1/10/25 9:34 PM, Haoran Zhang wrote:
> > Since commit 3f8ca2e115e55 ("vhost scsi: alloc cmds per vq instead of session"), a bug can be triggered when the host sends a duplicate VHOST_SCSI_SET_ENDPOINT ioctl command.
>
> I don't think that git commit is correct. It should be:
>
> 25b98b64e284 ("vhost scsi: alloc cmds per vq instead of session")
>
Yes, the correct commit ID is 25b98b64e284. My apologize for the oversight.
>
> > diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> > index 718fa4e0b31e..b994138837f2 100644
> > --- a/drivers/vhost/scsi.c
> > +++ b/drivers/vhost/scsi.c
> > @@ -1726,7 +1726,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
> > mutex_unlock(&tpg->tv_tpg_mutex);
> > mutex_unlock(&vhost_scsi_mutex);
> > ret = -EEXIST;
> > - goto undepend;
> > + goto free_vs_tpg;
>
> I agree you found a bug, but I'm not sure how you hit it from here. A
> couple lines before this, we do:
>
> if (tpg->tv_tpg_vhost_count != 0) {
> mutex_unlock(&tpg->tv_tpg_mutex);
> continue;
> }
>
> If the tpg was already in the vs_tpg, then tv_tpg_vhost_count would be
> non-zero and we would hit the check above.
>
Thanks for point that out, I wasn't aware of it before. However, I don't believe I identified the wrong root cause. I tested my patch again, which only affects this specific path, it does resolve the issue. That said, maybe the triggering path somehow bypasses the aforementioned check, and my patch might not be sufficient.
> Could you describe the target and tpg mapping for how you hit this?
>
After reevaluating the PoC, I realized that my initial claim was incorrect. The target WWN in the second vhost_scsi_set_endpoint() call is not the same as in the first one. Below is my targetcli status:
o- vhost ......................................... [Targets: 3]
| o- naa.500140501e23be28 ......................... [TPGs: 1]
| | o- tpg1 ............. [naa.50014058f7da10b7, no-gen-acls]
| | o- acls ..................................... [ACLs: 0]
| | o- luns ..................................... [LUNs: 0]
| o- naa.500140562c8936fa ......................... [TPGs: 2]
| | o- tpg1 ............. [naa.50014058d133f962, no-gen-acls]
| | | o- acls ..................................... [ACLs: 0]
| | | o- luns ..................................... [LUNs: 3]
| | | o- lun0 ... [block/disk0 (/dev/disk/...) (default_tg_pt_gp)]
| | | o- lun1 .... [fileio/vhost-fileio (/root/fileio-vhost) (default_tg_pt_gp)]
| | | o- lun2 ............. [ramdisk/rd (default_tg_pt_gp)]
| | o- tpg2 ............. [naa.50014055c6fb4182, no-gen-acls]
| | o- acls ..................................... [ACLs: 0]
| | o- luns ..................................... [LUNs: 0]
The bug occurs when `naa.500140562c8936fa` has already been set as an endpoint, and I send a VHOST_SCSI_SET_ENDPOINT ioctl command with `naa.500140501e23be28`. The ioctl returns -1 EEXIST (File exists), and the kernel logs a BUG message in dmesg.
>
> > }
> > /*
> > * In order to ensure individual vhost-scsi configfs
>
>
> However, I was able to replicate the bug by hitting the chunk below this
> comment where we do:
>
> ret = target_depend_item(&se_tpg->tpg_group.cg_item);
> if (ret) {
> pr_warn("target_depend_item() failed: %d\n", ret);
I suspected this part of code to be the issue, but I did not observe the WARN message when the bug occurred.
> mutex_unlock(&tpg->tv_tpg_mutex);
> mutex_unlock(&vhost_scsi_mutex);
> goto undepend;
>
>
> > @@ -1802,6 +1802,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
> > target_undepend_item(&tpg->se_tpg.tpg_group.cg_item);
> > }
> > }
> > +free_vs_tpg:
> > kfree(vs_tpg);
>
> To fix the bug, I don't think we can just free the vs_tpg. There's 2
> cases we can hit the error path:
>
> 1. First time calling vhost_scsi_set_endpoint.
>
> If we have a target with 2 tpgs, and for tpg1 we did a successful
> target_depend_item call, but then we looped and for tpg2
> target_depend_item failed, if we just did a kfree then we would leave a
> refcount on tpg1.
>
> So for this case, we need to do the "goto undepend".
>
> 2. N > 1 time calling vhost_scsi_set_endpoint.
>
> This one is more complicated because let's say we started with 1 tpg and
> on the first call to vhost_scsi_set_endpoint we successfully did
> target_depend_item on it. Before the 2nd call to vhost_scsi_set_endpoint
> we added tpg2 and tpg3. We then do vhost_scsi_set_endpoint for the 2nd
> time, we successfully do target_depend_item on tpg2, but it fails for tpg3.
>
> In this case, we want to unwind what we did on this 2nd call, so we want
> to do target_undepend_item on tpg2. And, we don't want to call it for
> tpg1 or we will hit the bug you found.
>
> So I think to fix the issue, we would want to:
>
> 1. move the
>
> memcpy(vs_tpg, vs->vs_tpg, len);
>
> to the end of the function after we do the vhost_scsi_flush. This will
> be more complicated than the current memcpy though. We will want to
> merge the local vs_tpg and the vs->vs_tpg like:
>
> for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) {
> if (vs_tpg[i])
> vs->vs_tpg[i] = vs_tpg[i])
> }
>
> 2. We want to leave the "goto undepend" calls as is. For the the
> undepend goto handling we also want to leave the code as is. We want to
> continue to loop over the local vs_tpg because after we moved the memcpy
> for #1 it now only contains the tpgs we updated on the current
> vhost_scsi_set_endpoint call.
I think so.
On 1/14/25 1:40 AM, 张浩然 wrote: > After reevaluating the PoC, I realized that my initial claim was incorrect. The target WWN in the second vhost_scsi_set_endpoint() call is not the same as in the first one. Below is my targetcli status: > > o- vhost ......................................... [Targets: 3] > | o- naa.500140501e23be28 ......................... [TPGs: 1] > | | o- tpg1 ............. [naa.50014058f7da10b7, no-gen-acls] > | | o- acls ..................................... [ACLs: 0] > | | o- luns ..................................... [LUNs: 0] > | o- naa.500140562c8936fa ......................... [TPGs: 2] > | | o- tpg1 ............. [naa.50014058d133f962, no-gen-acls] > | | | o- acls ..................................... [ACLs: 0] > | | | o- luns ..................................... [LUNs: 3] > | | | o- lun0 ... [block/disk0 (/dev/disk/...) (default_tg_pt_gp)] > | | | o- lun1 .... [fileio/vhost-fileio (/root/fileio-vhost) (default_tg_pt_gp)] > | | | o- lun2 ............. [ramdisk/rd (default_tg_pt_gp)] > | | o- tpg2 ............. [naa.50014055c6fb4182, no-gen-acls] > | | o- acls ..................................... [ACLs: 0] > | | o- luns ..................................... [LUNs: 0] > > The bug occurs when `naa.500140562c8936fa` has already been set as an endpoint, and I send a VHOST_SCSI_SET_ENDPOINT ioctl command with `naa.500140501e23be28`. The ioctl returns -1 EEXIST (File exists), and the kernel logs a BUG message in dmesg. I see now and can replicate it. I think there is a 2nd bug in vhost_scsi_set_endpoint related to all this where we need to prevent switching targets like this or else we'll leak some other refcounts. If 500140501e23be28's tpg number was 3 then we would overwrite the existing vs->vs_vhost_wwpn and never be able to release the refounts on the tpgs from 500140562c8936fa. I'll send a patchset to fix everything and cc you. Thanks for all the work you did testing and debugging this issue.
> I see now and can replicate it. I think there is a 2nd bug in
> vhost_scsi_set_endpoint related to all this where we need to
> prevent switching targets like this or else we'll leak some
> other refcounts. If 500140501e23be28's tpg number was 3 then
> we would overwrite the existing vs->vs_vhost_wwpn and never
> be able to release the refounts on the tpgs from 500140562c8936fa.
>
> I'll send a patchset to fix everything and cc you.
>
> Thanks for all the work you did testing and debugging this
> issue.
You are welcome. There is another bug I was about to report, but I'm not
sure whether I should create a new thread. I feel that the original design
of dynamically allocating new vs_tpgs in vhost_scsi_set_endpoint is not
intuitive, and copying TPGs before setting the target doesn't seem
logical. Since you are already refactoring the code, maybe I should post
it here so we can address these issues in one go.
[PATCH] vhost/scsi: Fix dangling pointer in vhost_scsi_set_endpoint()
Since commit 4f7f46d32c98 ("tcm_vhost: Use vq->private_data to indicate
if the endpoint is setup"), a dangling pointer issue has been introduced
in vhost_scsi_set_endpoint() when the host fails to reconfigure the
vhost-scsi endpoint. Specifically, this causes a UAF fault in
vhost_scsi_get_req() when the guest attempts to send an SCSI request.
In vhost_scsi_set_endpoint(), vhost_scsi->vs_tpg holds the same pointer
as vq->private_data. The code allocates a new vs_tpg array to hold the
TPGs and updates vq->private_data if a match is found between
vhost_scsi_list's tport_name and the target WWPN. However, the code
ignored the case where `match == 0` (i.e. the target endpoint does not
match any TPGs in vhost_scsi_list). In this scenario, it directly frees
the old vs_tpg and updates vhost_scsi->vs_tpg without modifying
vq->private_data, leaving all vq's backend pointer dangling. As a result,
subsequent requests from the guest will trigger a UAF fault on vs_tpg in
vhost_scsi_get_req(). Below is the KASAN report:
[ 68.606821] BUG: KASAN: slab-use-after-free in vhost_scsi_get_req+0xef/0x1f0
[ 68.607671] Read of size 8 at addr ffff8880087a1008 by task vhost-1440/1460
[ 68.608570]
[ 68.612070] Call Trace:
[ 68.612429] <TASK>
[ 68.612739] dump_stack_lvl+0x9e/0xd0
[ 68.613206] print_report+0xd1/0x670
[ 68.613711] ? __virt_addr_valid+0x54/0x250
[ 68.614232] ? kasan_complete_mode_report_info+0x6a/0x200
[ 68.614879] kasan_report+0xd6/0x120
[ 68.615329] ? vhost_scsi_get_req+0xef/0x1f0
[ 68.615869] ? vhost_scsi_get_req+0xef/0x1f0
[ 68.616406] __asan_load8+0x8b/0xe0
[ 68.616854] vhost_scsi_get_req+0xef/0x1f0
[ 68.617362] vhost_scsi_handle_vq+0x30f/0x15e0
[ 68.622248] ...
[ 68.622868] vhost_scsi_handle_kick+0x39/0x50
[ 68.623409] vhost_run_work_list+0xd9/0x120
[ 68.623939] ? __pfx_vhost_run_work_list+0x10/0x10
[ 68.624521] vhost_task_fn+0xf8/0x240
[ 68.629164] ...
[ 68.629705] ret_from_fork+0x5d/0x80
[ 68.630155] ? __pfx_vhost_task_fn+0x10/0x10
[ 68.630693] ret_from_fork_asm+0x1a/0x30
[ 68.631179] RIP: 0033:0x0
[ 68.631516] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[ 68.637405] </TASK>
[ 68.637670]
[ 68.637798] Allocated by task 1440:
[ 68.638124] kasan_save_stack+0x39/0x70
[ 68.638607] kasan_save_track+0x14/0x40
[ 68.638919] kasan_save_alloc_info+0x37/0x60
[ 68.639331] __kasan_kmalloc+0xc3/0xd0
[ 68.639625] __kmalloc_cache_noprof+0x186/0x3a0
[ 68.639971] vhost_scsi_ioctl+0x630/0xec0
[ 68.640392] __x64_sys_ioctl+0x126/0x160
[ 68.640755] x64_sys_call+0x11ad/0x25f0
[ 68.641076] do_syscall_64+0x7e/0x170
[ 68.641437] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 68.641887]
[ 68.642016] Freed by task 1461:
[ 68.642301] kasan_save_stack+0x39/0x70
[ 68.642622] kasan_save_track+0x14/0x40
[ 68.642958] kasan_save_free_info+0x3b/0x60
[ 68.643352] __kasan_slab_free+0x52/0x80
[ 68.643675] kfree+0x129/0x440
[ 68.643973] vhost_scsi_ioctl+0xd74/0xec0
[ 68.644290] __x64_sys_ioctl+0x126/0x160
[ 68.644589] x64_sys_call+0x11ad/0x25f0
[ 68.644939] do_syscall_64+0x7e/0x170
[ 68.645377] entry_SYSCALL_64_after_hwframe+0x76/0x7e
To address this issue, we need to prevent the `free(vs_tpg)` operation
from being triggered by the `match == 0` branch , either by moving the
free operation inside the if-clause or by adding a goto statement in the
else-clause. Here is an alternative patch commit.
Fixes: 4f7f46d32c98 ("tcm_vhost: Use vq->private_data to indicate if the endpoint is setup")
Signed-off-by: Haoran Zhang <wh1sper@zju.edu.cn>
---
drivers/vhost/scsi.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 718fa4e0b31e..1e15eab530d7 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1775,6 +1775,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
ret = 0;
} else {
ret = -EEXIST;
+ goto undepend;
}
/*
--
2.43.0
On 1/17/25 5:42 AM, Haoran Zhang wrote:
>> I see now and can replicate it. I think there is a 2nd bug in
>> vhost_scsi_set_endpoint related to all this where we need to
>> prevent switching targets like this or else we'll leak some
>> other refcounts. If 500140501e23be28's tpg number was 3 then
>> we would overwrite the existing vs->vs_vhost_wwpn and never
>> be able to release the refounts on the tpgs from 500140562c8936fa.
>>
>> I'll send a patchset to fix everything and cc you.
>>
>> Thanks for all the work you did testing and debugging this
>> issue.
>
> You are welcome. There is another bug I was about to report, but I'm not
> sure whether I should create a new thread. I feel that the original design
> of dynamically allocating new vs_tpgs in vhost_scsi_set_endpoint is not
> intuitive, and copying TPGs before setting the target doesn't seem
> logical. Since you are already refactoring the code, maybe I should post
> it here so we can address these issues in one go.
Yeah, I'm not sure if being able to call vhost_scsi_set_endpoint multiple
times and pick up new tpgs is actually a feature or not. There's so many
bugs and it also doesn't support tpg removal.
>
> [PATCH] vhost/scsi: Fix dangling pointer in vhost_scsi_set_endpoint()
>
> Since commit 4f7f46d32c98 ("tcm_vhost: Use vq->private_data to indicate
> if the endpoint is setup"), a dangling pointer issue has been introduced
> in vhost_scsi_set_endpoint() when the host fails to reconfigure the
> vhost-scsi endpoint. Specifically, this causes a UAF fault in
> vhost_scsi_get_req() when the guest attempts to send an SCSI request.
>
I saw that while reviewing the code. Here is my patch. I just added a new
goto, because we don't need to do the undepend since we never did any
depend calls.
--------------------
From 0474c5d41968095ea911d48159e4f6a129f1a862 Mon Sep 17 00:00:00 2001
From: Mike Christie <michael.christie@oracle.com>
Date: Wed, 15 Jan 2025 19:05:22 -0600
Subject: [PATCH 1/3] vhost-scsi: Avoid accessing a freed vs_tpg when no tpgs
are found
This fixes a use after free that occurs when vhost_scsi_set_endpoint is
called more than once and calls after the first call do not find any
tpgs to add to the vs_tpg. When vhost_scsi_set_endpoint first finds
tpgs to add to the vs_tpg array match=true, so we will do:
vhost_vq_set_backend(vq, vs_tpg);
...
kfree(vs->vs_tpg);
vs->vs_tpg = vs_tpg;
If vhost_scsi_set_endpoint is called again and no tpgs are found
match=false so we skip the vhost_vq_set_backend call leaving the
pointer to the vs_tpg we then free via:
kfree(vs->vs_tpg);
vs->vs_tpg = vs_tpg;
If a scsi request is then sent we do:
vhost_scsi_handle_vq -> vhost_scsi_get_req -> vhost_vq_get_backend
which sees the vs_tpg we just did a kfree on.
This fixes the issue by having us not reset and free the existing
vs->vs-tpg pointer so the virtqueue private_data stays valid.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/vhost/scsi.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 718fa4e0b31e..143276df16e2 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1775,6 +1775,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
ret = 0;
} else {
ret = -EEXIST;
+ goto free_vs_tpg;
}
/*
@@ -1802,6 +1803,7 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
target_undepend_item(&tpg->se_tpg.tpg_group.cg_item);
}
}
+free_vs_tpg:
kfree(vs_tpg);
out:
mutex_unlock(&vs->dev.mutex);
--
2.43.0
On 2025-01-18 00:50:04, Mike Christie wrote:
> Yeah, I'm not sure if being able to call vhost_scsi_set_endpoint multiple
> times and pick up new tpgs is actually a feature or not. There's so many
> bugs and it also doesn't support tpg removal.
It seems vhost_scsi_clear_endpoint() is attempting to handle this, but it actually undepends all TPGs, ignoring the target, and also introduces the dangling pointer when `match == 0`.
> > [PATCH] vhost/scsi: Fix dangling pointer in vhost_scsi_set_endpoint()
> >
> > Since commit 4f7f46d32c98 ("tcm_vhost: Use vq->private_data to indicate
> > if the endpoint is setup"), a dangling pointer issue has been introduced
> > in vhost_scsi_set_endpoint() when the host fails to reconfigure the
> > vhost-scsi endpoint. Specifically, this causes a UAF fault in
> > vhost_scsi_get_req() when the guest attempts to send an SCSI request.
> >
> I saw that while reviewing the code. Here is my patch. I just added a new
> goto, because we don't need to do the undepend since we never did any
> depend calls.
Yes, there's no need to call undepend_item - just free vs_tpg. My patch was incorrect, thanks for bringing that to my attention.
On 1/17/25 10:50 AM, Mike Christie wrote:
>> You are welcome. There is another bug I was about to report, but I'm not
>> sure whether I should create a new thread. I feel that the original design
>> of dynamically allocating new vs_tpgs in vhost_scsi_set_endpoint is not
>> intuitive, and copying TPGs before setting the target doesn't seem
>> logical. Since you are already refactoring the code, maybe I should post
>> it here so we can address these issues in one go.
> Yeah, I'm not sure if being able to call vhost_scsi_set_endpoint multiple
> times and pick up new tpgs is actually a feature or not.
Oh yeah by this I mean should we just do:
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 718fa4e0b31e..372a7bfda14c 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1699,6 +1699,11 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
}
}
+ if (vs->vs_tpg) {
+ ret = -EEXIST;
+ goto out;
+ }
+
len = sizeof(vs_tpg[0]) * VHOST_SCSI_MAX_TARGET;
vs_tpg = kzalloc(len, GFP_KERNEL);
if (!vs_tpg) {
?
I can't tell if being able to call VHOST_SCSI_SET_ENDPOINT multiple
times without calling VHOST_SCSI_CLEAR_ENDPOINT between calls is an
actual feature that the code was trying to support or that is the
root bug. It's so buggy I feel like it was never meant to be called
like this so we should just add a check at the beginning of the function.
The worry would be that if there are userspace tools doing this
and living with the bugs then the above patch would add a regression.
However, I think that's highly unlikely because of how useless/buggy
it is.
On 2025-01-18 01:11:01, Mike Christie wrote: > I can't tell if being able to call VHOST_SCSI_SET_ENDPOINT multiple > times without calling VHOST_SCSI_CLEAR_ENDPOINT between calls is an > actual feature that the code was trying to support or that is the > root bug. It's so buggy I feel like it was never meant to be called > like this so we should just add a check at the beginning of the function. Sure, proceed as you prefer (Maintaining a 12-year-old codebase seems quite troublesome). My suggestion would be to increase the constant VHOST_SCSI_ABI_VERSION if there are API changes, so that userspace can recognize the new version through the VHOST_SCSI_GET_ABI_VERSION command of ioctl. > The worry would be that if there are userspace tools doing this > and living with the bugs then the above patch would add a regression. > However, I think that's highly unlikely because of how useless/buggy > it is. Agreed. CVE-2024-49863 has shown that no successful SCSI AN requests have been sent from a guest to a vhost-scsi device for years.
On 1/12/25 11:35 AM, michael.christie@oracle.com wrote:
> So I think to fix the issue, we would want to:
>
> 1. move the
>
> memcpy(vs_tpg, vs->vs_tpg, len);
>
> to the end of the function after we do the vhost_scsi_flush. This will
> be more complicated than the current memcpy though. We will want to
> merge the local vs_tpg and the vs->vs_tpg like:
>
> for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) {
> if (vs_tpg[i])
> vs->vs_tpg[i] = vs_tpg[i])
> }
I think I wrote that in reverse. We would want:
vhost_scsi_flush(vs);
if (vs->vs_tpg) {
for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) {
if (vs->vs_tpg[i])
vs_tpg[i] = vs->vs_tpg[i])
}
}
kfree(vs->vs_tpg);
vs->vs_tpg = vs_tpg;
or we could just allocate the vs_tpg with the vhost_scsi like:
struct vhost_scsi {
....
struct vhost_scsi_tpg *vs_tpg[VHOST_SCSI_MAX_TARGET];
then when we loop in vhost_scsi_set/clear_endpoint set/clear the
every vs_tpg entry.
On Sun, Jan 12, 2025 at 03:19:44PM -0600, Mike Christie wrote:
> On 1/12/25 11:35 AM, michael.christie@oracle.com wrote:
> > So I think to fix the issue, we would want to:
> >
> > 1. move the
> >
> > memcpy(vs_tpg, vs->vs_tpg, len);
> >
> > to the end of the function after we do the vhost_scsi_flush. This will
> > be more complicated than the current memcpy though. We will want to
> > merge the local vs_tpg and the vs->vs_tpg like:
> >
> > for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) {
> > if (vs_tpg[i])
> > vs->vs_tpg[i] = vs_tpg[i])
> > }
>
> I think I wrote that in reverse. We would want:
>
> vhost_scsi_flush(vs);
>
> if (vs->vs_tpg) {
> for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) {
> if (vs->vs_tpg[i])
> vs_tpg[i] = vs->vs_tpg[i])
> }
> }
>
> kfree(vs->vs_tpg);
> vs->vs_tpg = vs_tpg;
>
> or we could just allocate the vs_tpg with the vhost_scsi like:
>
> struct vhost_scsi {
> ....
>
> struct vhost_scsi_tpg *vs_tpg[VHOST_SCSI_MAX_TARGET];
>
> then when we loop in vhost_scsi_set/clear_endpoint set/clear the
> every vs_tpg entry.
Wanna post the patch, Mike?
--
MST
On 1/14/25 5:26 AM, Michael S. Tsirkin wrote: > > Wanna post the patch, Mike? > Yeah, I'll send a patch.
I tested this patch with virtio-net regression tests, everything works fine.
Tested-by: Lei Yang <leiyang@redhat.com>
On Mon, Jan 13, 2025 at 5:20 AM Mike Christie
<michael.christie@oracle.com> wrote:
>
> On 1/12/25 11:35 AM, michael.christie@oracle.com wrote:
> > So I think to fix the issue, we would want to:
> >
> > 1. move the
> >
> > memcpy(vs_tpg, vs->vs_tpg, len);
> >
> > to the end of the function after we do the vhost_scsi_flush. This will
> > be more complicated than the current memcpy though. We will want to
> > merge the local vs_tpg and the vs->vs_tpg like:
> >
> > for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) {
> > if (vs_tpg[i])
> > vs->vs_tpg[i] = vs_tpg[i])
> > }
>
> I think I wrote that in reverse. We would want:
>
> vhost_scsi_flush(vs);
>
> if (vs->vs_tpg) {
> for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) {
> if (vs->vs_tpg[i])
> vs_tpg[i] = vs->vs_tpg[i])
> }
> }
>
> kfree(vs->vs_tpg);
> vs->vs_tpg = vs_tpg;
>
> or we could just allocate the vs_tpg with the vhost_scsi like:
>
> struct vhost_scsi {
> ....
>
> struct vhost_scsi_tpg *vs_tpg[VHOST_SCSI_MAX_TARGET];
>
> then when we loop in vhost_scsi_set/clear_endpoint set/clear the
> every vs_tpg entry.
>
Yes, and it also protects the kernel from the PoC, as I've tested.
On 2025-01-14 10:17:50 Lei Yang wrote:
> I tested this patch with virtio-net regression tests, everything works fine.
>
> Tested-by: Lei Yang <leiyang@redhat.com>
>
>
> On Mon, Jan 13, 2025 at 5:20 AM Mike Christie
> <michael.christie@oracle.com> wrote:
> >
> > On 1/12/25 11:35 AM, michael.christie@oracle.com wrote:
> > > So I think to fix the issue, we would want to:
> > >
> > > 1. move the
> > >
> > > memcpy(vs_tpg, vs->vs_tpg, len);
> > >
> > > to the end of the function after we do the vhost_scsi_flush. This will
> > > be more complicated than the current memcpy though. We will want to
> > > merge the local vs_tpg and the vs->vs_tpg like:
> > >
> > > for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) {
> > > if (vs_tpg[i])
> > > vs->vs_tpg[i] = vs_tpg[i])
> > > }
> >
> > I think I wrote that in reverse. We would want:
> >
> > vhost_scsi_flush(vs);
> >
> > if (vs->vs_tpg) {
> > for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) {
> > if (vs->vs_tpg[i])
> > vs_tpg[i] = vs->vs_tpg[i])
> > }
> > }
> >
> > kfree(vs->vs_tpg);
> > vs->vs_tpg = vs_tpg;
> >
> > or we could just allocate the vs_tpg with the vhost_scsi like:
> >
> > struct vhost_scsi {
> > ....
> >
> > struct vhost_scsi_tpg *vs_tpg[VHOST_SCSI_MAX_TARGET];
> >
> > then when we loop in vhost_scsi_set/clear_endpoint set/clear the
> > every vs_tpg entry.
> >
Hi Haoran,
On Sat, Jan 11, 2025 at 11:34:18AM +0800, Haoran Zhang wrote:
> Since commit 3f8ca2e115e55 ("vhost scsi: alloc cmds per vq instead of session"), a bug can be triggered when the host sends a duplicate VHOST_SCSI_SET_ENDPOINT ioctl command.
>
> In vhost_scsi_set_endpoint(), if the new `vhost_wwpn` matches the old tpg's tport_name but the tpg is still held by current vhost_scsi(i.e. it is busy), the active `tpg` will be unreferenced. Subsequently, if the owner releases vhost_scsi, the assertion `BUG_ON(sd->s_dependent_count < 1)` will be triggerred, terminating the target_undepend_item() procedure and leaving `configfs_dirent_lock` locked. If user enters configfs afterward, the CPU will become locked up.
> This issue occurs because vhost_scsi_set_endpoint() allocates a new `vs_tpg` to hold the tpg array and copies all the old tpg entries into it before proceeding. When the new target is busy, the controw flow falls back to the `undepend` label, cause ing all the target `tpg` entries to be unreferenced, including the old one, which is still in use.
>
> The backtrace is:
>
> [ 60.085044] kernel BUG at fs/configfs/dir.c:1179!
> [ 60.087729] RIP: 0010:configfs_undepend_item+0x76/0x80
> [ 60.094735] Call Trace:
> [ 60.094926] <TASK>
> [ 60.098232] target_undepend_item+0x1a/0x30
> [ 60.098745] vhost_scsi_clear_endpoint+0x363/0x3e0
> [ 60.099342] vhost_scsi_release+0xea/0x1a0
> [ 60.099860] ? __pfx_vhost_scsi_release+0x10/0x10
> [ 60.100459] ? __pfx_locks_remove_file+0x10/0x10
> [ 60.101025] ? __pfx_task_work_add+0x10/0x10
> [ 60.101565] ? evm_file_release+0xc8/0xe0
> [ 60.102074] ? __pfx_vhost_scsi_release+0x10/0x10
> [ 60.102661] __fput+0x222/0x5a0
> [ 60.102925] ____fput+0x1e/0x30
> [ 60.103187] task_work_run+0x133/0x1c0
> [ 60.103479] ? __pfx_task_work_run+0x10/0x10
> [ 60.103813] ? pick_next_task_fair+0xe1/0x6f0
> [ 60.104179] syscall_exit_to_user_mode+0x235/0x240
> [ 60.104542] do_syscall_64+0x8a/0x170
> [ 60.113301] </TASK>
> [ 60.113931] ---[ end trace 0000000000000000 ]---
> [ 60.121517] note: poc[2363] exited with preempt_count 1
>
> To fix this issue, the controw flow should be redirected to the `free_vs_tpg` label to ensure proper cleanup.
>
> Fixes: 3f8ca2e115e55 ("vhost scsi: alloc cmds per vq instead of session")
> Signed-off-by: Haoran Zhang <wh1sper@zju.edu.cn>
checkpatch.pl generated the following errors and warnings:
WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)
#59:
Since commit 3f8ca2e115e55 ("vhost scsi: alloc cmds per vq instead of session"), a bug can be triggered when the host sends a duplicate VHOST_SCSI_SET_ENDPOINT ioctl command.
ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 3f8ca2e115e5 ("vhost/scsi: Extract common handling code from control queue handler")'
#59:
Since commit 3f8ca2e115e55 ("vhost scsi: alloc cmds per vq instead of session"), a bug can be triggered when the host sends a duplicate VHOST_SCSI_SET_ENDPOINT ioctl command.
WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: 3f8ca2e115e5 ("vhost/scsi: Extract common handling code from control queue handler")'
#91:
Fixes: 3f8ca2e115e55 ("vhost scsi: alloc cmds per vq instead of session")
total: 1 errors, 2 warnings, 15 lines checked
Regards,
Kuan-Wei
Hi Kuan-Wei,
On Sat, Jan 11, 2025 at 13:45:50 +0800, Kuan-Wei Chiu wrote:
> Hi Haoran,
>
> On Sat, Jan 11, 2025 at 11:34:18AM +0800, Haoran Zhang wrote:
> > Since commit 3f8ca2e115e55 ("vhost scsi: alloc cmds per vq instead of session"), a bug can be triggered when the host sends a duplicate VHOST_SCSI_SET_ENDPOINT ioctl command.
> >
> > In vhost_scsi_set_endpoint(), if the new `vhost_wwpn` matches the old tpg's tport_name but the tpg is still held by current vhost_scsi(i.e. it is busy), the active `tpg` will be unreferenced. Subsequently, if the owner releases vhost_scsi, the assertion `BUG_ON(sd->s_dependent_count < 1)` will be triggerred, terminating the target_undepend_item() procedure and leaving `configfs_dirent_lock` locked. If user enters configfs afterward, the CPU will become locked up.
> > This issue occurs because vhost_scsi_set_endpoint() allocates a new `vs_tpg` to hold the tpg array and copies all the old tpg entries into it before proceeding. When the new target is busy, the controw flow falls back to the `undepend` label, cause ing all the target `tpg` entries to be unreferenced, including the old one, which is still in use.
> >
> > The backtrace is:
> >
> > [ 60.085044] kernel BUG at fs/configfs/dir.c:1179!
> > [ 60.087729] RIP: 0010:configfs_undepend_item+0x76/0x80
> > [ 60.094735] Call Trace:
> > [ 60.094926] <TASK>
> > [ 60.098232] target_undepend_item+0x1a/0x30
> > [ 60.098745] vhost_scsi_clear_endpoint+0x363/0x3e0
> > [ 60.099342] vhost_scsi_release+0xea/0x1a0
> > [ 60.099860] ? __pfx_vhost_scsi_release+0x10/0x10
> > [ 60.100459] ? __pfx_locks_remove_file+0x10/0x10
> > [ 60.101025] ? __pfx_task_work_add+0x10/0x10
> > [ 60.101565] ? evm_file_release+0xc8/0xe0
> > [ 60.102074] ? __pfx_vhost_scsi_release+0x10/0x10
> > [ 60.102661] __fput+0x222/0x5a0
> > [ 60.102925] ____fput+0x1e/0x30
> > [ 60.103187] task_work_run+0x133/0x1c0
> > [ 60.103479] ? __pfx_task_work_run+0x10/0x10
> > [ 60.103813] ? pick_next_task_fair+0xe1/0x6f0
> > [ 60.104179] syscall_exit_to_user_mode+0x235/0x240
> > [ 60.104542] do_syscall_64+0x8a/0x170
> > [ 60.113301] </TASK>
> > [ 60.113931] ---[ end trace 0000000000000000 ]---
> > [ 60.121517] note: poc[2363] exited with preempt_count 1
> >
> > To fix this issue, the controw flow should be redirected to the `free_vs_tpg` label to ensure proper cleanup.
> >
> > Fixes: 3f8ca2e115e55 ("vhost scsi: alloc cmds per vq instead of session")
> > Signed-off-by: Haoran Zhang <wh1sper@zju.edu.cn>
>
> checkpatch.pl generated the following errors and warnings:
>
> WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)
> #59:
> Since commit 3f8ca2e115e55 ("vhost scsi: alloc cmds per vq instead of session"), a bug can be triggered when the host sends a duplicate VHOST_SCSI_SET_ENDPOINT ioctl command.
>
> ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 3f8ca2e115e5 ("vhost/scsi: Extract common handling code from control queue handler")'
> #59:
> Since commit 3f8ca2e115e55 ("vhost scsi: alloc cmds per vq instead of session"), a bug can be triggered when the host sends a duplicate VHOST_SCSI_SET_ENDPOINT ioctl command.
>
> WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: 3f8ca2e115e5 ("vhost/scsi: Extract common handling code from control queue handler")'
> #91:
> Fixes: 3f8ca2e115e55 ("vhost scsi: alloc cmds per vq instead of session")
>
> total: 1 errors, 2 warnings, 15 lines checked
>
>
> Regards,
> Kuan-Wei
Thanks for your suggestion, I will send a corrected patch later.
Best regards,
Haoran Zhang
© 2016 - 2026 Red Hat, Inc.