block/block-backend.c | 22 +++++++++++----------- hw/scsi/scsi-bus.c | 30 +++++++++++++++++++++--------- 2 files changed, 32 insertions(+), 20 deletions(-)
Hi, Without the AioContext lock, a BB's context may kind of change at any time (unless it has a root node, and I/O requests are pending). That also means that its own context (BlockBackend.ctx) and that of its root node can differ sometimes (while the context is being changed). blk_get_aio_context() doesn't know this yet and asserts that both are always equal (if there is a root node). Because it's no longer true, and because callers don't seem to really care about the root node's context, we can and should remove the assertion and just return the BB's context. Beyond that, the question is whether the callers of blk_get_aio_context() are OK with the context potentially changing concurrently. Honestly, it isn't entirely clear to me; most look OK, except for the virtio-scsi code, which operates under the general assumption that the BB's context is always equal to that of the virtio-scsi device. I doubt that this assumption always holds (it is definitely not obvious to me that it would), but then again, this series will not make matters worse in that regard, and that is what counts for me now. One clear point of contention is scsi_device_for_each_req_async(), which is addressed by patch 2. Right now, it schedules a BH in the BB context, then the BH double-checks whether the context still fits, and if not, re-schedules itself. Because virtio-scsi's context is fixed, this seems to indicate to me that it wants to be able to deal with a case where BB and virtio-scsi context differ, which seems to break that aforementioned general virtio-scsi assumption. Unfortunately, I definitely have to touch that code, because accepting concurrent changes of AioContexts breaks the double-check (just because the BB has the right context in that place does not mean it stays in that context); instead, we must prevent any concurrent change until the BH is done. Because changing contexts generally involves a drained section, we can prevent it by keeping the BB in-flight counter elevated. Question is, how to reason for that. I’d really rather not include the need to follow the BB context in my argument, because I find that part a bit fishy. Luckily, there’s a second, completely different reason for having scsi_device_for_each_req_async() increment the in-flight counter: Specifically, scsi_device_purge_requests() probably wants to await full completion of scsi_device_for_each_req_async(), and we can do that most easily in the very same way by incrementing the in-flight counter. This way, the blk_drain() in scsi_device_purge_requests() will not only await all (cancelled) I/O requests, but also the non-I/O requests. The fact that this prevents the BB AioContext from changing while the BH is scheduled/running then is just a nice side effect. Hanna Czenczek (2): block-backend: Allow concurrent context changes scsi: Await request purging block/block-backend.c | 22 +++++++++++----------- hw/scsi/scsi-bus.c | 30 +++++++++++++++++++++--------- 2 files changed, 32 insertions(+), 20 deletions(-) -- 2.43.0
02.02.2024 17:47, Hanna Czenczek : > Hi, > > Without the AioContext lock, a BB's context may kind of change at any > time (unless it has a root node, and I/O requests are pending). That > also means that its own context (BlockBackend.ctx) and that of its root > node can differ sometimes (while the context is being changed). How relevant this is for -stable (8.2 at least) which does not have "scsi: eliminate AioContext lock" patchset, and in particular,: v8.2.0-124-geaad0fe260 "scsi: only access SCSIDevice->requests from one thread"? The issue first patch "block-backend: Allow concurrent context changes" fixes (RHEL-19381) seems to be for 8.1.something, so it exists in 8.2 too, and this particular fix applies to 8.2. But with other changes around all this, I'm a bit lost as of what should be done on stable. Not even thinking about 7.2 here :) Thanks, /mjt
On 09.02.24 15:08, Michael Tokarev wrote: > 02.02.2024 17:47, Hanna Czenczek : >> Hi, >> >> Without the AioContext lock, a BB's context may kind of change at any >> time (unless it has a root node, and I/O requests are pending). That >> also means that its own context (BlockBackend.ctx) and that of its root >> node can differ sometimes (while the context is being changed). > > How relevant this is for -stable (8.2 at least) which does not have > "scsi: eliminate AioContext lock" patchset, and in particular,: > v8.2.0-124-geaad0fe260 "scsi: only access SCSIDevice->requests from > one thread"? > > The issue first patch "block-backend: Allow concurrent context changes" > fixes (RHEL-19381) seems to be for 8.1.something, so it exists in 8.2 > too, and this particular fix applies to 8.2. > > But with other changes around all this, I'm a bit lost as of what should > be done on stable. Not even thinking about 7.2 here :) Ah, sorry, yes. Since we do still have the AioContext lock, this series won’t be necessary in -stable. Sorry for the noise! Hanna
09.02.2024 19:51, Hanna Czenczek : > On 09.02.24 15:08, Michael Tokarev wrote: >> 02.02.2024 17:47, Hanna Czenczek : >>> Hi, >>> >>> Without the AioContext lock, a BB's context may kind of change at any >>> time (unless it has a root node, and I/O requests are pending). That >>> also means that its own context (BlockBackend.ctx) and that of its root >>> node can differ sometimes (while the context is being changed). >> >> How relevant this is for -stable (8.2 at least) which does not have >> "scsi: eliminate AioContext lock" patchset, and in particular,: >> v8.2.0-124-geaad0fe260 "scsi: only access SCSIDevice->requests from >> one thread"? >> >> The issue first patch "block-backend: Allow concurrent context changes" >> fixes (RHEL-19381) seems to be for 8.1.something, so it exists in 8.2 >> too, and this particular fix applies to 8.2. >> >> But with other changes around all this, I'm a bit lost as of what should >> be done on stable. Not even thinking about 7.2 here :) > > Ah, sorry, yes. Since we do still have the AioContext lock, this series won’t be necessary in -stable. Sorry for the noise! Hm. Now I'm confused even more.. :) ad89367202 "block-backend: Allow concurrent context changes" - the first one in this series - apparently is needed, as it fixes an issue reported for qemu 8.1 (https://issues.redhat.com/browse/RHEL-19381). Or is it not the case? FWIW, truth is born in the noise, not in silence ;) Thanks, /mjt
On 10.02.24 09:46, Michael Tokarev wrote: > 09.02.2024 19:51, Hanna Czenczek : >> On 09.02.24 15:08, Michael Tokarev wrote: >>> 02.02.2024 17:47, Hanna Czenczek : >>>> Hi, >>>> >>>> Without the AioContext lock, a BB's context may kind of change at any >>>> time (unless it has a root node, and I/O requests are pending). That >>>> also means that its own context (BlockBackend.ctx) and that of its >>>> root >>>> node can differ sometimes (while the context is being changed). >>> >>> How relevant this is for -stable (8.2 at least) which does not have >>> "scsi: eliminate AioContext lock" patchset, and in particular,: >>> v8.2.0-124-geaad0fe260 "scsi: only access SCSIDevice->requests from >>> one thread"? >>> >>> The issue first patch "block-backend: Allow concurrent context changes" >>> fixes (RHEL-19381) seems to be for 8.1.something, so it exists in 8.2 >>> too, and this particular fix applies to 8.2. >>> >>> But with other changes around all this, I'm a bit lost as of what >>> should >>> be done on stable. Not even thinking about 7.2 here :) >> >> Ah, sorry, yes. Since we do still have the AioContext lock, this >> series won’t be necessary in -stable. Sorry for the noise! > > Hm. Now I'm confused even more.. :) > > ad89367202 "block-backend: Allow concurrent context changes" - the first > one in this series - apparently is needed, as it fixes an issue reported > for qemu 8.1 (https://issues.redhat.com/browse/RHEL-19381). Or is it not > the case? Ah, yes, I got confused there. There are two (unfortunately? fortunately? Red-Hat-internal) comments, one of which describes the crash that’s fixed here, so I thought that bug described this crash. But the actual description in the report describes something different (more like what’s fixed by https://lists.nongnu.org/archive/html/qemu-devel/2024-01/msg03649.html, but I’m not entirely sure yet). So basically I got the bug link wrong. We now have https://issues.redhat.com/browse/RHEL-24593, which has been reported only against 8.2. Hanna > FWIW, truth is born in the noise, not in silence ;) > > Thanks, > > /mjt >
Am 02.02.2024 um 15:47 hat Hanna Czenczek geschrieben: > Hi, > > Without the AioContext lock, a BB's context may kind of change at any > time (unless it has a root node, and I/O requests are pending). That > also means that its own context (BlockBackend.ctx) and that of its root > node can differ sometimes (while the context is being changed). > > blk_get_aio_context() doesn't know this yet and asserts that both are > always equal (if there is a root node). Because it's no longer true, > and because callers don't seem to really care about the root node's > context, we can and should remove the assertion and just return the BB's > context. > > Beyond that, the question is whether the callers of > blk_get_aio_context() are OK with the context potentially changing > concurrently. Honestly, it isn't entirely clear to me; most look OK, > except for the virtio-scsi code, which operates under the general > assumption that the BB's context is always equal to that of the > virtio-scsi device. I doubt that this assumption always holds (it is > definitely not obvious to me that it would), but then again, this series > will not make matters worse in that regard, and that is what counts for > me now. > > One clear point of contention is scsi_device_for_each_req_async(), which > is addressed by patch 2. Right now, it schedules a BH in the BB > context, then the BH double-checks whether the context still fits, and > if not, re-schedules itself. Because virtio-scsi's context is fixed, > this seems to indicate to me that it wants to be able to deal with a > case where BB and virtio-scsi context differ, which seems to break that > aforementioned general virtio-scsi assumption. > > Unfortunately, I definitely have to touch that code, because accepting > concurrent changes of AioContexts breaks the double-check (just because > the BB has the right context in that place does not mean it stays in > that context); instead, we must prevent any concurrent change until the > BH is done. Because changing contexts generally involves a drained > section, we can prevent it by keeping the BB in-flight counter elevated. > > Question is, how to reason for that. I’d really rather not include the > need to follow the BB context in my argument, because I find that part a > bit fishy. > > Luckily, there’s a second, completely different reason for having > scsi_device_for_each_req_async() increment the in-flight counter: > Specifically, scsi_device_purge_requests() probably wants to await full > completion of scsi_device_for_each_req_async(), and we can do that most > easily in the very same way by incrementing the in-flight counter. This > way, the blk_drain() in scsi_device_purge_requests() will not only await > all (cancelled) I/O requests, but also the non-I/O requests. > > The fact that this prevents the BB AioContext from changing while the BH > is scheduled/running then is just a nice side effect. > > > Hanna Czenczek (2): > block-backend: Allow concurrent context changes > scsi: Await request purging > > block/block-backend.c | 22 +++++++++++----------- > hw/scsi/scsi-bus.c | 30 +++++++++++++++++++++--------- > 2 files changed, 32 insertions(+), 20 deletions(-) Thanks, applied to the block branch. Kevin
On Fri, Feb 02, 2024 at 03:47:53PM +0100, Hanna Czenczek wrote: > Hi, > > Without the AioContext lock, a BB's context may kind of change at any > time (unless it has a root node, and I/O requests are pending). That > also means that its own context (BlockBackend.ctx) and that of its root > node can differ sometimes (while the context is being changed). > > blk_get_aio_context() doesn't know this yet and asserts that both are > always equal (if there is a root node). Because it's no longer true, > and because callers don't seem to really care about the root node's > context, we can and should remove the assertion and just return the BB's > context. > > Beyond that, the question is whether the callers of > blk_get_aio_context() are OK with the context potentially changing > concurrently. Honestly, it isn't entirely clear to me; most look OK, > except for the virtio-scsi code, which operates under the general > assumption that the BB's context is always equal to that of the > virtio-scsi device. I doubt that this assumption always holds (it is > definitely not obvious to me that it would), but then again, this series > will not make matters worse in that regard, and that is what counts for > me now. > > One clear point of contention is scsi_device_for_each_req_async(), which > is addressed by patch 2. Right now, it schedules a BH in the BB > context, then the BH double-checks whether the context still fits, and > if not, re-schedules itself. Because virtio-scsi's context is fixed, > this seems to indicate to me that it wants to be able to deal with a > case where BB and virtio-scsi context differ, which seems to break that > aforementioned general virtio-scsi assumption. I don't agree with the last sentence: virtio-scsi's context isn't fixed. The AioContext changes when dataplane is started/stopped. virtio-scsi switches AioContext between the IOThread's AioContext and the main loop's qemu_aio_context. However, virtio-scsi virtqueue processing only happens in the IOThread's AioContext. Maybe this is what you meant when you said the AioContext is fixed? The BH function is aware that the current AioContext might not be the same as the AioContext at the time the BH was scheduled. That doesn't break assumptions in the code. (It may be possible to rewrite virtio-blk, virtio-scsi, and core VirtIODevice ioeventfd code to use the simpler model where the AioContext really is fixed because things have changed significantly over the years, but I looked a few weeks ago and it's difficult work.) I'm just pointing out that I think this description is incomplete. I *do* agree with what this patch series is doing :). > Unfortunately, I definitely have to touch that code, because accepting > concurrent changes of AioContexts breaks the double-check (just because > the BB has the right context in that place does not mean it stays in > that context); instead, we must prevent any concurrent change until the > BH is done. Because changing contexts generally involves a drained > section, we can prevent it by keeping the BB in-flight counter elevated. > > Question is, how to reason for that. I’d really rather not include the > need to follow the BB context in my argument, because I find that part a > bit fishy. > > Luckily, there’s a second, completely different reason for having > scsi_device_for_each_req_async() increment the in-flight counter: > Specifically, scsi_device_purge_requests() probably wants to await full > completion of scsi_device_for_each_req_async(), and we can do that most > easily in the very same way by incrementing the in-flight counter. This > way, the blk_drain() in scsi_device_purge_requests() will not only await > all (cancelled) I/O requests, but also the non-I/O requests. > > The fact that this prevents the BB AioContext from changing while the BH > is scheduled/running then is just a nice side effect. > > > Hanna Czenczek (2): > block-backend: Allow concurrent context changes > scsi: Await request purging > > block/block-backend.c | 22 +++++++++++----------- > hw/scsi/scsi-bus.c | 30 +++++++++++++++++++++--------- > 2 files changed, 32 insertions(+), 20 deletions(-) > > -- > 2.43.0 >
On 06.02.24 17:53, Stefan Hajnoczi wrote: > On Fri, Feb 02, 2024 at 03:47:53PM +0100, Hanna Czenczek wrote: >> Hi, >> >> Without the AioContext lock, a BB's context may kind of change at any >> time (unless it has a root node, and I/O requests are pending). That >> also means that its own context (BlockBackend.ctx) and that of its root >> node can differ sometimes (while the context is being changed). >> >> blk_get_aio_context() doesn't know this yet and asserts that both are >> always equal (if there is a root node). Because it's no longer true, >> and because callers don't seem to really care about the root node's >> context, we can and should remove the assertion and just return the BB's >> context. >> >> Beyond that, the question is whether the callers of >> blk_get_aio_context() are OK with the context potentially changing >> concurrently. Honestly, it isn't entirely clear to me; most look OK, >> except for the virtio-scsi code, which operates under the general >> assumption that the BB's context is always equal to that of the >> virtio-scsi device. I doubt that this assumption always holds (it is >> definitely not obvious to me that it would), but then again, this series >> will not make matters worse in that regard, and that is what counts for >> me now. >> >> One clear point of contention is scsi_device_for_each_req_async(), which >> is addressed by patch 2. Right now, it schedules a BH in the BB >> context, then the BH double-checks whether the context still fits, and >> if not, re-schedules itself. Because virtio-scsi's context is fixed, >> this seems to indicate to me that it wants to be able to deal with a >> case where BB and virtio-scsi context differ, which seems to break that >> aforementioned general virtio-scsi assumption. > I don't agree with the last sentence: virtio-scsi's context isn't fixed. > > The AioContext changes when dataplane is started/stopped. virtio-scsi > switches AioContext between the IOThread's AioContext and the main > loop's qemu_aio_context. > > However, virtio-scsi virtqueue processing only happens in the IOThread's > AioContext. Maybe this is what you meant when you said the AioContext is > fixed? Specifically, I meant VirtIOSCSI.ctx, which is set only once in virtio_scsi_dataplane_setup(). That’s at least where the virtqueue notifiers are registered, so yes, virtqueue processing should at least be fixed to that context. It seems like it’s always possible some things are processed in the main thread (not just setup/teardown, but also e.g. TMF_LOGICAL_UNIT_RESET), so to me it seems like virtio-scsi kind of runs in two contexts simultaneously. Yes, when virtqueue processing is paused, all processing VirtIOSCSI.ctx is stopped, but I wouldn’t say it switches contexts there. It just stops processing some requests. Either way, virtio-scsi request processing doesn’t stop just because a scsi-hd device is hot-plugged or -unplugged. If the BB changes contexts in the hot-unplug path (while vq request processing is continuing in the I/O thread), its context will differ from that of virtio-scsi. So should I just replace the “the context is fixed” and say that in this specific instance, virtio-scsi vq processing continues in the I/O thread? > The BH function is aware that the current AioContext might not be the > same as the AioContext at the time the BH was scheduled. That doesn't > break assumptions in the code. > > (It may be possible to rewrite virtio-blk, virtio-scsi, and core > VirtIODevice ioeventfd code to use the simpler model where the > AioContext really is fixed because things have changed significantly > over the years, but I looked a few weeks ago and it's difficult work.) > > I'm just pointing out that I think this description is incomplete. I > *do* agree with what this patch series is doing :). Well, this description won’t land in any commit log, so from my side, I’m not too worried about its correctness. O:) Hanna >> Unfortunately, I definitely have to touch that code, because accepting >> concurrent changes of AioContexts breaks the double-check (just because >> the BB has the right context in that place does not mean it stays in >> that context); instead, we must prevent any concurrent change until the >> BH is done. Because changing contexts generally involves a drained >> section, we can prevent it by keeping the BB in-flight counter elevated. >> >> Question is, how to reason for that. I’d really rather not include the >> need to follow the BB context in my argument, because I find that part a >> bit fishy. >> >> Luckily, there’s a second, completely different reason for having >> scsi_device_for_each_req_async() increment the in-flight counter: >> Specifically, scsi_device_purge_requests() probably wants to await full >> completion of scsi_device_for_each_req_async(), and we can do that most >> easily in the very same way by incrementing the in-flight counter. This >> way, the blk_drain() in scsi_device_purge_requests() will not only await >> all (cancelled) I/O requests, but also the non-I/O requests. >> >> The fact that this prevents the BB AioContext from changing while the BH >> is scheduled/running then is just a nice side effect. >> >> >> Hanna Czenczek (2): >> block-backend: Allow concurrent context changes >> scsi: Await request purging >> >> block/block-backend.c | 22 +++++++++++----------- >> hw/scsi/scsi-bus.c | 30 +++++++++++++++++++++--------- >> 2 files changed, 32 insertions(+), 20 deletions(-) >> >> -- >> 2.43.0 >>
On Wed, 7 Feb 2024 at 04:36, Hanna Czenczek <hreitz@redhat.com> wrote: > > On 06.02.24 17:53, Stefan Hajnoczi wrote: > > On Fri, Feb 02, 2024 at 03:47:53PM +0100, Hanna Czenczek wrote: > > Hi, > > Without the AioContext lock, a BB's context may kind of change at any > time (unless it has a root node, and I/O requests are pending). That > also means that its own context (BlockBackend.ctx) and that of its root > node can differ sometimes (while the context is being changed). > > blk_get_aio_context() doesn't know this yet and asserts that both are > always equal (if there is a root node). Because it's no longer true, > and because callers don't seem to really care about the root node's > context, we can and should remove the assertion and just return the BB's > context. > > Beyond that, the question is whether the callers of > blk_get_aio_context() are OK with the context potentially changing > concurrently. Honestly, it isn't entirely clear to me; most look OK, > except for the virtio-scsi code, which operates under the general > assumption that the BB's context is always equal to that of the > virtio-scsi device. I doubt that this assumption always holds (it is > definitely not obvious to me that it would), but then again, this series > will not make matters worse in that regard, and that is what counts for > me now. > > One clear point of contention is scsi_device_for_each_req_async(), which > is addressed by patch 2. Right now, it schedules a BH in the BB > context, then the BH double-checks whether the context still fits, and > if not, re-schedules itself. Because virtio-scsi's context is fixed, > this seems to indicate to me that it wants to be able to deal with a > case where BB and virtio-scsi context differ, which seems to break that > aforementioned general virtio-scsi assumption. > > I don't agree with the last sentence: virtio-scsi's context isn't fixed. > > The AioContext changes when dataplane is started/stopped. virtio-scsi > switches AioContext between the IOThread's AioContext and the main > loop's qemu_aio_context. > > However, virtio-scsi virtqueue processing only happens in the IOThread's > AioContext. Maybe this is what you meant when you said the AioContext is > fixed? > > > Specifically, I meant VirtIOSCSI.ctx, which is set only once in virtio_scsi_dataplane_setup(). That’s at least where the virtqueue notifiers are registered, so yes, virtqueue processing should at least be fixed to that context. It seems like it’s always possible some things are processed in the main thread (not just setup/teardown, but also e.g. TMF_LOGICAL_UNIT_RESET), so to me it seems like virtio-scsi kind of runs in two contexts simultaneously. Yes, when virtqueue processing is paused, all processing VirtIOSCSI.ctx is stopped, but I wouldn’t say it switches contexts there. It just stops processing some requests. > > Either way, virtio-scsi request processing doesn’t stop just because a scsi-hd device is hot-plugged or -unplugged. If the BB changes contexts in the hot-unplug path (while vq request processing is continuing in the I/O thread), its context will differ from that of virtio-scsi. > > So should I just replace the “the context is fixed” and say that in this specific instance, virtio-scsi vq processing continues in the I/O thread? > > The BH function is aware that the current AioContext might not be the > same as the AioContext at the time the BH was scheduled. That doesn't > break assumptions in the code. > > (It may be possible to rewrite virtio-blk, virtio-scsi, and core > VirtIODevice ioeventfd code to use the simpler model where the > AioContext really is fixed because things have changed significantly > over the years, but I looked a few weeks ago and it's difficult work.) > > I'm just pointing out that I think this description is incomplete. I > *do* agree with what this patch series is doing :). > > > Well, this description won’t land in any commit log, so from my side, I’m not too worried about its correctness. O:) Okay, I think we're in agreement. What you described in your reply matches how I understand the code. No need to resend anything. Stefan
© 2016 - 2024 Red Hat, Inc.