Throttle groups consist of members sharing one throttling state
(including bps/iops limits). Round-robin scheduling is used to ensure
fairness. If a group member already has a timer pending, then other
group members do not schedule their own timers. The next group member
will have its turn when the existing timer expires.

A hang may occur when a group member leaves while it had a timer
scheduled. Although the code carefully removes the group member from
the round-robin list, it does not schedule the next member. Therefore
the remaining members continue to wait for the removed member's timer
to expire.

This patch schedules the next request if a timer is pending.
Unfortunately the actual bug is a race condition that I've been unable
to capture in a test case.

Sometimes drive2 hangs when drive1 is removed from the throttling group:

$ qemu ... -drive if=none,id=drive1,cache=none,format=qcow2,file=data1.qcow2,iops=100,group=foo \
-device virtio-blk-pci,id=virtio-blk-pci0,drive=drive1 \
-drive if=none,id=drive2,cache=none,format=qcow2,file=data2.qcow2,iops=10,group=foo \
-device virtio-blk-pci,id=virtio-blk-pci1,drive=drive2
(guest-console1)# fio -filename /dev/vda 4k-seq-read.job
(guest-console2)# fio -filename /dev/vdb 4k-seq-read.job
(qmp) {"execute": "block_set_io_throttle", "arguments": {"device": "drive1","bps": 0,"bps_rd": 0,"bps_wr": 0,"iops": 0,"iops_rd": 0,"iops_wr": 0}}
Reported-by: Nini Gu <ngu@redhat.com>
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1535914
Cc: Alberto Garcia <berto@igalia.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/throttle-groups.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 36cc0430c3..e297b04e17 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -564,6 +564,10 @@ void throttle_group_unregister_tgm(ThrottleGroupMember *tgm)
 
     qemu_mutex_lock(&tg->lock);
     for (i = 0; i < 2; i++) {
+        if (timer_pending(tgm->throttle_timers.timers[i])) {
+            tg->any_timer_armed[i] = false;
+            schedule_next_request(tgm, i);
+        }
         if (tg->tokens[i] == tgm) {
             token = throttle_group_next_tgm(tgm);
             /* Take care of the case where this is the last tgm in the group */
--
2.17.1
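
As a self-contained illustration of the bookkeeping the commit message
describes, here is a minimal toy model of the hang, assuming two members
and a single "any timer armed" flag. It is not QEMU code: only the ideas
behind any_timer_armed[], the per-member timers and the round-robin turn
are borrowed from block/throttle-groups.c; every name and the whole
scenario below are invented for illustration, and it builds with any C99
compiler.

/* toy_throttle.c - illustration only, not QEMU code (see note above). */
#include <stdbool.h>
#include <stdio.h>

#define NMEMBERS 2

struct member {
    const char *name;
    bool timer_pending;   /* this member armed the group's only timer */
    bool has_request;     /* this member has queued I/O waiting for a turn */
    bool unregistered;
};

struct group {
    bool any_timer_armed; /* at most one member may have a timer armed */
    struct member m[NMEMBERS];
};

/* Round-robin: hand the turn to the next member that has work queued. */
static void schedule_next_request(struct group *tg, int self)
{
    for (int i = 1; i <= NMEMBERS; i++) {
        struct member *next = &tg->m[(self + i) % NMEMBERS];
        if (!next->unregistered && next->has_request) {
            next->timer_pending = true;
            tg->any_timer_armed = true;
            return;
        }
    }
    tg->any_timer_armed = false;  /* nobody left with queued work */
}

static void unregister_member(struct group *tg, int idx, bool fixed)
{
    struct member *tgm = &tg->m[idx];

    if (tgm->timer_pending) {
        tgm->timer_pending = false;          /* the timer is torn down... */
        if (fixed) {
            tg->any_timer_armed = false;
            schedule_next_request(tg, idx);  /* ...so hand the turn on */
        }
        /* buggy path: any_timer_armed stays true although no timer exists */
    }
    tgm->unregistered = true;
}

int main(void)
{
    for (int fixed = 0; fixed <= 1; fixed++) {
        struct group tg = {
            .any_timer_armed = true,
            .m = {
                { "drive1", true,  false, false }, /* holds the armed timer */
                { "drive2", false, true,  false }, /* waiting for its turn  */
            },
        };

        unregister_member(&tg, 0, fixed);
        printf("%s: drive2 %s\n", fixed ? "fixed" : "buggy",
               tg.m[1].timer_pending ? "gets the timer and makes progress"
                                     : "waits forever for a timer that is gone");
    }
    return 0;
}

Running it prints one line per case: in the buggy case drive2 never gets
a timer even though the group still believes one is armed, while in the
fixed case the departing member hands its turn on first, which mirrors
what the hunk above adds to throttle_group_unregister_tgm().
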
On Wed, Jul 04, 2018 at 03:54:10PM +0100, Stefan Hajnoczi wrote:
Sorry you weren't CCed originally, Berto. This one is for you! :)
> [...]

On Wed, Jul 04, 2018 at 03:54:10PM +0100, Stefan Hajnoczi wrote:
> [...]
> (qmp) {"execute": "block_set_io_throttle", "arguments": {"device": "drive1","bps": 0,"bps_rd": 0,"bps_wr": 0,"iops": 0,"iops_rd": 0,"iops_wr": 0}}
Hi Stefan,
I realize you want to preserve the long lines to not break the JSON QMP
command. But, FWIW, you might want to format it using one of the
convenient websites: https://jsonformatter.org/
So your QMP command nicely wraps (for the 'cost' of 11 extra lines):
{
  "execute": "block_set_io_throttle",
  "arguments": {
    "device": "drive1",
    "bps": 0,
    "bps_rd": 0,
    "bps_wr": 0,
    "iops": 0,
    "iops_rd": 0,
    "iops_wr": 0
  }
}
[...]
--
/kashyap
On Wed, Jul 04, 2018 at 03:54:10PM +0100, Stefan Hajnoczi wrote:
> [...]

Berto is away in July. I am merging this fix for QEMU 3.0. If there
are any comments when Berto is back I'll send a follow-up patch.
Applied to my block tree:
https://github.com/stefanha/qemu/commits/block
Stefan
On Wed 04 Jul 2018 04:54:10 PM CEST, Stefan Hajnoczi wrote:
> [...]
>
> A hang may occur when a group member leaves while it had a timer
> scheduled.

I haven't been able to reproduce this. When a member is removed from the
group the pending request queue must already be empty, so does this mean
that there's still a timer when the queue is already empty?

Berto
On Wed 04 Jul 2018 04:54:10 PM CEST, Stefan Hajnoczi wrote:
> [...]
>
> A hang may occur when a group member leaves while it had a timer
> scheduled.

Ok, I can reproduce this if I run fio with iodepth=1.

We're draining the BDS before removing it from a throttle group, and
therefore there cannot be any pending requests. So the problem seems to
be that when throttle_co_drain_begin() runs the pending requests from a
member using throttle_group_co_restart_queue(), it simply uses
qemu_co_queue_next() and doesn't touch the timer at all. So it can
happen that there's a request in the queue waiting for a timer, and
after that call the request is gone but the timer remains.

The current patch is perhaps not worth touching at this point (we're
about to release QEMU 3.0), but I think that a better solution would be
to either

a) cancel the existing timer and reset tg->any_timer_armed on the given
   tgm after throttle_group_co_restart_queue() and before
   schedule_next_request() if the queue is empty.

b) force the existing timer to run immediately instead of calling
   throttle_group_co_restart_queue(). Seems cleaner, but I haven't tried
   this one yet.

I'll explore them a bit and send a patch.

Berto
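
A rough, untested sketch of what option (b) might look like, for
discussion only: timer_pending() and timer_del() are the normal QEMU
timer API, while timer_cb() and throttle_group_restart_queue() are
assumed here to be throttle-groups.c's existing internal helpers for
firing a member's timer and restarting one direction's queue. This is
not necessarily the patch that was eventually sent.

/* Sketch of option (b) above -- untested, for discussion only. */
void throttle_group_restart_tgm(ThrottleGroupMember *tgm)
{
    int i;

    if (tgm->throttle_state) {
        for (i = 0; i < 2; i++) {
            QEMUTimer *t = tgm->throttle_timers.timers[i];

            if (timer_pending(t)) {
                /* Fire the armed timer right away instead of popping the
                 * queue behind its back, so any_timer_armed[] and the
                 * round-robin token stay consistent. */
                timer_del(t);
                timer_cb(tgm, i);
            } else {
                /* No timer armed: restart this direction's queue as before. */
                throttle_group_restart_queue(tgm, i);
            }
        }
    }
}

Firing the pending timer keeps all the token and any_timer_armed[]
bookkeeping in one place (the timer callback), which is presumably why
option (b) "seems cleaner".
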
On Tue 31 Jul 2018 06:47:53 PM CEST, Alberto Garcia wrote:
> On Wed 04 Jul 2018 04:54:10 PM CEST, Stefan Hajnoczi wrote:
>> [...]
>>
>> A hang may occur when a group member leaves while it had a timer
>> scheduled.
>
> Ok, I can reproduce this if I run fio with iodepth=1.

I managed to write a test case for this, but unfortunately it seems
that this patch is not enough and it's still possible to hang QEMU
3.0.0-rc2. I expect to have a fix for tomorrow.

Berto