[PATCH 0/2] linux-aio: fix unbalanced plugged counter in laio_io_unplug()

Stefan Hajnoczi posted 2 patches 1 year, 11 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20220609164712.1539045-1-stefanha@redhat.com
Maintainers: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>
block/linux-aio.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
[PATCH 0/2] linux-aio: fix unbalanced plugged counter in laio_io_unplug()
Posted by Stefan Hajnoczi 1 year, 11 months ago
An unlucky I/O pattern can result in stalled Linux AIO requests when the
plugged counter becomes unbalanced. See Patch 1 for details.
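One way a plug/unplug nesting counter can become unbalanced is a decrement
hidden on the right-hand side of a short-circuited `||` condition. The
standalone C sketch below models that failure mode. It is a hypothetical
simplification and not QEMU's block/linux-aio.c: the struct and the names
`io_queue`, `plug`, `enqueue`, `unplug_buggy` and `unplug_fixed` are
illustrative only, and there is no real io_submit(2) call.

```c
#include <assert.h>

/* Hypothetical, simplified model of a plugged batching queue.
 * Not QEMU's LinuxAioState; names are illustrative only. */
struct io_queue {
    unsigned plugged;   /* nesting depth of plug() calls */
    unsigned in_queue;  /* requests waiting to be submitted */
    unsigned max_batch; /* submit early once this many are queued */
    unsigned submits;   /* number of times submit() has run */
};

/* Stand-in for the real submission path: flush everything queued. */
static void submit(struct io_queue *q)
{
    q->submits++;
    q->in_queue = 0;
}

static void plug(struct io_queue *q)
{
    q->plugged++;
}

/* Queue one request; submit immediately only when not plugged. */
static void enqueue(struct io_queue *q)
{
    q->in_queue++;
    if (q->plugged == 0) {
        submit(q);
    }
}

/* Buggy: when in_queue >= max_batch the left operand of || is true,
 * so the --q->plugged on the right is never evaluated and the
 * counter stays elevated after this unplug. */
static void unplug_buggy(struct io_queue *q)
{
    assert(q->plugged);
    if (q->in_queue >= q->max_batch ||
        (--q->plugged == 0 && q->in_queue > 0)) {
        submit(q);
    }
}

/* Fixed: decrement unconditionally, then decide whether to submit. */
static void unplug_fixed(struct io_queue *q)
{
    assert(q->plugged);
    q->plugged--;
    if (q->in_queue >= q->max_batch ||
        (q->plugged == 0 && q->in_queue > 0)) {
        submit(q);
    }
}
```

In this model, calling plug(), queuing max_batch requests and then
unplug_buggy() submits the batch but leaves plugged at 1, so every later
request is held in the queue with no pending unplug to flush it. The same
sequence with unplug_fixed() returns plugged to 0, and the next request is
submitted right away.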

Patch 2 adds a comment explaining why laio_io_unplug() checks the max batch
limit at all.

Stefan Hajnoczi (2):
  linux-aio: fix unbalanced plugged counter in laio_io_unplug()
  linux-aio: explain why max batch is checked in laio_io_unplug()

 block/linux-aio.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

-- 
2.36.1
Re: [PATCH 0/2] linux-aio: fix unbalanced plugged counter in laio_io_unplug()
Posted by Stefan Hajnoczi 1 year, 11 months ago
On Thu, Jun 09, 2022 at 05:47:10PM +0100, Stefan Hajnoczi wrote:
> An unlucky I/O pattern can result in stalled Linux AIO requests when the
> plugged counter becomes unbalanced. See Patch 1 for details.

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan
Re: [PATCH 0/2] linux-aio: fix unbalanced plugged counter in laio_io_unplug()
Posted by Mark Mielke 1 year, 11 months ago
Thank you for finding this and fixing it. This issue has been giving us
grief for months, and this patch appears to resolve the problem.

In our case, it was much more severe with the RHEL / CentOS 7.x Linux 3.10
kernel on SolidFire iSCSI-based storage. Because of that, it escaped notice
during our original soak period, and this is likely part of why others
haven't encountered the problem. However, I believe this is a serious
problem that could affect any guest machine that does a large amount of
I/O. I suspect the SolidFire connection is that I/O can queue up there more
easily than on the local NVMe storage we also use, and SolidFire QoS
re-balancing, which may re-negotiate the iSCSI connection from time to
time, could also play a part. So I think this is more a case of "happens in
some environments more than others", and unfortunately it happened a lot in
one of ours. :-(


On Tue, Jun 14, 2022 at 12:36 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:

> Thanks, applied to my block tree:
> https://gitlab.com/stefanha/qemu/commits/block
>
> Stefan
>


-- 
Mark Mielke <mark.mielke@gmail.com>