[RFC PATCH 0/4] async: fix hangs on weakly-ordered architectures

Paolo Bonzini posted 4 patches 4 years ago
Test docker-mingw@fedora passed
Test docker-quick@centos7 passed
Test FreeBSD passed
Test checkpatch passed
Test asan passed
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20200407140746.8041-1-pbonzini@redhat.com
Maintainers: Stefan Weil <sw@weilnetz.de>, Stefan Hajnoczi <stefanha@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Fam Zheng <fam@euphon.net>, Max Reitz <mreitz@redhat.com>, Kevin Wolf <kwolf@redhat.com>
[RFC PATCH 0/4] async: fix hangs on weakly-ordered architectures
Posted by Paolo Bonzini 4 years ago
ARM machines and other weakly-ordered architectures have been suffering
for a long time from hangs in qemu-img and qemu-io.  For QEMU binaries
these are mitigated by the timers that sooner or later fire in the main
loop, but these will not happen for the tools and probably not with I/O
threads either.

The fix is in patch 5.  Patches 1-3 are docs updates that explain the bug,
and patch 4 fixes a bug exposed by the new patch.

Paolo

Paolo Bonzini (5):
  atomics: convert to reStructuredText
  atomics: update documentation
  rcu: do not mention atomic_mb_read/set in documentation
  aio-wait: delegate polling of main AioContext if BQL not held
  async: use explicit memory barriers

 docs/devel/atomics.rst   | 501 +++++++++++++++++++++++++++++++++++++++
 docs/devel/atomics.txt   | 403 -------------------------------
 docs/devel/index.rst     |   1 +
 docs/devel/rcu.txt       |   4 +-
 include/block/aio-wait.h |  22 ++
 include/block/aio.h      |  29 +--
 util/aio-posix.c         |  16 +-
 util/aio-win32.c         |  17 +-
 util/async.c             |  16 +-
 9 files changed, 576 insertions(+), 433 deletions(-)
 create mode 100644 docs/devel/atomics.rst
 delete mode 100644 docs/devel/atomics.txt

-- 
2.18.2


Re: [RFC PATCH 0/4] async: fix hangs on weakly-ordered architectures
Posted by Ying Fang 4 years ago

On 2020/4/7 22:07, Paolo Bonzini wrote:
> ARM machines and other weakly-ordered architectures have been suffering
> for a long time from hangs in qemu-img and qemu-io.  For QEMU binaries
> these are mitigated by the timers that sooner or later fire in the main
> loop, but these will not happen for the tools and probably not with I/O
> threads either.
Yes, we occasionally see the QEMU main thread hang and the VM get stuck in the
in-shutdown state on the aarch64 platform, so this could happen with I/O threads too.
> 
> The fix is in patch 5.  Patches 1-3 are docs updates that explain the bug,
> and patch 4 fixes a bug exposed by the new patch.
> 
> Paolo
> 
> Paolo Bonzini (5):
>    atomics: convert to reStructuredText
>    atomics: update documentation
>    rcu: do not mention atomic_mb_read/set in documentation
>    aio-wait: delegate polling of main AioContext if BQL not held
>    async: use explicit memory barriers
> 
>   docs/devel/atomics.rst   | 501 +++++++++++++++++++++++++++++++++++++++
>   docs/devel/atomics.txt   | 403 -------------------------------
>   docs/devel/index.rst     |   1 +
>   docs/devel/rcu.txt       |   4 +-
>   include/block/aio-wait.h |  22 ++
>   include/block/aio.h      |  29 +--
>   util/aio-posix.c         |  16 +-
>   util/aio-win32.c         |  17 +-
>   util/async.c             |  16 +-
>   9 files changed, 576 insertions(+), 433 deletions(-)
>   create mode 100644 docs/devel/atomics.rst
>   delete mode 100644 docs/devel/atomics.txt
> 

Re: [RFC PATCH 0/4] async: fix hangs on weakly-ordered architectures
Posted by Paolo Bonzini 4 years ago
On 08/04/20 11:12, Ying Fang wrote:
> On 2020/4/7 22:07, Paolo Bonzini wrote:
>> ARM machines and other weakly-ordered architectures have been suffering
>> for a long time from hangs in qemu-img and qemu-io.  For QEMU binaries
>> these are mitigated by the timers that sooner or later fire in the main
>> loop, but these will not happen for the tools and probably not with I/O
>> threads either.
>
> Yes, we occasionally see the QEMU main thread hang and the VM get stuck in the
> in-shutdown state on the aarch64 platform, so this could happen with I/O threads too.

Thanks for confirming!  Have you managed to test the final version of
the patches?  It would be great to include test results.

Paolo


Re: [RFC PATCH 0/4] async: fix hangs on weakly-ordered architectures
Posted by Ying Fang 4 years ago

On 2020/4/8 23:05, Paolo Bonzini wrote:
> On 08/04/20 11:12, Ying Fang wrote:
>> On 2020/4/7 22:07, Paolo Bonzini wrote:
>>> ARM machines and other weakly-ordered architectures have been suffering
>>> for a long time from hangs in qemu-img and qemu-io.  For QEMU binaries
>>> these are mitigated by the timers that sooner or later fire in the main
>>> loop, but these will not happen for the tools and probably not with I/O
>>> threads either.
>>
>> Yes, we occasionally see the QEMU main thread hang and the VM get stuck in the
>> in-shutdown state on the aarch64 platform, so this could happen with I/O threads too.
> 
> Thanks for confirming!  Have you managed to test the final version of
> the patches?  It would be great to include test results.

Yes, I tested your latest patches on both aarch64 and x86 platforms.
The test results show that the hang has been fixed. Thanks.


Re: [RFC PATCH 0/4] async: fix hangs on weakly-ordered architectures
Posted by Stefan Hajnoczi 4 years ago
On Tue, Apr 07, 2020 at 10:07:41AM -0400, Paolo Bonzini wrote:
> ARM machines and other weakly-ordered architectures have been suffering
> for a long time from hangs in qemu-img and qemu-io.  For QEMU binaries
> these are mitigated by the timers that sooner or later fire in the main
> loop, but these will not happen for the tools and probably not with I/O
> threads either.
> 
> The fix is in patch 5.  Patches 1-3 are docs updates that explain the bug,
> and patch 4 fixes a bug exposed by the new patch.
> 
> Paolo
> 
> Paolo Bonzini (5):
>   atomics: convert to reStructuredText
>   atomics: update documentation
>   rcu: do not mention atomic_mb_read/set in documentation
>   aio-wait: delegate polling of main AioContext if BQL not held
>   async: use explicit memory barriers
> 
>  docs/devel/atomics.rst   | 501 +++++++++++++++++++++++++++++++++++++++
>  docs/devel/atomics.txt   | 403 -------------------------------
>  docs/devel/index.rst     |   1 +
>  docs/devel/rcu.txt       |   4 +-
>  include/block/aio-wait.h |  22 ++
>  include/block/aio.h      |  29 +--
>  util/aio-posix.c         |  16 +-
>  util/aio-win32.c         |  17 +-
>  util/async.c             |  16 +-
>  9 files changed, 576 insertions(+), 433 deletions(-)
>  create mode 100644 docs/devel/atomics.rst
>  delete mode 100644 docs/devel/atomics.txt

Applied patches 4 and 5 to my block branch.

Stefan