[PATCH for-5.0 v5 0/3] Fix some AIO context locking in jobs

Stefan Reiter posted 3 patches 4 years ago
Test docker-mingw@fedora passed
Test docker-quick@centos7 passed
Test checkpatch passed
Test FreeBSD passed
Test asan passed
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20200407115651.69472-1-s.reiter@proxmox.com
Maintainers: Xie Changlong <xiechanglong.d@gmail.com>, Kevin Wolf <kwolf@redhat.com>, Markus Armbruster <armbru@redhat.com>, John Snow <jsnow@redhat.com>, Max Reitz <mreitz@redhat.com>, Wen Congyang <wencongyang2@huawei.com>
[PATCH for-5.0 v5 0/3] Fix some AIO context locking in jobs
Posted by Stefan Reiter 4 years ago
Contains three separate but related patches cleaning up and fixing some
issues regarding aio_context_acquire/aio_context_release for jobs. Mostly
affects blockjobs running for devices that have IO threads enabled AFAICT.


Changes from v4:
* Do job_ref/job_unref in job_txn_apply and job_exit since we need the job to
  survive the callback to access the potentially changed lock afterwards
* Reduce patch 2/3 to an assert; the context should already be acquired since
  it's a bdrv handler
* Collect R-by for 3/3
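
A minimal sketch of the job_ref/job_unref point above, for readers who want to
see the pattern in isolation. Everything here (ToyCtx, ToyJob, the toy_*
helpers) is a made-up stand-in, not QEMU's real Job or AioContext API; the
"lock" is just a counter. It only illustrates why the extra reference is taken
and why the context pointer is re-read after the callback:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not QEMU's real types or API. */
typedef struct ToyCtx { int held; } ToyCtx;              /* counts lock holders */
typedef struct ToyJob { int refcnt; ToyCtx *ctx; } ToyJob;

static void toy_ctx_acquire(ToyCtx *c) { c->held++; }
static void toy_ctx_release(ToyCtx *c) { assert(c->held > 0); c->held--; }

static ToyJob *toy_job_new(ToyCtx *c)
{
    ToyJob *j = malloc(sizeof(*j));
    assert(j != NULL);
    j->refcnt = 1;
    j->ctx = c;
    return j;
}

static void toy_job_ref(ToyJob *j) { j->refcnt++; }

static void toy_job_unref(ToyJob *j)
{
    if (--j->refcnt == 0) {
        free(j);
    }
}

static ToyCtx toy_new_home;

/* Callback that moves the job to another context and drops one reference. */
static int toy_migrate_cb(ToyJob *j)
{
    j->ctx = &toy_new_home;
    toy_job_unref(j);
    return 0;
}

/* The caller holds j->ctx on entry and holds the (possibly new) j->ctx on return. */
static int toy_apply(ToyJob *j, int (*fn)(ToyJob *))
{
    ToyCtx *inner;
    int rc;

    toy_job_ref(j);              /* keep j alive across fn */
    toy_ctx_release(j->ctx);     /* avoid holding the same lock twice */

    inner = j->ctx;
    toy_ctx_acquire(inner);
    rc = fn(j);
    toy_ctx_release(inner);      /* release what was acquired, not j->ctx */

    toy_ctx_acquire(j->ctx);     /* re-read: fn may have changed it */
    toy_job_unref(j);
    return rc;
}

/* Returns 1 if the lock and reference bookkeeping ended up as expected. */
int toy_demo(void)
{
    ToyCtx old_home = { 0 };
    ToyJob *j = toy_job_new(&old_home);
    int ok;

    toy_job_ref(j);              /* the reference the callback will drop */
    toy_ctx_acquire(j->ctx);
    toy_apply(j, toy_migrate_cb);

    ok = old_home.held == 0 && toy_new_home.held == 1 &&
         j->ctx == &toy_new_home && j->refcnt == 1;

    toy_ctx_release(j->ctx);
    toy_job_unref(j);
    return ok;
}
```

Without the toy_job_ref() at the top of toy_apply(), a callback that drops the
last reference would leave the final j->ctx read pointing at freed memory.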

I've marked it 'for-5.0' this time; I think it would make sense for it to be
picked up together with Kevin's "block: Fix blk->in_flight during
blk_wait_while_drained()" series. With that series and these three patches
applied, I can no longer reproduce any of the reported related crashes/hangs.


Changes from v3:
* commit_job appears to be unset in certain cases when replication_close is
  called; only access it when necessary to avoid a SIGSEGV
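
A rough sketch of the "only access when necessary" idea, with hypothetical
names throughout (SketchState, SketchJob, and the needs_cancel flag are
assumptions standing in for the real driver state and for whatever condition
makes the access necessary). The pointer is dereferenced only inside the
branch that actually needs the job:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real replication driver state. */
typedef struct SketchJob { int cancelled; } SketchJob;

typedef struct SketchState {
    int needs_cancel;        /* assumed flag: set only when a job must be cancelled */
    SketchJob *commit_job;   /* may be NULL when no job was ever started */
} SketchState;

/* Returns 1 if a job was cancelled, 0 if there was nothing to do. */
int sketch_close(SketchState *s)
{
    if (s->needs_cancel) {
        /* Touch commit_job only here, where it is expected to be set. */
        SketchJob *job = s->commit_job;
        assert(job != NULL);
        job->cancelled = 1;
        return 1;
    }
    /* commit_job is never dereferenced on this path, so NULL is harmless. */
    return 0;
}
```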

Missed this when shuffling around patches, sorry for the noise with the still-broken v3.

Changes from v2:
* reordered patch 1 to the end to not introduce temporary breakages
* added more fixes to job txn patch (should now pass the tests)

Changes from v1:
* fixed commit message for patch 1
* added patches 2 and 3


qemu: Stefan Reiter (3):
  job: take each job's lock individually in job_txn_apply
  replication: assert we own context before job_cancel_sync
  backup: don't acquire aio_context in backup_clean

 block/backup.c        |  4 ----
 block/replication.c   |  5 ++++-
 blockdev.c            |  9 ++++++++
 job-qmp.c             |  9 ++++++++
 job.c                 | 50 ++++++++++++++++++++++++++++++++++---------
 tests/test-blockjob.c |  2 ++
 6 files changed, 64 insertions(+), 15 deletions(-)

-- 
2.26.0


Re: [PATCH for-5.0 v5 0/3] Fix some AIO context locking in jobs
Posted by Kevin Wolf 4 years ago
On 07.04.2020 at 13:56, Stefan Reiter wrote:
> Contains three separate but related patches cleaning up and fixing some
> issues regarding aio_context_acquire/aio_context_release for jobs. Mostly
> affects blockjobs running for devices that have IO threads enabled AFAICT.
> 
> 
> Changes from v4:
> * Do job_ref/job_unref in job_txn_apply and job_exit since we need the job to
>   survive the callback to access the potentially changed lock afterwards
> * Reduce patch 2/3 to an assert, the context should already be acquired since
>   it's a bdrv handler
> * Collect R-by for 3/3
> 
> I've marked it 'for-5.0' this time, I think it would make sense to be
> picked up together with Kevin's "block: Fix blk->in_flight during
> blk_wait_while_drained()" series. With that series and these three patches
> applied I can no longer reproduce any of the reported related crashes/hangs.

Thanks, applied to the block branch.

Kevin