[PATCH v4 0/3] Fix some AIO context locking in jobs

Stefan Reiter posted 3 patches 4 years ago
Maintainers: Wen Congyang <wencongyang2@huawei.com>, Max Reitz <mreitz@redhat.com>, Xie Changlong <xiechanglong.d@gmail.com>, John Snow <jsnow@redhat.com>, Kevin Wolf <kwolf@redhat.com>
[PATCH v4 0/3] Fix some AIO context locking in jobs
Posted by Stefan Reiter 4 years ago
Contains three separate but related patches cleaning up and fixing some
issues regarding aio_context_acquire/aio_context_release for jobs. Mostly
affects blockjobs running for devices that have IO threads enabled AFAICT.

This is based on the discussions here:
https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07929.html

Changes from v3:
* commit_job appears to be unset in certain cases when replication_close is
  called; only access it when necessary to avoid a SIGSEGV

Missed this when shuffling the patches around; sorry for the noise with the
still-broken v3.

Changes from v2:
* reordered patch 1 to the end to not introduce temporary breakages
* added more fixes to job txn patch (should now pass the tests)

Changes from v1:
* fixed commit message for patch 1
* added patches 2 and 3


qemu: Stefan Reiter (3):
  job: take each job's lock individually in job_txn_apply
  replication: acquire aio context before calling job_cancel_sync
  backup: don't acquire aio_context in backup_clean

 block/backup.c        |  4 ----
 block/replication.c   |  8 +++++++-
 job.c                 | 48 ++++++++++++++++++++++++++++++++++---------
 tests/test-blockjob.c |  2 ++
 4 files changed, 47 insertions(+), 15 deletions(-)

-- 
2.26.0


Re: [PATCH v4 0/3] Fix some AIO context locking in jobs
Posted by Kevin Wolf 4 years ago
On 01.04.2020 at 10:15, Stefan Reiter wrote:
> Contains three separate but related patches cleaning up and fixing some
> issues regarding aio_context_acquire/aio_context_release for jobs. Mostly
> affects blockjobs running for devices that have IO threads enabled AFAICT.
> 
> This is based on the discussions here:
> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07929.html

I'm getting segfaults in some qemu-iotests cases:

    Failures: 155 219 245 255 257 258

This is the backtrace of one of the coredumps I got, looks like use
after free:

(gdb) bt
#0  0x000055bad36ce4dc in qemu_mutex_lock_impl (mutex=0xebebebebebebec4b, file=0x55bad38c5cbf "util/async.c", line=596) at util/qemu-thread-posix.c:76
#1  0x000055bad35d4f4f in job_txn_apply (fn=0x55bad35d58b0 <job_finalize_single>, job=<optimized out>, job=<optimized out>) at job.c:168
#2  0x000055bad33aa807 in qmp_job_finalize (id=<optimized out>, errp=errp@entry=0x7ffff6a2ad68) at job-qmp.c:117
#3  0x000055bad357fabb in qmp_marshal_job_finalize (args=<optimized out>, ret=<optimized out>, errp=0x7ffff6a2adc8) at qapi/qapi-commands-job.c:204
#4  0x000055bad367f688 in qmp_dispatch (cmds=0x55bad3df2880 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:155
#5  0x000055bad355bfb1 in monitor_qmp_dispatch (mon=0x55bad5b0d2f0, req=<optimized out>) at monitor/qmp.c:145
#6  0x000055bad355c79a in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:234
#7  0x000055bad36c7ea5 in aio_bh_call (bh=0x55bad58fa2b0) at util/async.c:164
#8  0x000055bad36c7ea5 in aio_bh_poll (ctx=ctx@entry=0x55bad58f8ee0) at util/async.c:164
#9  0x000055bad36cb52e in aio_dispatch (ctx=0x55bad58f8ee0) at util/aio-posix.c:380
#10 0x000055bad36c7d8e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:298
#11 0x00007fa3a3f7a06d in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#12 0x000055bad36ca798 in glib_pollfds_poll () at util/main-loop.c:219
#13 0x000055bad36ca798 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
#14 0x000055bad36ca798 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518
#15 0x000055bad3340559 in qemu_main_loop () at /home/kwolf/source/qemu/softmmu/vl.c:1664
#16 0x000055bad322993e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/kwolf/source/qemu/softmmu/main.c:49
(gdb) p *job
$3 = {id = 0xebebebebebebebeb <error: Cannot access memory at address 0xebebebebebebebeb>, driver = 0xebebebebebebebeb, refcnt = -336860181, status = 3958107115, 
  aio_context = 0xebebebebebebebeb, co = 0xebebebebebebebeb, sleep_timer = {expire_time = -1446803456761533461, timer_list = 0xebebebebebebebeb, cb = 0xebebebebebebebeb, 
    opaque = 0xebebebebebebebeb, next = 0xebebebebebebebeb, attributes = -336860181, scale = -336860181}, pause_count = -336860181, busy = 235, paused = 235, user_paused = 235, 
  cancelled = 235, force_cancel = 235, deferred_to_main_loop = 235, auto_finalize = 235, auto_dismiss = 235, progress = {current = 16999940616948018155, total = 16999940616948018155}, 
  ret = -336860181, err = 0xebebebebebebebeb, cb = 0xebebebebebebebeb, opaque = 0xebebebebebebebeb, on_finalize_cancelled = {notifiers = {lh_first = 0xebebebebebebebeb}}, 
  on_finalize_completed = {notifiers = {lh_first = 0xebebebebebebebeb}}, on_pending = {notifiers = {lh_first = 0xebebebebebebebeb}}, on_ready = {notifiers = {lh_first = 
    0xebebebebebebebeb}}, on_idle = {notifiers = {lh_first = 0xebebebebebebebeb}}, job_list = {le_next = 0xebebebebebebebeb, le_prev = 0xebebebebebebebeb}, txn = 0xebebebebebebebeb, 
  txn_list = {le_next = 0xebebebebebebebeb, le_prev = 0xebebebebebebebeb}}

Kevin


Re: [PATCH v4 0/3] Fix some AIO context locking in jobs
Posted by John Snow 4 years ago

On 4/2/20 8:48 AM, Kevin Wolf wrote:
> On 01.04.2020 at 10:15, Stefan Reiter wrote:
>> Contains three separate but related patches cleaning up and fixing some
>> issues regarding aio_context_acquire/aio_context_release for jobs. Mostly
>> affects blockjobs running for devices that have IO threads enabled AFAICT.
>>
>> This is based on the discussions here:
>> https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07929.html
> 
> I'm getting segfaults in some qemu-iotests cases:
> 
>     Failures: 155 219 245 255 257 258
> 
> This is the backtrace of one of the coredumps I got, looks like use
> after free:
> 

FWIW, this appears to be the case after just the very first patch, for all six tests.

--js