[RFC PATCH v2 0/8] block-backend: Introduce I/O hang

Jiahui Cen posted 8 patches 3 years, 7 months ago
Test docker-quick@centos7 failed
Test docker-mingw@fedora failed
Test checkpatch failed
Test FreeBSD failed
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20200930094606.5323-1-cenjiahui@huawei.com
Maintainers: Markus Armbruster <armbru@redhat.com>, Kevin Wolf <kwolf@redhat.com>, Eric Blake <eblake@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>, Max Reitz <mreitz@redhat.com>
There is a newer version of this series
[RFC PATCH v2 0/8] block-backend: Introduce I/O hang
Posted by Jiahui Cen 3 years, 7 months ago
A VM in a cloud environment may use a virtual disk as its backend storage,
and there are usually filesystems on the virtual block device. When the backend
storage is temporarily down, any I/O issued to the virtual block device
fails with an error. For example, an I/O error in an ext4 filesystem can make
the filesystem read-only. However, cloud backend storage can often be recovered
soon. For example, an IP-SAN may go down due to a network failure and come back
online soon after the network is recovered. The error in the filesystem may not
be recovered without a device reattach or a system restart. So an I/O rehandle
mechanism is needed to implement self-healing.

This patch series proposes a feature called I/O hang. It can rehandle AIOs
that fail with EIO without sending the error back to the guest. From the
guest's perspective, it looks as if an I/O is hanging and has not returned.
With this feature enabled, the guest resumes running smoothly once I/O is
recovered.

v1->v2:
* Rebase to fix compile problems.
* Fix incorrect removal of rehandle list.
* Provide rehandle pause interface.

Jiahui Cen (8):
  block-backend: introduce I/O rehandle info
  block-backend: rehandle block aios when EIO
  block-backend: add I/O hang timeout
  block-backend: add I/O rehandle pause/unpause
  block-backend: enable I/O hang when timeout is set
  virtio-blk: pause I/O hang when resetting
  qemu-option: add I/O hang timeout option
  qapi: add I/O hang and I/O hang timeout qapi event

 block/block-backend.c          | 300 +++++++++++++++++++++++++++++++++
 blockdev.c                     |  11 ++
 hw/block/virtio-blk.c          |   8 +
 include/sysemu/block-backend.h |   5 +
 qapi/block-core.json           |  26 +++
 5 files changed, 350 insertions(+)

-- 
2.28.0


Re: [RFC PATCH v2 0/8] block-backend: Introduce I/O hang
Posted by cenjiahui 3 years, 7 months ago
Hi Kevin,

Could you please spend some time reviewing and commenting on this patch series?

Thanks,
Jiahui Cen


Re: [RFC PATCH v2 0/8] block-backend: Introduce I/O hang
Posted by Ying Fang 3 years, 7 months ago

On 10/10/2020 10:27 AM, cenjiahui wrote:
> Hi Kevin,
> 
> Could you please spend some time reviewing and commenting on this patch series.
> 
> Thanks,
> Jiahui Cen

This feature has been confirmed to be effective in a cloud storage
environment, since it helps improve availability without pausing the entire
guest. I hope it won't get lost in the thread. Any comments or reviews
are welcome.
