[Qemu-devel] [PATCH 5/6] block: Fix write/resize permissions for inactive images

Kevin Wolf posted 6 patches 8 years, 9 months ago
Posted by Kevin Wolf 8 years, 9 months ago
Format drivers for inactive nodes don't need write/resize permissions on
their bs->file and can share write/resize with another VM (in fact, this
is the whole point of keeping images inactive). Represent this fact in
the op blocker system, so that image locking does the right thing
without special-casing inactive images.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c               | 35 +++++++++++++++++++++++++++++++++--
 include/block/block.h |  1 +
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 773bd64..cd89467 100644
--- a/block.c
+++ b/block.c
@@ -192,11 +192,20 @@ void path_combine(char *dest, int dest_size,
     }
 }
 
+/* Returns whether the image file is opened as read-only. Note that this can
+ * return false and writing to the image file is still not possible because the
+ * image is inactivated. */
 bool bdrv_is_read_only(BlockDriverState *bs)
 {
     return bs->read_only;
 }
 
+/* Returns whether the image file can be written to right now */
+bool bdrv_is_writable(BlockDriverState *bs)
+{
+    return !bdrv_is_read_only(bs) && !(bs->open_flags & BDRV_O_INACTIVE);
+}
+
 int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only, Error **errp)
 {
     /* Do not set read_only if copy_on_read is enabled */
@@ -1512,7 +1521,7 @@ static int bdrv_check_perm(BlockDriverState *bs, uint64_t cumulative_perms,
 
     /* Write permissions never work with read-only images */
     if ((cumulative_perms & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) &&
-        bdrv_is_read_only(bs))
+        !bdrv_is_writable(bs))
     {
         error_setg(errp, "Block node is read-only");
         return -EPERM;
@@ -1797,7 +1806,7 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
         bdrv_filter_default_perms(bs, c, role, perm, shared, &perm, &shared);
 
         /* Format drivers may touch metadata even if the guest doesn't write */
-        if (!bdrv_is_read_only(bs)) {
+        if (bdrv_is_writable(bs)) {
             perm |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
         }
 
@@ -1823,6 +1832,10 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
                   BLK_PERM_WRITE_UNCHANGED;
     }
 
+    if (bs->open_flags & BDRV_O_INACTIVE) {
+        shared |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
+    }
+
     *nperm = perm;
     *nshared = shared;
 }
@@ -3969,6 +3982,7 @@ void bdrv_init_with_whitelist(void)
 void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
 {
     BdrvChild *child, *parent;
+    uint64_t perm, shared_perm;
     Error *local_err = NULL;
     int ret;
 
@@ -4005,6 +4019,16 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
         return;
     }
 
+    /* Update permissions, they may differ for inactive nodes */
+    bdrv_get_cumulative_perm(bs, &perm, &shared_perm);
+    ret = bdrv_check_perm(bs, perm, shared_perm, NULL, &local_err);
+    if (ret < 0) {
+        bs->open_flags |= BDRV_O_INACTIVE;
+        error_propagate(errp, local_err);
+        return;
+    }
+    bdrv_set_perm(bs, perm, shared_perm);
+
     QLIST_FOREACH(parent, &bs->parents, next_parent) {
         if (parent->role->activate) {
             parent->role->activate(parent, &local_err);
@@ -4049,6 +4073,8 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs,
     }
 
     if (setting_flag) {
+        uint64_t perm, shared_perm;
+
         bs->open_flags |= BDRV_O_INACTIVE;
 
         QLIST_FOREACH(parent, &bs->parents, next_parent) {
@@ -4060,6 +4086,11 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs,
                 }
             }
         }
+
+        /* Update permissions, they may differ for inactive nodes */
+        bdrv_get_cumulative_perm(bs, &perm, &shared_perm);
+        bdrv_check_perm(bs, perm, shared_perm, NULL, &error_abort);
+        bdrv_set_perm(bs, perm, shared_perm);
     }
 
     QLIST_FOREACH(child, &bs->children, next) {
diff --git a/include/block/block.h b/include/block/block.h
index 80d51d8..90932b4 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -435,6 +435,7 @@ int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
                             int64_t sector_num, int nb_sectors, int *pnum);
 
 bool bdrv_is_read_only(BlockDriverState *bs);
+bool bdrv_is_writable(BlockDriverState *bs);
 int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only, Error **errp);
 int bdrv_set_read_only(BlockDriverState *bs, bool read_only, Error **errp);
 bool bdrv_is_sg(BlockDriverState *bs);
-- 
1.8.3.1
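
For readers following the permission logic, here is a minimal standalone sketch of the two bdrv_format_default_perms() hunks above. It is not QEMU code: Node, PERM_WRITE, PERM_RESIZE, O_INACTIVE, node_is_writable() and format_default_perms() are hypothetical stand-ins with illustrative values, and everything else the real function does (filter defaults, backing-file handling) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for BLK_PERM_WRITE, BLK_PERM_RESIZE and
 * BDRV_O_INACTIVE; the numeric values are illustrative only. */
#define PERM_WRITE   (UINT64_C(1) << 0)
#define PERM_RESIZE  (UINT64_C(1) << 1)
#define O_INACTIVE   (1u << 0)

typedef struct Node {
    bool read_only;
    unsigned open_flags;
} Node;

/* Same condition as bdrv_is_writable() in the patch. */
static bool node_is_writable(const Node *n)
{
    return !n->read_only && !(n->open_flags & O_INACTIVE);
}

/* Models only the parts of the default permission calculation that the
 * patch changes, for a format driver's bs->file child. */
static void format_default_perms(const Node *n,
                                 uint64_t *perm, uint64_t *shared)
{
    assert(n != NULL && perm != NULL && shared != NULL);

    *perm = 0;
    *shared = 0;

    /* A writable node may touch metadata, so it needs write/resize. */
    if (node_is_writable(n)) {
        *perm |= PERM_WRITE | PERM_RESIZE;
    }

    /* An inactive node lets another user write to and resize the file. */
    if (n->open_flags & O_INACTIVE) {
        *shared |= PERM_WRITE | PERM_RESIZE;
    }
}
```

Under these assumptions, an active read-write node requests write/resize and shares neither, while the same node with O_INACTIVE set requests neither and shares both. That matches the behaviour the commit message describes for image locking.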


Re: [Qemu-devel] [PATCH 5/6] block: Fix write/resize permissions for inactive images
Posted by Xie Changlong 8 years, 5 months ago
On 5/5/2017 12:52 AM, Kevin Wolf wrote:
>   
> +/* Returns whether the image file can be written to right now */
> +bool bdrv_is_writable(BlockDriverState *bs)
> +{
> +    return !bdrv_is_read_only(bs) && !(bs->open_flags & BDRV_O_INACTIVE);
> +}
> +

This commit uses BDRV_O_INACTIVE to decide whether the image file can be
written to, but this blocks the replication driver on the secondary node.
For replication on the secondary, we must ensure that the whole chain is
writable:


   ||
   ||                            .----------
   ||                            | Secondary
   ||                            '----------
   ||
   ||

                                                         virtio-blk
                                                              ^
------>  3 NBD                                               |
   ||     server                                          2 filter
   ||        ^                                                ^
   ||        |                                                |
   ||  Secondary disk <--------- hidden-disk 5 <--------- active-disk 4
   ||        |          backing        ^       backing
   ||        |                         |
   ||        |                         |
   ||        '-------------------------'
   ||           drive-backup sync=none 6

The root cause is that when we run replication on the secondary, the VM
state changes to RUN_STATE_INMIGRATE, and blockdev_init() then sets
bdrv_flags |= BDRV_O_INACTIVE, so the whole chain becomes read-only. I've
tried to fix it on my side, but it does not seem easy. Is there any way to
work around this? Any suggestions would be appreciated.

It's very easy to reproduce this scenario:
(gdb) r
Starting program: /root/.xie/qemu-colo/x86_64-softmmu/qemu-system-x86_64 
-boot c -m 2048 -smp 2 -qmp stdio -vnc :0 -name secondary -enable-kvm 
-cpu qemu64,+kvmclock -device piix3-usb-uhci -device usb-tablet -drive 
if=none,id=colo-disk,file.filename=/root/.xie/suse.qcow2.orgin,file.node-name=secondary_disk,driver=qcow2,node-name=sec-qcow2-driver-for-nbd 
-drive 
if=ide,id=active-disk0,node-name=active-disk111,throttling.bps-total=70000000,driver=replication,node-name=secondary-replication-driver,mode=secondary,top-id=active-disk0,file.driver=qcow2,file.node-name=active-qcow2-driver,file.file.filename=/mnt/ramfs/active_disk.img,file.file.node-name=active_disk,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.node-name=hidden-qcow2-driver,file.backing.file.node-name=hidden_disk,file.backing.backing=colo-disk 
-incoming tcp:0:8888
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff4801700 (LWP 25252)]
[New Thread 0x7ffff4000700 (LWP 25255)]
qemu-system-x86_64: -drive 
if=ide,id=active-disk0,node-name=active-disk111,throttling.bps-total=70000000,driver=replication,node-name=secondary-replication-driver,mode=secondary,top-id=active-disk0,file.driver=qcow2,file.node-name=active-qcow2-driver,file.file.filename=/mnt/ramfs/active_disk.img,file.file.node-name=active_disk,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfs/hidden_disk.img,file.backing.node-name=hidden-qcow2-driver,file.backing.file.node-name=hidden_disk,file.backing.backing=colo-disk: 
Block node is read-only
[Thread 0x7ffff4000700 (LWP 25255) exited]
[Thread 0x7ffff4801700 (LWP 25252) exited]
[Inferior 1 (process 25248) exited with code 01]
Missing separate debuginfos, use: debuginfo-install 
glib2-2.46.2-4.el7.x86_64 glibc-2.17-157.el7_3.4.x86_64 
libacl-2.2.51-12.el7.x86_64 libattr-2.4.46-12.el7.x86_64 
libgcc-4.8.5-11.el7.x86_64 libgcrypt-1.5.3-13.el7_3.1.x86_64 
libgpg-error-1.12-3.el7.x86_64 libstdc++-4.8.5-11.el7.x86_64 
libuuid-2.23.2-33.el7_3.2.x86_64 openssl-libs-1.0.1e-60.el7_3.1.x86_64 
pixman-0.34.0-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb)

-- 
Thanks
     -Xie

Re: [Qemu-devel] [PATCH 5/6] block: Fix write/resize permissions for inactive images
Posted by Fam Zheng 8 years, 5 months ago
On Fri, 08/18 18:06, Xie Changlong wrote:
> The root cause is that when we run replication on the secondary, the VM
> state changes to RUN_STATE_INMIGRATE, and blockdev_init() then sets
> bdrv_flags |= BDRV_O_INACTIVE, so the whole chain becomes read-only. I've
> tried to fix it on my side, but it does not seem easy. Is there any way to
> work around this? Any suggestions would be appreciated.

Non-shared storage migration uses "nbd_server_add -w" on the destination side,
where BDRV_O_INACTIVE is set for the images just as in your case. It handles
this by calling bdrv_invalidate_cache(); see nbd_export_new().

See also commit 3dff24f2dffc5f3aa46dc014122012848bd7959d.

I'm not sure whether this is enough for block replication, though.

Fam
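
The failure reported above and the activate-first approach suggested here can be modelled in a few lines. This is only a sketch under stated assumptions: Node, PERM_WRITE, O_INACTIVE, node_check_perm() and node_activate() are hypothetical names, not QEMU APIs, and the rollback mirrors only the error path visible in the bdrv_invalidate_cache() hunk of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for BLK_PERM_WRITE and BDRV_O_INACTIVE;
 * the numeric values are illustrative only. */
#define PERM_WRITE  (UINT64_C(1) << 0)
#define O_INACTIVE  (1u << 0)

typedef struct Node {
    bool read_only;
    unsigned open_flags;
    uint64_t perm;   /* permissions currently granted to parents */
} Node;

static bool node_is_writable(const Node *n)
{
    return !n->read_only && !(n->open_flags & O_INACTIVE);
}

/* Models the write check in bdrv_check_perm() after the patch. */
static int node_check_perm(const Node *n, uint64_t perm)
{
    if ((perm & PERM_WRITE) && !node_is_writable(n)) {
        return -EPERM;   /* surfaces as "Block node is read-only" */
    }
    return 0;
}

/* Models the activation order: clear the inactive flag, re-check the
 * permissions, and restore the flag if the check fails. */
static int node_activate(Node *n, uint64_t wanted)
{
    int ret;

    assert(n != NULL);
    n->open_flags &= ~O_INACTIVE;
    ret = node_check_perm(n, wanted);
    if (ret < 0) {
        n->open_flags |= O_INACTIVE;
        return ret;
    }
    n->perm = wanted;
    return 0;
}
```

In this model, asking for PERM_WRITE on a read-write node that still has O_INACTIVE set fails with -EPERM, which corresponds to the error in the reproduction log. Calling node_activate() first makes the same request succeed. Whether activating early is acceptable for block replication is the open question in this thread.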

Re: [Qemu-devel] [PATCH 5/6] block: Fix write/resize permissions for inactive images
Posted by Eric Blake 8 years, 9 months ago
On 05/04/2017 11:52 AM, Kevin Wolf wrote:
> Format drivers for inactive nodes don't need write/resize permissions on
> their bs->file and can share write/resize with another VM (in fact, this
> is the whole point of keeping images inactive). Represent this fact in
> the op blocker system, so that image locking does the right thing
> without special-casing inactive images.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c               | 35 +++++++++++++++++++++++++++++++++--
>  include/block/block.h |  1 +
>  2 files changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 773bd64..cd89467 100644
> --- a/block.c
> +++ b/block.c
> @@ -192,11 +192,20 @@ void path_combine(char *dest, int dest_size,
>      }
>  }
>  
> +/* Returns whether the image file is opened as read-only. Note that this can
> + * return false and writing to the image file is still not possible because the

s/false and/false but/
s/is still not/still not be/

> + * image is inactivated. */
>  bool bdrv_is_read_only(BlockDriverState *bs)
>  {
>      return bs->read_only;
>  }
>  
> +/* Returns whether the image file can be written to right now */
> +bool bdrv_is_writable(BlockDriverState *bs)
> +{
> +    return !bdrv_is_read_only(bs) && !(bs->open_flags & BDRV_O_INACTIVE);
> +}

Nice.

Up to you if you think the grammar suggestion helps.
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org