[PATCH v5 29/45] block: introduce BDRV_O_NOPERM flag

Vladimir Sementsov-Ogievskiy posted 45 patches 3 years, 10 months ago
Maintainers: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>, Ari Sundholm <ari@tuxera.com>, Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>, Paolo Bonzini <pbonzini@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>, John Snow <jsnow@redhat.com>, Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru>, "Denis V. Lunev" <den@openvz.org>, Wen Congyang <wencongyang2@huawei.com>, Xie Changlong <xiechanglong.d@gmail.com>, Stefan Weil <sw@weilnetz.de>, Jeff Cody <codyprime@gmail.com>, Fam Zheng <fam@euphon.net>, Markus Armbruster <armbru@redhat.com>, Eric Blake <eblake@redhat.com>
[PATCH v5 29/45] block: introduce BDRV_O_NOPERM flag
Posted by Vladimir Sementsov-Ogievskiy 3 years, 10 months ago
Now copy-before-write filter has weak permission model: when it has no
parents, it share write permission on source. Otherwise we just can't
blockdev-add it, when existing user of source has write permission.

The situation is bad, it means that copy-before-write filter doesn't
guarantee that all write goes through it. And a lot better is unshare
write always. But how to insert the filter in this case?

The solution is to do blockdev-add and blockdev-replace in one
transaction, and more, update permissions only after both command.

For now, let's create a possibility to not update permission on file
child of copy-before-write filter at time of open.

New interfaces are:

- bds_tree_init() with flags argument, so that caller may pass
  additional flags, for example the new BDRV_O_NOPERM.

- bdrv_open_file_child_common() with boolean refresh_perms arguments.
  Drivers may use this function with refresh_perms = true, if they want
  to satisfy BDRV_O_NOPERM. No one such driver for now.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
---
 block.c                                | 82 +++++++++++++++++++-------
 block/monitor/block-hmp-cmds.c         |  2 +-
 blockdev.c                             | 13 ++--
 include/block/block-common.h           |  3 +
 include/block/block-global-state.h     | 11 ++++
 include/block/block_int-global-state.h |  3 +-
 6 files changed, 83 insertions(+), 31 deletions(-)

diff --git a/block.c b/block.c
index a7020d3cd8..ca0b629bec 100644
--- a/block.c
+++ b/block.c
@@ -3166,12 +3166,13 @@ out:
  * If @parent_bs and @child_bs are in different AioContexts, the caller must
  * hold the AioContext lock for @child_bs, but not for @parent_bs.
  */
-BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
-                             BlockDriverState *child_bs,
-                             const char *child_name,
-                             const BdrvChildClass *child_class,
-                             BdrvChildRole child_role,
-                             Error **errp)
+static BdrvChild *bdrv_do_attach_child(BlockDriverState *parent_bs,
+                                       BlockDriverState *child_bs,
+                                       const char *child_name,
+                                       const BdrvChildClass *child_class,
+                                       BdrvChildRole child_role,
+                                       bool refresh_perms,
+                                       Error **errp)
 {
     int ret;
     BdrvChild *child;
@@ -3185,9 +3186,11 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
         goto out;
     }
 
-    ret = bdrv_refresh_perms(parent_bs, tran, errp);
-    if (ret < 0) {
-        goto out;
+    if (refresh_perms) {
+        ret = bdrv_refresh_perms(parent_bs, tran, errp);
+        if (ret < 0) {
+            goto out;
+        }
     }
 
 out:
@@ -3198,6 +3201,17 @@ out:
     return ret < 0 ? NULL : child;
 }
 
+BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
+                             BlockDriverState *child_bs,
+                             const char *child_name,
+                             const BdrvChildClass *child_class,
+                             BdrvChildRole child_role,
+                             Error **errp)
+{
+    return bdrv_do_attach_child(parent_bs, child_bs, child_name, child_class,
+                                child_role, true, errp);
+}
+
 /* Caller is responsible to refresh permissions in @refresh_list */
 static void bdrv_root_unref_child_tran(BdrvChild *child, GSList **refresh_list,
                                        Transaction *tran)
@@ -3668,12 +3682,13 @@ done:
  *
  * The BlockdevRef will be removed from the options QDict.
  */
-BdrvChild *bdrv_open_child(const char *filename,
-                           QDict *options, const char *bdref_key,
-                           BlockDriverState *parent,
-                           const BdrvChildClass *child_class,
-                           BdrvChildRole child_role,
-                           bool allow_none, Error **errp)
+BdrvChild *bdrv_open_child_common(const char *filename,
+                                  QDict *options, const char *bdref_key,
+                                  BlockDriverState *parent,
+                                  const BdrvChildClass *child_class,
+                                  BdrvChildRole child_role,
+                                  bool allow_none, bool refresh_perms,
+                                  Error **errp)
 {
     BlockDriverState *bs;
 
@@ -3685,16 +3700,29 @@ BdrvChild *bdrv_open_child(const char *filename,
         return NULL;
     }
 
-    return bdrv_attach_child(parent, bs, bdref_key, child_class, child_role,
-                             errp);
+    return bdrv_do_attach_child(parent, bs, bdref_key, child_class, child_role,
+                                refresh_perms, errp);
+}
+
+BdrvChild *bdrv_open_child(const char *filename,
+                           QDict *options, const char *bdref_key,
+                           BlockDriverState *parent,
+                           const BdrvChildClass *child_class,
+                           BdrvChildRole child_role,
+                           bool allow_none, Error **errp)
+{
+    return bdrv_open_child_common(filename, options, bdref_key, parent,
+                                  child_class, child_role, allow_none, true,
+                                  errp);
 }
 
 /*
  * Wrapper on bdrv_open_child() for most popular case: open primary child of bs.
  */
-int bdrv_open_file_child(const char *filename,
-                         QDict *options, const char *bdref_key,
-                         BlockDriverState *parent, Error **errp)
+int bdrv_open_file_child_common(const char *filename,
+                                QDict *options, const char *bdref_key,
+                                BlockDriverState *parent, bool refresh_perms,
+                                Error **errp)
 {
     BdrvChildRole role;
 
@@ -3703,8 +3731,9 @@ int bdrv_open_file_child(const char *filename,
     role = parent->drv->is_filter ?
         (BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY) : BDRV_CHILD_IMAGE;
 
-    if (!bdrv_open_child(filename, options, bdref_key, parent,
-                         &child_of_bds, role, false, errp))
+    if (!bdrv_open_child_common(filename, options, bdref_key, parent,
+                                &child_of_bds, role, false, refresh_perms,
+                                errp))
     {
         return -EINVAL;
     }
@@ -3712,6 +3741,15 @@ int bdrv_open_file_child(const char *filename,
     return 0;
 }
 
+int bdrv_open_file_child(const char *filename,
+                         QDict *options, const char *bdref_key,
+                         BlockDriverState *parent,
+                         Error **errp)
+{
+    return bdrv_open_file_child_common(filename, options, bdref_key, parent,
+                                       true, errp);
+}
+
 /*
  * TODO Future callers may need to specify parent/child_class in order for
  * option inheritance to work. Existing callers use it for the root node.
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index bfb3c043a0..9145ccfc46 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -76,7 +76,7 @@ static void hmp_drive_add_node(Monitor *mon, const char *optstr)
         goto out;
     }
 
-    BlockDriverState *bs = bds_tree_init(qdict, &local_err);
+    BlockDriverState *bs = bds_tree_init(qdict, 0, &local_err);
     if (!bs) {
         error_report_err(local_err);
         goto out;
diff --git a/blockdev.c b/blockdev.c
index 1cd95f4f02..16a9b98afc 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -643,12 +643,11 @@ err_no_opts:
 }
 
 /* Takes the ownership of bs_opts */
-BlockDriverState *bds_tree_init(QDict *bs_opts, Error **errp)
+BlockDriverState *bds_tree_init(QDict *bs_opts, BdrvRequestFlags flags,
+                                Error **errp)
 {
-    int bdrv_flags = 0;
-
     GLOBAL_STATE_CODE();
-    /* bdrv_open() defaults to the values in bdrv_flags (for compatibility
+    /* bdrv_open() defaults to the values in flags (for compatibility
      * with other callers) rather than what we want as the real defaults.
      * Apply the defaults here instead. */
     qdict_set_default_str(bs_opts, BDRV_OPT_CACHE_DIRECT, "off");
@@ -656,10 +655,10 @@ BlockDriverState *bds_tree_init(QDict *bs_opts, Error **errp)
     qdict_set_default_str(bs_opts, BDRV_OPT_READ_ONLY, "off");
 
     if (runstate_check(RUN_STATE_INMIGRATE)) {
-        bdrv_flags |= BDRV_O_INACTIVE;
+        flags |= BDRV_O_INACTIVE;
     }
 
-    return bdrv_open(NULL, NULL, bs_opts, bdrv_flags, errp);
+    return bdrv_open(NULL, NULL, bs_opts, flags, errp);
 }
 
 void blockdev_close_all_bdrv_states(void)
@@ -3473,7 +3472,7 @@ void qmp_blockdev_add(BlockdevOptions *options, Error **errp)
         goto fail;
     }
 
-    bs = bds_tree_init(qdict, errp);
+    bs = bds_tree_init(qdict, 0, errp);
     if (!bs) {
         goto fail;
     }
diff --git a/include/block/block-common.h b/include/block/block-common.h
index 2f247dd607..face2d62d0 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -145,6 +145,9 @@ typedef enum {
                                       read-write fails */
 #define BDRV_O_IO_URING    0x40000 /* use io_uring instead of the thread pool */
 
+#define BDRV_O_NOPERM      0x80000 /* Don't update permissions if possible,
+                                      open() caller will do that. */
+
 #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_NO_FLUSH)
 
 
diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index f3ec72810e..8527bcad28 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -76,6 +76,17 @@ BdrvChild *bdrv_open_child(const char *filename,
                            const BdrvChildClass *child_class,
                            BdrvChildRole child_role,
                            bool allow_none, Error **errp);
+BdrvChild *bdrv_open_child_common(const char *filename,
+                                  QDict *options, const char *bdref_key,
+                                  BlockDriverState *parent,
+                                  const BdrvChildClass *child_class,
+                                  BdrvChildRole child_role,
+                                  bool allow_none, bool refresh_perms,
+                                  Error **errp);
+int bdrv_open_file_child_common(const char *filename,
+                                QDict *options, const char *bdref_key,
+                                BlockDriverState *parent, bool refresh_perms,
+                                Error **errp);
 int bdrv_open_file_child(const char *filename,
                          QDict *options, const char *bdref_key,
                          BlockDriverState *parent, Error **errp);
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index 0f21b0570b..aed62a45d9 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -245,7 +245,8 @@ void bdrv_set_monitor_owned(BlockDriverState *bs);
 
 void blockdev_close_all_bdrv_states(void);
 
-BlockDriverState *bds_tree_init(QDict *bs_opts, Error **errp);
+BlockDriverState *bds_tree_init(QDict *bs_opts, BdrvRequestFlags flags,
+                                Error **errp);
 
 /**
  * Simple implementation of bdrv_co_create_opts for protocol drivers
-- 
2.35.1
Re: [PATCH v5 29/45] block: introduce BDRV_O_NOPERM flag
Posted by Hanna Reitz 3 years, 8 months ago
On 30.03.22 23:28, Vladimir Sementsov-Ogievskiy wrote:
> Now copy-before-write filter has weak permission model: when it has no
> parents, it share write permission on source. Otherwise we just can't
> blockdev-add it, when existing user of source has write permission.
>
> The situation is bad, it means that copy-before-write filter doesn't
> guarantee that all write goes through it.

I don’t understand how this situation really is bad, because it sounds 
like anything else would just be a safeguard against users adding a CBW 
filter without making use of it.  Which I’d think is their own fault.

As far as I remember the actual problem is that we cannot do 
transactional graph modifications, where e.g. a CBW node is inserted and 
a bitmap is created in a single atomic transaction[1].  Which is a 
problem.  And now I just don’t quite understand how unsharing WRITE 
unconditionally would help with the actual problem.

[1] Then again, would then even be “atomic”?  For that transaction to 
work as intended, the node would need to be drained during the 
transaction (so that the bitmap stays in sync with the CBW state). It 
doesn’t look like that would be the case.

So perhaps I’m just remembering incorrectly.

> And a lot better is unshare
> write always. But how to insert the filter in this case?
>
> The solution is to do blockdev-add and blockdev-replace in one
> transaction, and more, update permissions only after both command.
>
> For now, let's create a possibility to not update permission on file
> child of copy-before-write filter at time of open.
>
> New interfaces are:
>
> - bds_tree_init() with flags argument, so that caller may pass
>    additional flags, for example the new BDRV_O_NOPERM.
>
> - bdrv_open_file_child_common() with boolean refresh_perms arguments.
>    Drivers may use this function with refresh_perms = true, if they want
>    to satisfy BDRV_O_NOPERM. No one such driver for now.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>


Re: [PATCH v5 29/45] block: introduce BDRV_O_NOPERM flag
Posted by Vladimir Sementsov-Ogievskiy 3 years, 7 months ago
On 6/13/22 12:54, Hanna Reitz wrote:
> On 30.03.22 23:28, Vladimir Sementsov-Ogievskiy wrote:
>> Now copy-before-write filter has weak permission model: when it has no
>> parents, it share write permission on source. Otherwise we just can't
>> blockdev-add it, when existing user of source has write permission.
>>
>> The situation is bad, it means that copy-before-write filter doesn't
>> guarantee that all write goes through it.
> 
> I don’t understand how this situation really is bad, because it sounds like anything else would just be a safeguard against users adding a CBW filter without making use of it.  Which I’d think is their own fault.
> 
> As far as I remember the actual problem is that we cannot do transactional graph modifications, where e.g. a CBW node is inserted and a bitmap is created in a single atomic transaction[1].  Which is a problem.  And now I just don’t quite understand how unsharing WRITE unconditionally would help with the actual problem.
> 
> [1] Then again, would then even be “atomic”?  For that transaction to work as intended, the node would need to be drained during the transaction (so that the bitmap stays in sync with the CBW state). It doesn’t look like that would be the case.

I think, we should already be in a drained section, when do the transaction.

In qmp_transaction we have bdrv_drain_all() call. It's enough if we don't yield during transaction actions (and mostly, we shouldn't) (is it enough, when we have iothreads?). Probably, it should be bdrv_drain_all_being() before all actions and bdrv_drain_all_end() after them.

> 
> So perhaps I’m just remembering incorrectly.

OK, the same answer: I should try to split these features, as they are separate:

1. transactional API

2. strict permissions for CBW

Seems that [2] is not necessary for [1]. If so, we can consider smaller picture (only [1]), and do [2] later (or not do, if it remains too complicated for the small profit).

> 
>> And a lot better is unshare
>> write always. But how to insert the filter in this case?
>>
>> The solution is to do blockdev-add and blockdev-replace in one
>> transaction, and more, update permissions only after both command.
>>
>> For now, let's create a possibility to not update permission on file
>> child of copy-before-write filter at time of open.
>>
>> New interfaces are:
>>
>> - bds_tree_init() with flags argument, so that caller may pass
>>    additional flags, for example the new BDRV_O_NOPERM.
>>
>> - bdrv_open_file_child_common() with boolean refresh_perms arguments.
>>    Drivers may use this function with refresh_perms = true, if they want
>>    to satisfy BDRV_O_NOPERM. No one such driver for now.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
> 
> 


-- 
Best regards,
Vladimir