[PATCH v2 15/36] block: use topological sort for permission update

Vladimir Sementsov-Ogievskiy posted 36 patches 5 years, 2 months ago
Maintainers: Max Reitz <mreitz@redhat.com>, John Snow <jsnow@redhat.com>, Markus Armbruster <armbru@redhat.com>, Kevin Wolf <kwolf@redhat.com>
There is a newer version of this series
[PATCH v2 15/36] block: use topological sort for permission update
Posted by Vladimir Sementsov-Ogievskiy 5 years, 2 months ago
Rewrite bdrv_check_perm(), bdrv_abort_perm_update() and bdrv_set_perm()
to update nodes in topological sort order instead of simple DFS. With
topologically sorted nodes, we update a node only when all its parents
already updated. With DFS it's not so.

Consider the following example:

    A -+
    |  |
    |  v
    |  B
    |  |
    v  |
    C<-+

A is parent for B and C, B is parent for C.

Obviously, to update permissions, we should go in order A B C, so, when
we update C, all parent permissions already updated. But with current
approach (simple recursion) we can update in sequence A C B C (C is
updated twice). On first update of C, we consider old B permissions, so
doing wrong thing. If it succeed, all is OK, on second C update we will
finish with correct graph. But if the wrong thing failed, we break the
whole process for no reason (it's possible that updated B permission
will be less strict, but we will never check it).

Also new approach gives a way to simultaneously and correctly update
several nodes, we just need to run bdrv_topological_dfs() several times
to add all nodes and their subtrees into one topologically sorted list
(next patch will update bdrv_replace_node() in this manner).

Test test_parallel_perm_update() is now passing, so move it out of
debugging "if".

We also need to support ignore_children in
bdrv_check_parents_compliance().

For test 283 order of parents compliance check is changed.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block.c                     | 103 +++++++++++++++++++++++++++++-------
 tests/test-bdrv-graph-mod.c |   4 +-
 tests/qemu-iotests/283.out  |   2 +-
 3 files changed, 86 insertions(+), 23 deletions(-)

diff --git a/block.c b/block.c
index 92bfcbedc9..81ccf51605 100644
--- a/block.c
+++ b/block.c
@@ -1994,7 +1994,9 @@ static bool bdrv_a_allow_b(BdrvChild *a, BdrvChild *b, Error **errp)
     return false;
 }
 
-static bool bdrv_check_parents_compliance(BlockDriverState *bs, Error **errp)
+static bool bdrv_check_parents_compliance(BlockDriverState *bs,
+                                          GSList *ignore_children,
+                                          Error **errp)
 {
     BdrvChild *a, *b;
 
@@ -2005,7 +2007,9 @@ static bool bdrv_check_parents_compliance(BlockDriverState *bs, Error **errp)
      */
     QLIST_FOREACH(a, &bs->parents, next_parent) {
         QLIST_FOREACH(b, &bs->parents, next_parent) {
-            if (a == b) {
+            if (a == b || g_slist_find(ignore_children, a) ||
+                g_slist_find(ignore_children, b))
+            {
                 continue;
             }
 
@@ -2034,6 +2038,29 @@ static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
     }
 }
 
+static GSList *bdrv_topological_dfs(GSList *list, GHashTable *found,
+                                    BlockDriverState *bs)
+{
+    BdrvChild *child;
+    g_autoptr(GHashTable) local_found = NULL;
+
+    if (!found) {
+        assert(!list);
+        found = local_found = g_hash_table_new(NULL, NULL);
+    }
+
+    if (g_hash_table_contains(found, bs)) {
+        return list;
+    }
+    g_hash_table_add(found, bs);
+
+    QLIST_FOREACH(child, &bs->children, next) {
+        list = bdrv_topological_dfs(list, found, child->bs);
+    }
+
+    return g_slist_prepend(list, bs);
+}
+
 static void bdrv_child_set_perm_commit(void *opaque)
 {
     BdrvChild *c = opaque;
@@ -2098,10 +2125,10 @@ static void bdrv_child_set_perm_safe(BdrvChild *c, uint64_t perm,
  * A call to this function must always be followed by a call to bdrv_set_perm()
  * or bdrv_abort_perm_update().
  */
-static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
-                           uint64_t cumulative_perms,
-                           uint64_t cumulative_shared_perms,
-                           GSList *ignore_children, Error **errp)
+static int bdrv_node_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
+                                uint64_t cumulative_perms,
+                                uint64_t cumulative_shared_perms,
+                                GSList *ignore_children, Error **errp)
 {
     BlockDriver *drv = bs->drv;
     BdrvChild *c;
@@ -2166,21 +2193,43 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
     /* Check all children */
     QLIST_FOREACH(c, &bs->children, next) {
         uint64_t cur_perm, cur_shared;
-        GSList *cur_ignore_children;
 
         bdrv_child_perm(bs, c->bs, c, c->role, q,
                         cumulative_perms, cumulative_shared_perms,
                         &cur_perm, &cur_shared);
+        bdrv_child_set_perm_safe(c, cur_perm, cur_shared, NULL);
+    }
+
+    return 0;
+}
+
+static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
+                           uint64_t cumulative_perms,
+                           uint64_t cumulative_shared_perms,
+                           GSList *ignore_children, Error **errp)
+{
+    int ret;
+    BlockDriverState *root = bs;
+    g_autoptr(GSList) list = bdrv_topological_dfs(NULL, NULL, root);
+
+    for ( ; list; list = list->next) {
+        bs = list->data;
+
+        if (bs != root) {
+            if (!bdrv_check_parents_compliance(bs, ignore_children, errp)) {
+                return -EINVAL;
+            }
+
+            bdrv_get_cumulative_perm(bs, &cumulative_perms,
+                                     &cumulative_shared_perms);
+        }
 
-        cur_ignore_children = g_slist_prepend(g_slist_copy(ignore_children), c);
-        ret = bdrv_check_update_perm(c->bs, q, cur_perm, cur_shared,
-                                     cur_ignore_children, errp);
-        g_slist_free(cur_ignore_children);
+        ret = bdrv_node_check_perm(bs, q, cumulative_perms,
+                                   cumulative_shared_perms,
+                                   ignore_children, errp);
         if (ret < 0) {
             return ret;
         }
-
-        bdrv_child_set_perm_safe(c, cur_perm, cur_shared, NULL);
     }
 
     return 0;
@@ -2190,10 +2239,8 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
  * Notifies drivers that after a previous bdrv_check_perm() call, the
  * permission update is not performed and any preparations made for it (e.g.
  * taken file locks) need to be undone.
- *
- * This function recursively notifies all child nodes.
  */
-static void bdrv_abort_perm_update(BlockDriverState *bs)
+static void bdrv_node_abort_perm_update(BlockDriverState *bs)
 {
     BlockDriver *drv = bs->drv;
     BdrvChild *c;
@@ -2208,11 +2255,19 @@ static void bdrv_abort_perm_update(BlockDriverState *bs)
 
     QLIST_FOREACH(c, &bs->children, next) {
         bdrv_child_set_perm_abort(c);
-        bdrv_abort_perm_update(c->bs);
     }
 }
 
-static void bdrv_set_perm(BlockDriverState *bs)
+static void bdrv_abort_perm_update(BlockDriverState *bs)
+{
+    g_autoptr(GSList) list = bdrv_topological_dfs(NULL, NULL, bs);
+
+    for ( ; list; list = list->next) {
+        bdrv_node_abort_perm_update((BlockDriverState *)list->data);
+    }
+}
+
+static void bdrv_node_set_perm(BlockDriverState *bs)
 {
     uint64_t cumulative_perms, cumulative_shared_perms;
     BlockDriver *drv = bs->drv;
@@ -2238,7 +2293,15 @@ static void bdrv_set_perm(BlockDriverState *bs)
     /* Update all children */
     QLIST_FOREACH(c, &bs->children, next) {
         bdrv_child_set_perm_commit(c);
-        bdrv_set_perm(c->bs);
+    }
+}
+
+static void bdrv_set_perm(BlockDriverState *bs)
+{
+    g_autoptr(GSList) list = bdrv_topological_dfs(NULL, NULL, bs);
+
+    for ( ; list; list = list->next) {
+        bdrv_node_set_perm((BlockDriverState *)list->data);
     }
 }
 
@@ -2351,7 +2414,7 @@ static int bdrv_refresh_perms(BlockDriverState *bs, Error **errp)
     int ret;
     uint64_t perm, shared_perm;
 
-    if (!bdrv_check_parents_compliance(bs, errp)) {
+    if (!bdrv_check_parents_compliance(bs, NULL, errp)) {
         return -EPERM;
     }
     bdrv_get_cumulative_perm(bs, &perm, &shared_perm);
diff --git a/tests/test-bdrv-graph-mod.c b/tests/test-bdrv-graph-mod.c
index 74f4a4153b..0d62e05ddb 100644
--- a/tests/test-bdrv-graph-mod.c
+++ b/tests/test-bdrv-graph-mod.c
@@ -316,12 +316,12 @@ int main(int argc, char *argv[])
     g_test_add_func("/bdrv-graph-mod/update-perm-tree", test_update_perm_tree);
     g_test_add_func("/bdrv-graph-mod/should-update-child",
                     test_should_update_child);
+    g_test_add_func("/bdrv-graph-mod/parallel-perm-update",
+                    test_parallel_perm_update);
 
     if (debug) {
         g_test_add_func("/bdrv-graph-mod/parallel-exclusive-write",
                         test_parallel_exclusive_write);
-        g_test_add_func("/bdrv-graph-mod/parallel-perm-update",
-                        test_parallel_perm_update);
     }
 
     return g_test_run();
diff --git a/tests/qemu-iotests/283.out b/tests/qemu-iotests/283.out
index d8cff22cc1..fbb7d0f619 100644
--- a/tests/qemu-iotests/283.out
+++ b/tests/qemu-iotests/283.out
@@ -5,4 +5,4 @@
 {"execute": "blockdev-add", "arguments": {"driver": "blkdebug", "image": "base", "node-name": "other", "take-child-perms": ["write"]}}
 {"return": {}}
 {"execute": "blockdev-backup", "arguments": {"device": "source", "sync": "full", "target": "target"}}
-{"error": {"class": "GenericError", "desc": "Cannot set permissions for backup-top filter: Conflicts with use by other as 'image', which uses 'write' on base"}}
+{"error": {"class": "GenericError", "desc": "Cannot set permissions for backup-top filter: Conflicts with use by source as 'image', which does not allow 'write' on base"}}
-- 
2.21.3


Re: [PATCH v2 15/36] block: use topological sort for permission update
Posted by Kevin Wolf 5 years ago
Am 27.11.2020 um 15:45 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Rewrite bdrv_check_perm(), bdrv_abort_perm_update() and bdrv_set_perm()
> to update nodes in topological sort order instead of simple DFS. With
> topologically sorted nodes, we update a node only when all its parents
> already updated. With DFS it's not so.
> 
> Consider the following example:
> 
>     A -+
>     |  |
>     |  v
>     |  B
>     |  |
>     v  |
>     C<-+
> 
> A is parent for B and C, B is parent for C.
> 
> Obviously, to update permissions, we should go in order A B C, so, when
> we update C, all parent permissions already updated.

I wondered for a moment why this order is obvious. Taking a permission
on A may mean that we need to take the permisson on C, too.

The answer is (or so I think) that the whole operation is atomic so the
half-updated state will never be visible to a caller, but this is about
calculating the right permissions. Permissions a node needs on its
children may depend on what its parents requested, but parent
permissions never depend on what children request.

Ok, makes sense.

> But with current
> approach (simple recursion) we can update in sequence A C B C (C is
> updated twice). On first update of C, we consider old B permissions, so
> doing wrong thing. If it succeed, all is OK, on second C update we will
> finish with correct graph. But if the wrong thing failed, we break the
> whole process for no reason (it's possible that updated B permission
> will be less strict, but we will never check it).
> 
> Also new approach gives a way to simultaneously and correctly update
> several nodes, we just need to run bdrv_topological_dfs() several times
> to add all nodes and their subtrees into one topologically sorted list
> (next patch will update bdrv_replace_node() in this manner).
> 
> Test test_parallel_perm_update() is now passing, so move it out of
> debugging "if".
> 
> We also need to support ignore_children in
> bdrv_check_parents_compliance().
> 
> For test 283 order of parents compliance check is changed.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block.c                     | 103 +++++++++++++++++++++++++++++-------
>  tests/test-bdrv-graph-mod.c |   4 +-
>  tests/qemu-iotests/283.out  |   2 +-
>  3 files changed, 86 insertions(+), 23 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 92bfcbedc9..81ccf51605 100644
> --- a/block.c
> +++ b/block.c
> @@ -1994,7 +1994,9 @@ static bool bdrv_a_allow_b(BdrvChild *a, BdrvChild *b, Error **errp)
>      return false;
>  }
>  
> -static bool bdrv_check_parents_compliance(BlockDriverState *bs, Error **errp)
> +static bool bdrv_check_parents_compliance(BlockDriverState *bs,
> +                                          GSList *ignore_children,
> +                                          Error **errp)
>  {
>      BdrvChild *a, *b;
>  
> @@ -2005,7 +2007,9 @@ static bool bdrv_check_parents_compliance(BlockDriverState *bs, Error **errp)
>       */
>      QLIST_FOREACH(a, &bs->parents, next_parent) {
>          QLIST_FOREACH(b, &bs->parents, next_parent) {
> -            if (a == b) {
> +            if (a == b || g_slist_find(ignore_children, a) ||
> +                g_slist_find(ignore_children, b))

'a' should be checked in the outer loop, no reason to repeat the same
check all the time in the inner loop.

> +            {
>                  continue;
>              }
>  
> @@ -2034,6 +2038,29 @@ static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
>      }
>  }
>  
> +static GSList *bdrv_topological_dfs(GSList *list, GHashTable *found,
> +                                    BlockDriverState *bs)

It would be good to have a comment that explains the details of the
contract.

In particular, this seems to require that @list is already topologically
sorted, and it's complete in the sense that if a node is in the list,
all of its children are in the list, too.

> +{
> +    BdrvChild *child;
> +    g_autoptr(GHashTable) local_found = NULL;
> +
> +    if (!found) {
> +        assert(!list);
> +        found = local_found = g_hash_table_new(NULL, NULL);
> +    }
> +
> +    if (g_hash_table_contains(found, bs)) {
> +        return list;
> +    }
> +    g_hash_table_add(found, bs);
> +
> +    QLIST_FOREACH(child, &bs->children, next) {
> +        list = bdrv_topological_dfs(list, found, child->bs);
> +    }
> +
> +    return g_slist_prepend(list, bs);
> +}
> +
>  static void bdrv_child_set_perm_commit(void *opaque)
>  {
>      BdrvChild *c = opaque;
> @@ -2098,10 +2125,10 @@ static void bdrv_child_set_perm_safe(BdrvChild *c, uint64_t perm,
>   * A call to this function must always be followed by a call to bdrv_set_perm()
>   * or bdrv_abort_perm_update().
>   */

One big source of confusion for me when trying to understand this was
that bdrv_check_perm() is a misnomer since commit f962e96150e and the
above comment isn't really accurate any more.

The function doesn't only check the validity of the new permissions in
advance to actually making the change, but it already updates the
permissions of all child nodes (however not of its root node).

So we have gone from the original check/set/abort model (which the
function names still suggest) to a prepare/commit/rollback model.

I think some comment updates are in order, and possibly we should rename
some functions, too.

> -static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
> -                           uint64_t cumulative_perms,
> -                           uint64_t cumulative_shared_perms,
> -                           GSList *ignore_children, Error **errp)
> +static int bdrv_node_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
> +                                uint64_t cumulative_perms,
> +                                uint64_t cumulative_shared_perms,
> +                                GSList *ignore_children, Error **errp)
>  {
>      BlockDriver *drv = bs->drv;
>      BdrvChild *c;
> @@ -2166,21 +2193,43 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>      /* Check all children */
>      QLIST_FOREACH(c, &bs->children, next) {
>          uint64_t cur_perm, cur_shared;
> -        GSList *cur_ignore_children;
>  
>          bdrv_child_perm(bs, c->bs, c, c->role, q,
>                          cumulative_perms, cumulative_shared_perms,
>                          &cur_perm, &cur_shared);
> +        bdrv_child_set_perm_safe(c, cur_perm, cur_shared, NULL);

This "added" line is actually old code. What is removed here is the
recursive call of bdrv_check_update_perm(). This is what the code below
will have to replace.

> +    }
> +
> +    return 0;
> +}
> +
> +static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
> +                           uint64_t cumulative_perms,
> +                           uint64_t cumulative_shared_perms,
> +                           GSList *ignore_children, Error **errp)
> +{
> +    int ret;
> +    BlockDriverState *root = bs;
> +    g_autoptr(GSList) list = bdrv_topological_dfs(NULL, NULL, root);
> +
> +    for ( ; list; list = list->next) {
> +        bs = list->data;
> +
> +        if (bs != root) {
> +            if (!bdrv_check_parents_compliance(bs, ignore_children, errp)) {
> +                return -EINVAL;
> +            }

At this point bs still had the old permissions, but we don't access
them. As we're going in topological order, the parents have already been
updated if they were a child covered in bdrv_node_check_perm(), so we're
checking the relevant values. Good.

What about the root node? If I understand correctly, the parents of the
root nodes wouldn't have been checked in the old code. In the new state,
the parent BdrvChild already has to contain the new permission.

In bdrv_refresh_perms(), we already check parent conflicts, so no change
for all callers going through it. Good.

bdrv_reopen_multiple() is less obvious. It passes permissions from the
BDRVReopenState, without applying the permissions first. Do we check the
old parent permissions instead of the new state here?

> +            bdrv_get_cumulative_perm(bs, &cumulative_perms,
> +                                     &cumulative_shared_perms);
> +        }
>  
> -        cur_ignore_children = g_slist_prepend(g_slist_copy(ignore_children), c);
> -        ret = bdrv_check_update_perm(c->bs, q, cur_perm, cur_shared,
> -                                     cur_ignore_children, errp);
> -        g_slist_free(cur_ignore_children);
> +        ret = bdrv_node_check_perm(bs, q, cumulative_perms,
> +                                   cumulative_shared_perms,
> +                                   ignore_children, errp);

We use the original ignore_children for every node in the sorted list.
The old code extends it with all nodes in the path to each node.

For the bdrv_check_update_perm() call that is now replaced with
bdrv_check_parents_compliance(), I think this was necessary because
bdrv_check_update_perm() always assumes adding a new edge, so if you
update one instead of adding it, you have to ignore it so that it can't
conflict with itself. This isn't necessary any more now because we just
update and then check for consistency.

For passing to bdrv_node_check_perm() it doesn't make a difference
anyway because the parameter is now unused (and should probably be
removed).

>          if (ret < 0) {
>              return ret;
>          }
> -
> -        bdrv_child_set_perm_safe(c, cur_perm, cur_shared, NULL);
>      }
>  
>      return 0;

A tricky patch to understand, but I think it's right for the most part.

Kevin


Re: [PATCH v2 15/36] block: use topological sort for permission update
Posted by Vladimir Sementsov-Ogievskiy 5 years ago
27.01.2021 21:38, Kevin Wolf wrote:
> Am 27.11.2020 um 15:45 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Rewrite bdrv_check_perm(), bdrv_abort_perm_update() and bdrv_set_perm()
>> to update nodes in topological sort order instead of simple DFS. With
>> topologically sorted nodes, we update a node only when all its parents
>> already updated. With DFS it's not so.
>>
>> Consider the following example:
>>
>>      A -+
>>      |  |
>>      |  v
>>      |  B
>>      |  |
>>      v  |
>>      C<-+
>>
>> A is parent for B and C, B is parent for C.
>>
>> Obviously, to update permissions, we should go in order A B C, so, when
>> we update C, all parent permissions already updated.
> 
> I wondered for a moment why this order is obvious. Taking a permission
> on A may mean that we need to take the permisson on C, too.
> 
> The answer is (or so I think) that the whole operation is atomic so the
> half-updated state will never be visible to a caller, but this is about
> calculating the right permissions. Permissions a node needs on its
> children may depend on what its parents requested, but parent
> permissions never depend on what children request.
> 

yes, that's about these relations

> 
>> But with current
>> approach (simple recursion) we can update in sequence A C B C (C is
>> updated twice). On first update of C, we consider old B permissions, so
>> doing wrong thing. If it succeed, all is OK, on second C update we will
>> finish with correct graph. But if the wrong thing failed, we break the
>> whole process for no reason (it's possible that updated B permission
>> will be less strict, but we will never check it).
>>
>> Also new approach gives a way to simultaneously and correctly update
>> several nodes, we just need to run bdrv_topological_dfs() several times
>> to add all nodes and their subtrees into one topologically sorted list
>> (next patch will update bdrv_replace_node() in this manner).
>>
>> Test test_parallel_perm_update() is now passing, so move it out of
>> debugging "if".
>>
>> We also need to support ignore_children in
>> bdrv_check_parents_compliance().
>>
>> For test 283 order of parents compliance check is changed.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block.c                     | 103 +++++++++++++++++++++++++++++-------
>>   tests/test-bdrv-graph-mod.c |   4 +-
>>   tests/qemu-iotests/283.out  |   2 +-
>>   3 files changed, 86 insertions(+), 23 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 92bfcbedc9..81ccf51605 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -1994,7 +1994,9 @@ static bool bdrv_a_allow_b(BdrvChild *a, BdrvChild *b, Error **errp)
>>       return false;
>>   }
>>   
>> -static bool bdrv_check_parents_compliance(BlockDriverState *bs, Error **errp)
>> +static bool bdrv_check_parents_compliance(BlockDriverState *bs,
>> +                                          GSList *ignore_children,
>> +                                          Error **errp)
>>   {
>>       BdrvChild *a, *b;
>>   
>> @@ -2005,7 +2007,9 @@ static bool bdrv_check_parents_compliance(BlockDriverState *bs, Error **errp)
>>        */
>>       QLIST_FOREACH(a, &bs->parents, next_parent) {
>>           QLIST_FOREACH(b, &bs->parents, next_parent) {
>> -            if (a == b) {
>> +            if (a == b || g_slist_find(ignore_children, a) ||
>> +                g_slist_find(ignore_children, b))
> 
> 'a' should be checked in the outer loop, no reason to repeat the same
> check all the time in the inner loop.
> 
>> +            {
>>                   continue;
>>               }
>>   
>> @@ -2034,6 +2038,29 @@ static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
>>       }
>>   }
>>   
>> +static GSList *bdrv_topological_dfs(GSList *list, GHashTable *found,
>> +                                    BlockDriverState *bs)
> 
> It would be good to have a comment that explains the details of the
> contract.
> 
> In particular, this seems to require that @list is already topologically
> sorted, and it's complete in the sense that if a node is in the list,
> all of its children are in the list, too.

Right, will add

> 
>> +{
>> +    BdrvChild *child;
>> +    g_autoptr(GHashTable) local_found = NULL;
>> +
>> +    if (!found) {
>> +        assert(!list);
>> +        found = local_found = g_hash_table_new(NULL, NULL);
>> +    }
>> +
>> +    if (g_hash_table_contains(found, bs)) {
>> +        return list;
>> +    }
>> +    g_hash_table_add(found, bs);
>> +
>> +    QLIST_FOREACH(child, &bs->children, next) {
>> +        list = bdrv_topological_dfs(list, found, child->bs);
>> +    }
>> +
>> +    return g_slist_prepend(list, bs);
>> +}
>> +
>>   static void bdrv_child_set_perm_commit(void *opaque)
>>   {
>>       BdrvChild *c = opaque;
>> @@ -2098,10 +2125,10 @@ static void bdrv_child_set_perm_safe(BdrvChild *c, uint64_t perm,
>>    * A call to this function must always be followed by a call to bdrv_set_perm()
>>    * or bdrv_abort_perm_update().
>>    */
> 
> One big source of confusion for me when trying to understand this was
> that bdrv_check_perm() is a misnomer since commit f962e96150e and the
> above comment isn't really accurate any more.
> 
> The function doesn't only check the validity of the new permissions in
> advance to actually making the change, but it already updates the
> permissions of all child nodes (however not of its root node).
> 
> So we have gone from the original check/set/abort model (which the
> function names still suggest) to a prepare/commit/rollback model.
> 
> I think some comment updates are in order, and possibly we should rename
> some functions, too.
> 

In the end of the series they are refactored and renamed to be native part of new transaction system (introduced in [10])

>> -static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>> -                           uint64_t cumulative_perms,
>> -                           uint64_t cumulative_shared_perms,
>> -                           GSList *ignore_children, Error **errp)
>> +static int bdrv_node_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>> +                                uint64_t cumulative_perms,
>> +                                uint64_t cumulative_shared_perms,
>> +                                GSList *ignore_children, Error **errp)
>>   {
>>       BlockDriver *drv = bs->drv;
>>       BdrvChild *c;
>> @@ -2166,21 +2193,43 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>>       /* Check all children */
>>       QLIST_FOREACH(c, &bs->children, next) {
>>           uint64_t cur_perm, cur_shared;
>> -        GSList *cur_ignore_children;
>>   
>>           bdrv_child_perm(bs, c->bs, c, c->role, q,
>>                           cumulative_perms, cumulative_shared_perms,
>>                           &cur_perm, &cur_shared);
>> +        bdrv_child_set_perm_safe(c, cur_perm, cur_shared, NULL);
> 
> This "added" line is actually old code. What is removed here is the
> recursive call of bdrv_check_update_perm(). This is what the code below
> will have to replace.

yes, we'll use explicit loop instead of recursion

> 
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>> +                           uint64_t cumulative_perms,
>> +                           uint64_t cumulative_shared_perms,
>> +                           GSList *ignore_children, Error **errp)
>> +{
>> +    int ret;
>> +    BlockDriverState *root = bs;
>> +    g_autoptr(GSList) list = bdrv_topological_dfs(NULL, NULL, root);
>> +
>> +    for ( ; list; list = list->next) {
>> +        bs = list->data;
>> +
>> +        if (bs != root) {
>> +            if (!bdrv_check_parents_compliance(bs, ignore_children, errp)) {
>> +                return -EINVAL;
>> +            }
> 
> At this point bs still had the old permissions, but we don't access
> them. As we're going in topological order, the parents have already been
> updated if they were a child covered in bdrv_node_check_perm(), so we're
> checking the relevant values. Good.
> 
> What about the root node? If I understand correctly, the parents of the
> root nodes wouldn't have been checked in the old code. In the new state,
> the parent BdrvChild already has to contain the new permission.
> 
> In bdrv_refresh_perms(), we already check parent conflicts, so no change
> for all callers going through it. Good.
> 
> bdrv_reopen_multiple() is less obvious. It passes permissions from the
> BDRVReopenState, without applying the permissions first.

It will be changed in the series

> Do we check the
> old parent permissions instead of the new state here?

We use given (new) cumulative permissions for bs, and recalculate permissions for bs subtree.

It follows old behavior. The only thing is changed that pre-patch we do DFS recursion starting from bs (and probably visit some nodes several times), after-patch we first do topological sort of bs subtree and go through the list. The order of nodes is better and we visit each node once.

> 
>> +            bdrv_get_cumulative_perm(bs, &cumulative_perms,
>> +                                     &cumulative_shared_perms);
>> +        }
>>   
>> -        cur_ignore_children = g_slist_prepend(g_slist_copy(ignore_children), c);
>> -        ret = bdrv_check_update_perm(c->bs, q, cur_perm, cur_shared,
>> -                                     cur_ignore_children, errp);
>> -        g_slist_free(cur_ignore_children);
>> +        ret = bdrv_node_check_perm(bs, q, cumulative_perms,
>> +                                   cumulative_shared_perms,
>> +                                   ignore_children, errp);
> 
> We use the original ignore_children for every node in the sorted list.
> The old code extends it with all nodes in the path to each node.
> 
> For the bdrv_check_update_perm() call that is now replaced with
> bdrv_check_parents_compliance(), I think this was necessary because
> bdrv_check_update_perm() always assumes adding a new edge, so if you
> update one instead of adding it, you have to ignore it so that it can't
> conflict with itself. This isn't necessary any more now because we just
> update and then check for consistency.
> 
> For passing to bdrv_node_check_perm() it doesn't make a difference
> anyway because the parameter is now unused (and should probably be
> removed).

ignore_children will be dropped in [27]. For now it is still needed for bdrv_replace_node_common

> 
>>           if (ret < 0) {
>>               return ret;
>>           }
>> -
>> -        bdrv_child_set_perm_safe(c, cur_perm, cur_shared, NULL);
>>       }
>>   
>>       return 0;
> 
> A tricky patch to understand, but I think it's right for the most part.
> 
> Kevin
> 


-- 
Best regards,
Vladimir

Re: [PATCH v2 15/36] block: use topological sort for permission update
Posted by Kevin Wolf 5 years ago
Am 28.01.2021 um 10:34 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 27.01.2021 21:38, Kevin Wolf wrote:
> > Am 27.11.2020 um 15:45 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > -static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
> > > -                           uint64_t cumulative_perms,
> > > -                           uint64_t cumulative_shared_perms,
> > > -                           GSList *ignore_children, Error **errp)
> > > +static int bdrv_node_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
> > > +                                uint64_t cumulative_perms,
> > > +                                uint64_t cumulative_shared_perms,
> > > +                                GSList *ignore_children, Error **errp)
> > >   {
> > >       BlockDriver *drv = bs->drv;
> > >       BdrvChild *c;
> > > @@ -2166,21 +2193,43 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
> > >       /* Check all children */
> > >       QLIST_FOREACH(c, &bs->children, next) {
> > >           uint64_t cur_perm, cur_shared;
> > > -        GSList *cur_ignore_children;
> > >           bdrv_child_perm(bs, c->bs, c, c->role, q,
> > >                           cumulative_perms, cumulative_shared_perms,
> > >                           &cur_perm, &cur_shared);
> > > +        bdrv_child_set_perm_safe(c, cur_perm, cur_shared, NULL);
> > 
> > This "added" line is actually old code. What is removed here is the
> > recursive call of bdrv_check_update_perm(). This is what the code below
> > will have to replace.
> 
> yes, we'll use explicit loop instead of recursion
> 
> > 
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
> > > +                           uint64_t cumulative_perms,
> > > +                           uint64_t cumulative_shared_perms,
> > > +                           GSList *ignore_children, Error **errp)
> > > +{
> > > +    int ret;
> > > +    BlockDriverState *root = bs;
> > > +    g_autoptr(GSList) list = bdrv_topological_dfs(NULL, NULL, root);
> > > +
> > > +    for ( ; list; list = list->next) {
> > > +        bs = list->data;
> > > +
> > > +        if (bs != root) {
> > > +            if (!bdrv_check_parents_compliance(bs, ignore_children, errp)) {
> > > +                return -EINVAL;
> > > +            }
> > 
> > At this point bs still had the old permissions, but we don't access
> > them. As we're going in topological order, the parents have already been
> > updated if they were a child covered in bdrv_node_check_perm(), so we're
> > checking the relevant values. Good.
> > 
> > What about the root node? If I understand correctly, the parents of the
> > root nodes wouldn't have been checked in the old code. In the new state,
> > the parent BdrvChild already has to contain the new permission.
> > 
> > In bdrv_refresh_perms(), we already check parent conflicts, so no change
> > for all callers going through it. Good.
> > 
> > bdrv_reopen_multiple() is less obvious. It passes permissions from the
> > BDRVReopenState, without applying the permissions first.
> 
> It will be changed in the series
> 
> > Do we check the
> > old parent permissions instead of the new state here?
> 
> We use given (new) cumulative permissions for bs, and recalculate
> permissions for bs subtree.

Where do we actually set them? I would expect a
bdrv_child_set_perm_safe() call somewhere, but I can't see it in the
call path from bdrv_reopen_multiple().

> It follows old behavior. The only thing is changed that pre-patch we
> do DFS recursion starting from bs (and probably visit some nodes
> several times), after-patch we first do topological sort of bs subtree
> and go through the list. The order of nodes is better and we visit
> each node once.

It's not the only thing that changes. Maybe this is what makes the patch
hard to understand, because it seems to do two steps at once:

1. Change the order in which nodes are processed

2. Replace bdrv_check_update_perm() with bdrv_check_parents_compliance()

In step 2, the point I mentioned above is important (new permissions
must already be set in the BdrvChild objects).

The switch to bdrv_check_parents_compliance() also means that error
messages become a bit worse because we don't know any more which of the
conflicting nodes is the new one, so we can't provide two different
messages any more. This is probably unavoidable, though.

> > 
> > > +            bdrv_get_cumulative_perm(bs, &cumulative_perms,
> > > +                                     &cumulative_shared_perms);
> > > +        }
> > > -        cur_ignore_children = g_slist_prepend(g_slist_copy(ignore_children), c);
> > > -        ret = bdrv_check_update_perm(c->bs, q, cur_perm, cur_shared,
> > > -                                     cur_ignore_children, errp);
> > > -        g_slist_free(cur_ignore_children);
> > > +        ret = bdrv_node_check_perm(bs, q, cumulative_perms,
> > > +                                   cumulative_shared_perms,
> > > +                                   ignore_children, errp);
> > 
> > We use the original ignore_children for every node in the sorted list.
> > The old code extends it with all nodes in the path to each node.
> > 
> > For the bdrv_check_update_perm() call that is now replaced with
> > bdrv_check_parents_compliance(), I think this was necessary because
> > bdrv_check_update_perm() always assumes adding a new edge, so if you
> > update one instead of adding it, you have to ignore it so that it can't
> > conflict with itself. This isn't necessary any more now because we just
> > update and then check for consistency.
> > 
> > For passing to bdrv_node_check_perm() it doesn't make a difference
> > anyway because the parameter is now unused (and should probably be
> > removed).
> 
> ignore_children will be dropped in [27]. For now it is still needed
> for bdrv_replace_node_common

In bdrv_node_check_perm(), it's already unused after this patch. But
fair enough.

Kevin


Re: [PATCH v2 15/36] block: use topological sort for permission update
Posted by Vladimir Sementsov-Ogievskiy 5 years ago
28.01.2021 20:13, Kevin Wolf wrote:
> Am 28.01.2021 um 10:34 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 27.01.2021 21:38, Kevin Wolf wrote:
>>> Am 27.11.2020 um 15:45 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> -static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>>>> -                           uint64_t cumulative_perms,
>>>> -                           uint64_t cumulative_shared_perms,
>>>> -                           GSList *ignore_children, Error **errp)
>>>> +static int bdrv_node_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>>>> +                                uint64_t cumulative_perms,
>>>> +                                uint64_t cumulative_shared_perms,
>>>> +                                GSList *ignore_children, Error **errp)
>>>>    {
>>>>        BlockDriver *drv = bs->drv;
>>>>        BdrvChild *c;
>>>> @@ -2166,21 +2193,43 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>>>>        /* Check all children */
>>>>        QLIST_FOREACH(c, &bs->children, next) {
>>>>            uint64_t cur_perm, cur_shared;
>>>> -        GSList *cur_ignore_children;
>>>>            bdrv_child_perm(bs, c->bs, c, c->role, q,
>>>>                            cumulative_perms, cumulative_shared_perms,
>>>>                            &cur_perm, &cur_shared);
>>>> +        bdrv_child_set_perm_safe(c, cur_perm, cur_shared, NULL);
>>>
>>> This "added" line is actually old code. What is removed here is the
>>> recursive call of bdrv_check_update_perm(). This is what the code below
>>> will have to replace.
>>
>> yes, we'll use explicit loop instead of recursion
>>
>>>
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>>>> +                           uint64_t cumulative_perms,
>>>> +                           uint64_t cumulative_shared_perms,
>>>> +                           GSList *ignore_children, Error **errp)
>>>> +{
>>>> +    int ret;
>>>> +    BlockDriverState *root = bs;
>>>> +    g_autoptr(GSList) list = bdrv_topological_dfs(NULL, NULL, root);
>>>> +
>>>> +    for ( ; list; list = list->next) {
>>>> +        bs = list->data;
>>>> +
>>>> +        if (bs != root) {
>>>> +            if (!bdrv_check_parents_compliance(bs, ignore_children, errp)) {
>>>> +                return -EINVAL;
>>>> +            }
>>>
>>> At this point bs still had the old permissions, but we don't access
>>> them. As we're going in topological order, the parents have already been
>>> updated if they were a child covered in bdrv_node_check_perm(), so we're
>>> checking the relevant values. Good.
>>>
>>> What about the root node? If I understand correctly, the parents of the
>>> root nodes wouldn't have been checked in the old code. In the new state,
>>> the parent BdrvChild already has to contain the new permission.
>>>
>>> In bdrv_refresh_perms(), we already check parent conflicts, so no change
>>> for all callers going through it. Good.
>>>
>>> bdrv_reopen_multiple() is less obvious. It passes permissions from the
>>> BDRVReopenState, without applying the permissions first.
>>
>> It will be changed in the series
>>
>>> Do we check the
>>> old parent permissions instead of the new state here?
>>
>> We use given (new) cumulative permissions for bs, and recalculate
>> permissions for bs subtree.
> 
> Where do we actually set them? I would expect a
> bdrv_child_set_perm_safe() call somewhere, but I can't see it in the
> call path from bdrv_reopen_multiple().

You mean parent BdrvChild objects? Then this question applies as well to pre-patch code.

So, we just call bdrv_check_perm() for bs in bdrv_reopen_multiple.. I think the answer is like this:

if state->perm and state->shared_perm are different from actual cumulative permissions (before reopne), then we must
have the parent(s) of the node in same bs_queue. Then, corresponding children are updated as part
of another bdrv_check_perm call from same loop in bdrv_reopen_multiple().

Let's check how state->perm and state->shared_perm are set:

bdrv_reopen_queue_child()

     /* This needs to be overwritten in bdrv_reopen_prepare() */
     bs_entry->state.perm = UINT64_MAX;
     bs_entry->state.shared_perm = 0;


...
  
bdrv_reopen_prepare()

        bdrv_reopen_perm(queue, reopen_state->bs,
                      &reopen_state->perm, &reopen_state->shared_perm);

and bdrv_reopen_perm() calculate cumulative permissions, taking permissions from the queue, for parents which exists in queue.

Not sure how much it correct, keeping in mind that we may look at a node in queue, for which bdrv_reopen_perm was not yet called, but the idea is clean.

> 
>> It follows old behavior. The only thing is changed that pre-patch we
>> do DFS recursion starting from bs (and probably visit some nodes
>> several times), after-patch we first do topological sort of bs subtree
>> and go through the list. The order of nodes is better and we visit
>> each node once.
> 
> It's not the only thing that changes. Maybe this is what makes the patch
> hard to understand, because it seems to do two steps at once:
> 
> 1. Change the order in which nodes are processed
> 
> 2. Replace bdrv_check_update_perm() with bdrv_check_parents_compliance()

hmm, yes. But we do bdrv_check_parents_compliance() only for nodes inside subtree, for all except root.
So, for them we have updated permissions.

> 
> In step 2, the point I mentioned above is important (new permissions
> must already be set in the BdrvChild objects).
> 
> The switch to bdrv_check_parents_compliance() also means that error
> messages become a bit worse because we don't know any more which of the
> conflicting nodes is the new one, so we can't provide two different
> messages any more. This is probably unavoidable, though.
> 
>>>
>>>> +            bdrv_get_cumulative_perm(bs, &cumulative_perms,
>>>> +                                     &cumulative_shared_perms);
>>>> +        }
>>>> -        cur_ignore_children = g_slist_prepend(g_slist_copy(ignore_children), c);
>>>> -        ret = bdrv_check_update_perm(c->bs, q, cur_perm, cur_shared,
>>>> -                                     cur_ignore_children, errp);
>>>> -        g_slist_free(cur_ignore_children);
>>>> +        ret = bdrv_node_check_perm(bs, q, cumulative_perms,
>>>> +                                   cumulative_shared_perms,
>>>> +                                   ignore_children, errp);
>>>
>>> We use the original ignore_children for every node in the sorted list.
>>> The old code extends it with all nodes in the path to each node.
>>>
>>> For the bdrv_check_update_perm() call that is now replaced with
>>> bdrv_check_parents_compliance(), I think this was necessary because
>>> bdrv_check_update_perm() always assumes adding a new edge, so if you
>>> update one instead of adding it, you have to ignore it so that it can't
>>> conflict with itself. This isn't necessary any more now because we just
>>> update and then check for consistency.
>>>
>>> For passing to bdrv_node_check_perm() it doesn't make a difference
>>> anyway because the parameter is now unused (and should probably be
>>> removed).
>>
>> ignore_children will be dropped in [27]. For now it is still needed
>> for bdrv_replace_node_common
> 
> In bdrv_node_check_perm(), it's already unused after this patch. But
> fair enough.
> 
> Kevin
> 


-- 
Best regards,
Vladimir

Re: [PATCH v2 15/36] block: use topological sort for permission update
Posted by Kevin Wolf 5 years ago
Am 28.01.2021 um 19:04 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 28.01.2021 20:13, Kevin Wolf wrote:
> > Am 28.01.2021 um 10:34 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > 27.01.2021 21:38, Kevin Wolf wrote:
> > > > Am 27.11.2020 um 15:45 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > > -static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
> > > > > -                           uint64_t cumulative_perms,
> > > > > -                           uint64_t cumulative_shared_perms,
> > > > > -                           GSList *ignore_children, Error **errp)
> > > > > +static int bdrv_node_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
> > > > > +                                uint64_t cumulative_perms,
> > > > > +                                uint64_t cumulative_shared_perms,
> > > > > +                                GSList *ignore_children, Error **errp)
> > > > >    {
> > > > >        BlockDriver *drv = bs->drv;
> > > > >        BdrvChild *c;
> > > > > @@ -2166,21 +2193,43 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
> > > > >        /* Check all children */
> > > > >        QLIST_FOREACH(c, &bs->children, next) {
> > > > >            uint64_t cur_perm, cur_shared;
> > > > > -        GSList *cur_ignore_children;
> > > > >            bdrv_child_perm(bs, c->bs, c, c->role, q,
> > > > >                            cumulative_perms, cumulative_shared_perms,
> > > > >                            &cur_perm, &cur_shared);
> > > > > +        bdrv_child_set_perm_safe(c, cur_perm, cur_shared, NULL);
> > > > 
> > > > This "added" line is actually old code. What is removed here is the
> > > > recursive call of bdrv_check_update_perm(). This is what the code below
> > > > will have to replace.
> > > 
> > > yes, we'll use explicit loop instead of recursion
> > > 
> > > > 
> > > > > +    }
> > > > > +
> > > > > +    return 0;
> > > > > +}
> > > > > +
> > > > > +static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
> > > > > +                           uint64_t cumulative_perms,
> > > > > +                           uint64_t cumulative_shared_perms,
> > > > > +                           GSList *ignore_children, Error **errp)
> > > > > +{
> > > > > +    int ret;
> > > > > +    BlockDriverState *root = bs;
> > > > > +    g_autoptr(GSList) list = bdrv_topological_dfs(NULL, NULL, root);
> > > > > +
> > > > > +    for ( ; list; list = list->next) {
> > > > > +        bs = list->data;
> > > > > +
> > > > > +        if (bs != root) {
> > > > > +            if (!bdrv_check_parents_compliance(bs, ignore_children, errp)) {
> > > > > +                return -EINVAL;
> > > > > +            }
> > > > 
> > > > At this point bs still had the old permissions, but we don't access
> > > > them. As we're going in topological order, the parents have already been
> > > > updated if they were a child covered in bdrv_node_check_perm(), so we're
> > > > checking the relevant values. Good.
> > > > 
> > > > What about the root node? If I understand correctly, the parents of the
> > > > root nodes wouldn't have been checked in the old code. In the new state,
> > > > the parent BdrvChild already has to contain the new permission.
> > > > 
> > > > In bdrv_refresh_perms(), we already check parent conflicts, so no change
> > > > for all callers going through it. Good.
> > > > 
> > > > bdrv_reopen_multiple() is less obvious. It passes permissions from the
> > > > BDRVReopenState, without applying the permissions first.
> > > 
> > > It will be changed in the series
> > > 
> > > > Do we check the
> > > > old parent permissions instead of the new state here?
> > > 
> > > We use given (new) cumulative permissions for bs, and recalculate
> > > permissions for bs subtree.
> > 
> > Where do we actually set them? I would expect a
> > bdrv_child_set_perm_safe() call somewhere, but I can't see it in the
> > call path from bdrv_reopen_multiple().
> 
> You mean parent BdrvChild objects? Then this question applies as well
> to pre-patch code.

I don't think so. The pre-patch code doesn't rely on the permissions
already being set in the BdrvChild object, but it gets them passed in
parameters. Changing the graph first and relying on the information in
BdrvChild is the new approach that you're introducing.

> So, we just call bdrv_check_perm() for bs in bdrv_reopen_multiple.. I
> think the answer is like this:
> 
> if state->perm and state->shared_perm are different from actual
> cumulative permissions (before reopne), then we must have the
> parent(s) of the node in same bs_queue. Then, corresponding children
> are updated as part of another bdrv_check_perm call from same loop in
> bdrv_reopen_multiple().
> 
> Let's check how state->perm and state->shared_perm are set:
> 
> bdrv_reopen_queue_child()
> 
>     /* This needs to be overwritten in bdrv_reopen_prepare() */
>     bs_entry->state.perm = UINT64_MAX;
>     bs_entry->state.shared_perm = 0;
> 
> 
> ...
> bdrv_reopen_prepare()
> 
>        bdrv_reopen_perm(queue, reopen_state->bs,
>                      &reopen_state->perm, &reopen_state->shared_perm);
> 
> and bdrv_reopen_perm() calculate cumulative permissions, taking
> permissions from the queue, for parents which exists in queue.

Right, but it stores the new permissions in reopen_state, not in the
BdrvChild objects that this patch is looking it. Or am I missing
something?

> Not sure how much it correct, keeping in mind that we may look at a
> node in queue, for which bdrv_reopen_perm was not yet called, but the
> idea is clean.

I don't think the above code can work correctly without something
actually updating the BdrvChild first.

> > > It follows old behavior. The only thing is changed that pre-patch we
> > > do DFS recursion starting from bs (and probably visit some nodes
> > > several times), after-patch we first do topological sort of bs subtree
> > > and go through the list. The order of nodes is better and we visit
> > > each node once.
> > 
> > It's not the only thing that changes. Maybe this is what makes the patch
> > hard to understand, because it seems to do two steps at once:
> > 
> > 1. Change the order in which nodes are processed
> > 
> > 2. Replace bdrv_check_update_perm() with bdrv_check_parents_compliance()
> 
> hmm, yes. But we do bdrv_check_parents_compliance() only for nodes
> inside subtree, for all except root.  So, for them we have updated
> permissions.

Ah! This might be the missing piece that makes it safe.

Maybe worth a comment?

Kevin


Re: [PATCH v2 15/36] block: use topological sort for permission update
Posted by Vladimir Sementsov-Ogievskiy 5 years ago
03.02.2021 21:38, Kevin Wolf wrote:
> Am 28.01.2021 um 19:04 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 28.01.2021 20:13, Kevin Wolf wrote:
>>> Am 28.01.2021 um 10:34 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 27.01.2021 21:38, Kevin Wolf wrote:
>>>>> Am 27.11.2020 um 15:45 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>> -static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>>>>>> -                           uint64_t cumulative_perms,
>>>>>> -                           uint64_t cumulative_shared_perms,
>>>>>> -                           GSList *ignore_children, Error **errp)
>>>>>> +static int bdrv_node_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>>>>>> +                                uint64_t cumulative_perms,
>>>>>> +                                uint64_t cumulative_shared_perms,
>>>>>> +                                GSList *ignore_children, Error **errp)
>>>>>>     {
>>>>>>         BlockDriver *drv = bs->drv;
>>>>>>         BdrvChild *c;
>>>>>> @@ -2166,21 +2193,43 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>>>>>>         /* Check all children */
>>>>>>         QLIST_FOREACH(c, &bs->children, next) {
>>>>>>             uint64_t cur_perm, cur_shared;
>>>>>> -        GSList *cur_ignore_children;
>>>>>>             bdrv_child_perm(bs, c->bs, c, c->role, q,
>>>>>>                             cumulative_perms, cumulative_shared_perms,
>>>>>>                             &cur_perm, &cur_shared);
>>>>>> +        bdrv_child_set_perm_safe(c, cur_perm, cur_shared, NULL);
>>>>>
>>>>> This "added" line is actually old code. What is removed here is the
>>>>> recursive call of bdrv_check_update_perm(). This is what the code below
>>>>> will have to replace.
>>>>
>>>> yes, we'll use explicit loop instead of recursion
>>>>
>>>>>
>>>>>> +    }
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>>>>>> +                           uint64_t cumulative_perms,
>>>>>> +                           uint64_t cumulative_shared_perms,
>>>>>> +                           GSList *ignore_children, Error **errp)
>>>>>> +{
>>>>>> +    int ret;
>>>>>> +    BlockDriverState *root = bs;
>>>>>> +    g_autoptr(GSList) list = bdrv_topological_dfs(NULL, NULL, root);
>>>>>> +
>>>>>> +    for ( ; list; list = list->next) {
>>>>>> +        bs = list->data;
>>>>>> +
>>>>>> +        if (bs != root) {
>>>>>> +            if (!bdrv_check_parents_compliance(bs, ignore_children, errp)) {
>>>>>> +                return -EINVAL;
>>>>>> +            }
>>>>>
>>>>> At this point bs still had the old permissions, but we don't access
>>>>> them. As we're going in topological order, the parents have already been
>>>>> updated if they were a child covered in bdrv_node_check_perm(), so we're
>>>>> checking the relevant values. Good.
>>>>>
>>>>> What about the root node? If I understand correctly, the parents of the
>>>>> root nodes wouldn't have been checked in the old code. In the new state,
>>>>> the parent BdrvChild already has to contain the new permission.
>>>>>
>>>>> In bdrv_refresh_perms(), we already check parent conflicts, so no change
>>>>> for all callers going through it. Good.
>>>>>
>>>>> bdrv_reopen_multiple() is less obvious. It passes permissions from the
>>>>> BDRVReopenState, without applying the permissions first.
>>>>
>>>> It will be changed in the series
>>>>
>>>>> Do we check the
>>>>> old parent permissions instead of the new state here?
>>>>
>>>> We use given (new) cumulative permissions for bs, and recalculate
>>>> permissions for bs subtree.
>>>
>>> Where do we actually set them? I would expect a
>>> bdrv_child_set_perm_safe() call somewhere, but I can't see it in the
>>> call path from bdrv_reopen_multiple().
>>
>> You mean parent BdrvChild objects? Then this question applies as well
>> to pre-patch code.
> 
> I don't think so. The pre-patch code doesn't rely on the permissions
> already being set in the BdrvChild object, but it gets them passed in
> parameters. Changing the graph first and relying on the information in
> BdrvChild is the new approach that you're introducing.
> 
>> So, we just call bdrv_check_perm() for bs in bdrv_reopen_multiple.. I
>> think the answer is like this:
>>
>> if state->perm and state->shared_perm are different from actual
>> cumulative permissions (before reopne), then we must have the
>> parent(s) of the node in same bs_queue. Then, corresponding children
>> are updated as part of another bdrv_check_perm call from same loop in
>> bdrv_reopen_multiple().
>>
>> Let's check how state->perm and state->shared_perm are set:
>>
>> bdrv_reopen_queue_child()
>>
>>      /* This needs to be overwritten in bdrv_reopen_prepare() */
>>      bs_entry->state.perm = UINT64_MAX;
>>      bs_entry->state.shared_perm = 0;
>>
>>
>> ...
>> bdrv_reopen_prepare()
>>
>>         bdrv_reopen_perm(queue, reopen_state->bs,
>>                       &reopen_state->perm, &reopen_state->shared_perm);
>>
>> and bdrv_reopen_perm() calculate cumulative permissions, taking
>> permissions from the queue, for parents which exists in queue.
> 
> Right, but it stores the new permissions in reopen_state, not in the
> BdrvChild objects that this patch is looking it. Or am I missing
> something?
> 
>> Not sure how much it correct, keeping in mind that we may look at a
>> node in queue, for which bdrv_reopen_perm was not yet called, but the
>> idea is clean.
> 
> I don't think the above code can work correctly without something
> actually updating the BdrvChild first.
> 
>>>> It follows old behavior. The only thing is changed that pre-patch we
>>>> do DFS recursion starting from bs (and probably visit some nodes
>>>> several times), after-patch we first do topological sort of bs subtree
>>>> and go through the list. The order of nodes is better and we visit
>>>> each node once.
>>>
>>> It's not the only thing that changes. Maybe this is what makes the patch
>>> hard to understand, because it seems to do two steps at once:
>>>
>>> 1. Change the order in which nodes are processed
>>>
>>> 2. Replace bdrv_check_update_perm() with bdrv_check_parents_compliance()
>>
>> hmm, yes. But we do bdrv_check_parents_compliance() only for nodes
>> inside subtree, for all except root.  So, for them we have updated
>> permissions.
> 
> Ah! This might be the missing piece that makes it safe.
> 
> Maybe worth a comment?
> 

Will add

-- 
Best regards,
Vladimir

Re: [PATCH v2 15/36] block: use topological sort for permission update
Posted by Vladimir Sementsov-Ogievskiy 4 years, 11 months ago
03.02.2021 21:38, Kevin Wolf wrote:
> Am 28.01.2021 um 19:04 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 28.01.2021 20:13, Kevin Wolf wrote:
>>> Am 28.01.2021 um 10:34 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 27.01.2021 21:38, Kevin Wolf wrote:
>>>>> Am 27.11.2020 um 15:45 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>> -static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>>>>>> -                           uint64_t cumulative_perms,
>>>>>> -                           uint64_t cumulative_shared_perms,
>>>>>> -                           GSList *ignore_children, Error **errp)
>>>>>> +static int bdrv_node_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>>>>>> +                                uint64_t cumulative_perms,
>>>>>> +                                uint64_t cumulative_shared_perms,
>>>>>> +                                GSList *ignore_children, Error **errp)
>>>>>>     {
>>>>>>         BlockDriver *drv = bs->drv;
>>>>>>         BdrvChild *c;
>>>>>> @@ -2166,21 +2193,43 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>>>>>>         /* Check all children */
>>>>>>         QLIST_FOREACH(c, &bs->children, next) {
>>>>>>             uint64_t cur_perm, cur_shared;
>>>>>> -        GSList *cur_ignore_children;
>>>>>>             bdrv_child_perm(bs, c->bs, c, c->role, q,
>>>>>>                             cumulative_perms, cumulative_shared_perms,
>>>>>>                             &cur_perm, &cur_shared);
>>>>>> +        bdrv_child_set_perm_safe(c, cur_perm, cur_shared, NULL);
>>>>>
>>>>> This "added" line is actually old code. What is removed here is the
>>>>> recursive call of bdrv_check_update_perm(). This is what the code below
>>>>> will have to replace.
>>>>
>>>> yes, we'll use explicit loop instead of recursion
>>>>
>>>>>
>>>>>> +    }
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
>>>>>> +                           uint64_t cumulative_perms,
>>>>>> +                           uint64_t cumulative_shared_perms,
>>>>>> +                           GSList *ignore_children, Error **errp)
>>>>>> +{
>>>>>> +    int ret;
>>>>>> +    BlockDriverState *root = bs;
>>>>>> +    g_autoptr(GSList) list = bdrv_topological_dfs(NULL, NULL, root);
>>>>>> +
>>>>>> +    for ( ; list; list = list->next) {
>>>>>> +        bs = list->data;
>>>>>> +
>>>>>> +        if (bs != root) {
>>>>>> +            if (!bdrv_check_parents_compliance(bs, ignore_children, errp)) {
>>>>>> +                return -EINVAL;
>>>>>> +            }
>>>>>
>>>>> At this point bs still had the old permissions, but we don't access
>>>>> them. As we're going in topological order, the parents have already been
>>>>> updated if they were a child covered in bdrv_node_check_perm(), so we're
>>>>> checking the relevant values. Good.
>>>>>
>>>>> What about the root node? If I understand correctly, the parents of the
>>>>> root nodes wouldn't have been checked in the old code. In the new state,
>>>>> the parent BdrvChild already has to contain the new permission.
>>>>>
>>>>> In bdrv_refresh_perms(), we already check parent conflicts, so no change
>>>>> for all callers going through it. Good.
>>>>>
>>>>> bdrv_reopen_multiple() is less obvious. It passes permissions from the
>>>>> BDRVReopenState, without applying the permissions first.
>>>>
>>>> It will be changed in the series
>>>>
>>>>> Do we check the
>>>>> old parent permissions instead of the new state here?
>>>>
>>>> We use given (new) cumulative permissions for bs, and recalculate
>>>> permissions for bs subtree.
>>>
>>> Where do we actually set them? I would expect a
>>> bdrv_child_set_perm_safe() call somewhere, but I can't see it in the
>>> call path from bdrv_reopen_multiple().
>>
>> You mean parent BdrvChild objects? Then this question applies as well
>> to pre-patch code.
> 
> I don't think so. The pre-patch code doesn't rely on the permissions
> already being set in the BdrvChild object, but it gets them passed in
> parameters. Changing the graph first and relying on the information in
> BdrvChild is the new approach that you're introducing.

New code still pass permissions as parameters for root node. And only
inside subtree we rely on updated parents. But that's correct due to
topological order of updating.


OK, that's incorrect for the following case: when one of the parents (X)
of inner node in bs subtree IS NOT in the bs subtree but IS in reopen queue.
And we'll use wrong permission of X. Still:

1. It's preexisting. bdrv_check_update_perm() doesn't take reopen queue
in mind when calculate cumulative permissions (and ignore_children doesn't
help for the described case

2. We can hope that on next permission update, started from node X, permissions
will become more correct

3. At the end of series permission calculation in bdrv_reopen_multiple is
rewritten anyway.


> 
>> So, we just call bdrv_check_perm() for bs in bdrv_reopen_multiple.. I
>> think the answer is like this:
>>
>> if state->perm and state->shared_perm are different from actual
>> cumulative permissions (before reopne), then we must have the
>> parent(s) of the node in same bs_queue. Then, corresponding children
>> are updated as part of another bdrv_check_perm call from same loop in
>> bdrv_reopen_multiple().
>>
>> Let's check how state->perm and state->shared_perm are set:
>>
>> bdrv_reopen_queue_child()
>>
>>      /* This needs to be overwritten in bdrv_reopen_prepare() */
>>      bs_entry->state.perm = UINT64_MAX;
>>      bs_entry->state.shared_perm = 0;
>>
>>
>> ...
>> bdrv_reopen_prepare()
>>
>>         bdrv_reopen_perm(queue, reopen_state->bs,
>>                       &reopen_state->perm, &reopen_state->shared_perm);
>>
>> and bdrv_reopen_perm() calculate cumulative permissions, taking
>> permissions from the queue, for parents which exists in queue.
> 
> Right, but it stores the new permissions in reopen_state, not in the
> BdrvChild objects that this patch is looking it. Or am I missing
> something?
> 
>> Not sure how much it correct, keeping in mind that we may look at a
>> node in queue, for which bdrv_reopen_perm was not yet called, but the
>> idea is clean.
> 
> I don't think the above code can work correctly without something
> actually updating the BdrvChild first.
> 
>>>> It follows old behavior. The only thing is changed that pre-patch we
>>>> do DFS recursion starting from bs (and probably visit some nodes
>>>> several times), after-patch we first do topological sort of bs subtree
>>>> and go through the list. The order of nodes is better and we visit
>>>> each node once.
>>>
>>> It's not the only thing that changes. Maybe this is what makes the patch
>>> hard to understand, because it seems to do two steps at once:
>>>
>>> 1. Change the order in which nodes are processed
>>>
>>> 2. Replace bdrv_check_update_perm() with bdrv_check_parents_compliance()
>>
>> hmm, yes. But we do bdrv_check_parents_compliance() only for nodes
>> inside subtree, for all except root.  So, for them we have updated
>> permissions.
> 
> Ah! This might be the missing piece that makes it safe.
> 
> Maybe worth a comment?
> 
> Kevin
> 


-- 
Best regards,
Vladimir

Re: [PATCH v2 15/36] block: use topological sort for permission update
Posted by Kevin Wolf 4 years, 11 months ago
Am 10.03.2021 um 12:08 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 03.02.2021 21:38, Kevin Wolf wrote:
> > Am 28.01.2021 um 19:04 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > 28.01.2021 20:13, Kevin Wolf wrote:
> > > > Am 28.01.2021 um 10:34 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > > 27.01.2021 21:38, Kevin Wolf wrote:
> > > > > > Am 27.11.2020 um 15:45 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > > > > -static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
> > > > > > > -                           uint64_t cumulative_perms,
> > > > > > > -                           uint64_t cumulative_shared_perms,
> > > > > > > -                           GSList *ignore_children, Error **errp)
> > > > > > > +static int bdrv_node_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
> > > > > > > +                                uint64_t cumulative_perms,
> > > > > > > +                                uint64_t cumulative_shared_perms,
> > > > > > > +                                GSList *ignore_children, Error **errp)
> > > > > > >     {
> > > > > > >         BlockDriver *drv = bs->drv;
> > > > > > >         BdrvChild *c;
> > > > > > > @@ -2166,21 +2193,43 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
> > > > > > >         /* Check all children */
> > > > > > >         QLIST_FOREACH(c, &bs->children, next) {
> > > > > > >             uint64_t cur_perm, cur_shared;
> > > > > > > -        GSList *cur_ignore_children;
> > > > > > >             bdrv_child_perm(bs, c->bs, c, c->role, q,
> > > > > > >                             cumulative_perms, cumulative_shared_perms,
> > > > > > >                             &cur_perm, &cur_shared);
> > > > > > > +        bdrv_child_set_perm_safe(c, cur_perm, cur_shared, NULL);
> > > > > > 
> > > > > > This "added" line is actually old code. What is removed here is the
> > > > > > recursive call of bdrv_check_update_perm(). This is what the code below
> > > > > > will have to replace.
> > > > > 
> > > > > yes, we'll use explicit loop instead of recursion
> > > > > 
> > > > > > 
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
> > > > > > > +                           uint64_t cumulative_perms,
> > > > > > > +                           uint64_t cumulative_shared_perms,
> > > > > > > +                           GSList *ignore_children, Error **errp)
> > > > > > > +{
> > > > > > > +    int ret;
> > > > > > > +    BlockDriverState *root = bs;
> > > > > > > +    g_autoptr(GSList) list = bdrv_topological_dfs(NULL, NULL, root);
> > > > > > > +
> > > > > > > +    for ( ; list; list = list->next) {
> > > > > > > +        bs = list->data;
> > > > > > > +
> > > > > > > +        if (bs != root) {
> > > > > > > +            if (!bdrv_check_parents_compliance(bs, ignore_children, errp)) {
> > > > > > > +                return -EINVAL;
> > > > > > > +            }
> > > > > > 
> > > > > > At this point bs still had the old permissions, but we don't access
> > > > > > them. As we're going in topological order, the parents have already been
> > > > > > updated if they were a child covered in bdrv_node_check_perm(), so we're
> > > > > > checking the relevant values. Good.
> > > > > > 
> > > > > > What about the root node? If I understand correctly, the parents of the
> > > > > > root nodes wouldn't have been checked in the old code. In the new state,
> > > > > > the parent BdrvChild already has to contain the new permission.
> > > > > > 
> > > > > > In bdrv_refresh_perms(), we already check parent conflicts, so no change
> > > > > > for all callers going through it. Good.
> > > > > > 
> > > > > > bdrv_reopen_multiple() is less obvious. It passes permissions from the
> > > > > > BDRVReopenState, without applying the permissions first.
> > > > > 
> > > > > It will be changed in the series
> > > > > 
> > > > > > Do we check the
> > > > > > old parent permissions instead of the new state here?
> > > > > 
> > > > > We use given (new) cumulative permissions for bs, and recalculate
> > > > > permissions for bs subtree.
> > > > 
> > > > Where do we actually set them? I would expect a
> > > > bdrv_child_set_perm_safe() call somewhere, but I can't see it in the
> > > > call path from bdrv_reopen_multiple().
> > > 
> > > You mean parent BdrvChild objects? Then this question applies as well
> > > to pre-patch code.
> > 
> > I don't think so. The pre-patch code doesn't rely on the permissions
> > already being set in the BdrvChild object, but it gets them passed in
> > parameters. Changing the graph first and relying on the information in
> > BdrvChild is the new approach that you're introducing.
> 
> New code still pass permissions as parameters for root node. And only
> inside subtree we rely on updated parents. But that's correct due to
> topological order of updating.
> 
> 
> OK, that's incorrect for the following case: when one of the parents (X)
> of inner node in bs subtree IS NOT in the bs subtree but IS in reopen queue.
> And we'll use wrong permission of X. Still:
> 
> 1. It's preexisting. bdrv_check_update_perm() doesn't take reopen queue
> in mind when calculate cumulative permissions (and ignore_children doesn't
> help for the described case
> 
> 2. We can hope that on next permission update, started from node X, permissions
> will become more correct
> 
> 3. At the end of series permission calculation in bdrv_reopen_multiple is
> rewritten anyway.

Yes, I think 3. is the strongest argument in favour of the patch.

Kevin