[v3] block: Fix check_to_replace_node()

[PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced

Posted by Max Reitz 6 years ago

We will need this to verify that Quorum can let one of its children be
replaced without breaking anything else.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/quorum.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/block/quorum.c b/block/quorum.c
index 59cd524502..6a7224c9e4 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -67,6 +67,13 @@ typedef struct QuorumVotes {
 
 typedef struct QuorumChild {
     BdrvChild *child;
+
+    /*
+     * If set, check whether this node can be replaced without any
+     * other parent noticing: Unshare CONSISTENT_READ, and take the
+     * WRITE permission.
+     */
+    bool to_be_replaced;
 } QuorumChild;
 
 /* the following structure holds the state of one quorum instance */
@@ -1128,6 +1135,18 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
                               uint64_t perm, uint64_t shared,
                               uint64_t *nperm, uint64_t *nshared)
 {
+    BDRVQuorumState *s = bs->opaque;
+    int child_i = -1;
+
+    if (c) {
+        for (child_i = 0; child_i < s->num_children; child_i++) {
+            if (s->children[child_i].child == c) {
+                break;
+            }
+        }
+        assert(child_i < s->num_children);
+    }
+
     *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
 
     /*
@@ -1137,6 +1156,12 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
     *nshared = (shared & (BLK_PERM_CONSISTENT_READ |
                           BLK_PERM_WRITE_UNCHANGED))
              | DEFAULT_PERM_UNCHANGED;
+
+    if (child_i >= 0 && s->children[child_i].to_be_replaced) {
+        /* Prepare for sudden data changes */
+        *nperm |= BLK_PERM_WRITE;
+        *nshared &= ~BLK_PERM_CONSISTENT_READ;
+    }
 }
 
 static const char *const quorum_strong_runtime_opts[] = {
-- 
2.24.1

Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced

Posted by Vladimir Sementsov-Ogievskiy 6 years ago

31.01.2020 0:44, Max Reitz wrote:
> We will need this to verify that Quorum can let one of its children be
> replaced without breaking anything else.
> 
> Signed-off-by: Max Reitz<mreitz@redhat.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir

Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced

Posted by Kevin Wolf 6 years ago

Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
> We will need this to verify that Quorum can let one of its children be
> replaced without breaking anything else.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/quorum.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/block/quorum.c b/block/quorum.c
> index 59cd524502..6a7224c9e4 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>  
>  typedef struct QuorumChild {
>      BdrvChild *child;
> +
> +    /*
> +     * If set, check whether this node can be replaced without any
> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
> +     * WRITE permission.
> +     */
> +    bool to_be_replaced;

I don't understand these permission changes. How does (preparing for)
detaching a node from quorum make its content invalid? And why do we
suddenly need WRITE permissions even if the quorum node is only used
read-only?

The comment is a bit unclear, too. "check whether" implies that both
outcomes could be true, but it doesn't say what happens in either case.
Is this really "make sure that"?

Kevin

Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced

Posted by Max Reitz 6 years ago

On 05.02.20 16:38, Kevin Wolf wrote:
> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
>> We will need this to verify that Quorum can let one of its children be
>> replaced without breaking anything else.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>  block/quorum.c | 25 +++++++++++++++++++++++++
>>  1 file changed, 25 insertions(+)
>>
>> diff --git a/block/quorum.c b/block/quorum.c
>> index 59cd524502..6a7224c9e4 100644
>> --- a/block/quorum.c
>> +++ b/block/quorum.c
>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>>  
>>  typedef struct QuorumChild {
>>      BdrvChild *child;
>> +
>> +    /*
>> +     * If set, check whether this node can be replaced without any
>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
>> +     * WRITE permission.
>> +     */
>> +    bool to_be_replaced;
> 
> I don't understand these permission changes. How does (preparing for)
> detaching a node from quorum make its content invalid?

It doesn’t, of course.  What we are preparing for is to replace it by
some other node with some other content.

> And why do we
> suddenly need WRITE permissions even if the quorum node is only used
> read-only?
> 
> The comment is a bit unclear, too. "check whether" implies that both
> outcomes could be true, but it doesn't say what happens in either case.
> Is this really "make sure that"?

I think the comment is not only unclear, it is the problem.  (Well,
maybe the code is also.)

This series is about fixing at least some things about replacing nodes
by mirroring.  The original use cases this was introduced for was to fix
broken quorum children: The other children are still intact, so you read
from the quorum node and replace the broken child (which maybe shows
invalid data, or maybe just EIO) by the fixed mirror result.

Replacing that broken node by the fixed one changes the data that’s
visible on that node.

That’s fine with quorum because that one child never influenced its
result anyway.  In fact, we know that the new child agrees with the
quorum, so it actually reduces conflict.

But if there are other parents on the node, they may disagree.  So we
need to warn them that we will disrupt consistency by replacing the node
with a potentially completely different one.  If those other parents
don’t care about consistency (CONSISTENT_READ) and don’t mind data
changes (other WRITE users), then we can freely replace the node still.

Now (assuming that this reasoning makes sense) it may seem as if the
general block layer should handle such cases; like it should unshare
CONSISTENT_READ and take WRITE.  But that isn’t true, because it calls
bdrv_recurse_can_replace() specifically to check that the some node can
be replaced by the new node without changing any visible data.  So
usually there are no such sudden data changes.

Quorum is the only node that can tolerate such data changes on its
children without changing its own visible data.  Hence it’s responsible
for ensuring that there’s noone else that minds if one of its child
nodes is replaced by something else.

(Note that this isn’t about replacing a BdrvChild, but about replacing a
BDS.  It isn’t like only quorum’s BdrvChild will be made to point
somewhere else; all BdrvChild pointers to the old node will be made to
point to the new one.)

Again assuming this makes sense, I wonder how we could condense that
into a reasonable comment.

Max

Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced

Posted by Kevin Wolf 6 years ago

Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
> On 05.02.20 16:38, Kevin Wolf wrote:
> > Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
> >> We will need this to verify that Quorum can let one of its children be
> >> replaced without breaking anything else.
> >>
> >> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >> ---
> >>  block/quorum.c | 25 +++++++++++++++++++++++++
> >>  1 file changed, 25 insertions(+)
> >>
> >> diff --git a/block/quorum.c b/block/quorum.c
> >> index 59cd524502..6a7224c9e4 100644
> >> --- a/block/quorum.c
> >> +++ b/block/quorum.c
> >> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
> >>  
> >>  typedef struct QuorumChild {
> >>      BdrvChild *child;
> >> +
> >> +    /*
> >> +     * If set, check whether this node can be replaced without any
> >> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
> >> +     * WRITE permission.
> >> +     */
> >> +    bool to_be_replaced;
> > 
> > I don't understand these permission changes. How does (preparing for)
> > detaching a node from quorum make its content invalid?
> 
> It doesn’t, of course.  What we are preparing for is to replace it by
> some other node with some other content.
> 
> > And why do we
> > suddenly need WRITE permissions even if the quorum node is only used
> > read-only?
> > 
> > The comment is a bit unclear, too. "check whether" implies that both
> > outcomes could be true, but it doesn't say what happens in either case.
> > Is this really "make sure that"?
> 
> I think the comment is not only unclear, it is the problem.  (Well,
> maybe the code is also.)
> 
> This series is about fixing at least some things about replacing nodes
> by mirroring.  The original use cases this was introduced for was to fix
> broken quorum children: The other children are still intact, so you read
> from the quorum node and replace the broken child (which maybe shows
> invalid data, or maybe just EIO) by the fixed mirror result.
> 
> Replacing that broken node by the fixed one changes the data that’s
> visible on that node.

Hm, yes, that's true. But I wonder if this is really something that the
permission system must catch. Like other graph manipulations, it's
essentially the user saying "trust me, I know what I'm doing, this node
makes sense in this place".

Because if you assume that the user could add a node with unsuitable
content and you want to prevent this, where do we stop?
blockdev-snapshot can insert a non-empty overlay, which would result in
visible data change. Should we therefore only allow snapshots when
shared writes are allowed? This doesn't work obviously.

So I'm inclined to say that this is the user's responsibility and we
don't have to jump through hoops to prevent every possible way that the
user could mess up. (Which often also result in preventing legitimate
cases like here a quorum of read-only nodes.)

Kevin

Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced

Posted by Max Reitz 6 years ago

On 06.02.20 15:58, Kevin Wolf wrote:
> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
>> On 05.02.20 16:38, Kevin Wolf wrote:
>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
>>>> We will need this to verify that Quorum can let one of its children be
>>>> replaced without breaking anything else.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>  block/quorum.c | 25 +++++++++++++++++++++++++
>>>>  1 file changed, 25 insertions(+)
>>>>
>>>> diff --git a/block/quorum.c b/block/quorum.c
>>>> index 59cd524502..6a7224c9e4 100644
>>>> --- a/block/quorum.c
>>>> +++ b/block/quorum.c
>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>>>>  
>>>>  typedef struct QuorumChild {
>>>>      BdrvChild *child;
>>>> +
>>>> +    /*
>>>> +     * If set, check whether this node can be replaced without any
>>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
>>>> +     * WRITE permission.
>>>> +     */
>>>> +    bool to_be_replaced;
>>>
>>> I don't understand these permission changes. How does (preparing for)
>>> detaching a node from quorum make its content invalid?
>>
>> It doesn’t, of course.  What we are preparing for is to replace it by
>> some other node with some other content.
>>
>>> And why do we
>>> suddenly need WRITE permissions even if the quorum node is only used
>>> read-only?
>>>
>>> The comment is a bit unclear, too. "check whether" implies that both
>>> outcomes could be true, but it doesn't say what happens in either case.
>>> Is this really "make sure that"?
>>
>> I think the comment is not only unclear, it is the problem.  (Well,
>> maybe the code is also.)
>>
>> This series is about fixing at least some things about replacing nodes
>> by mirroring.  The original use cases this was introduced for was to fix
>> broken quorum children: The other children are still intact, so you read
>> from the quorum node and replace the broken child (which maybe shows
>> invalid data, or maybe just EIO) by the fixed mirror result.
>>
>> Replacing that broken node by the fixed one changes the data that’s
>> visible on that node.
> 
> Hm, yes, that's true. But I wonder if this is really something that the
> permission system must catch. Like other graph manipulations, it's
> essentially the user saying "trust me, I know what I'm doing, this node
> makes sense in this place".
> 
> Because if you assume that the user could add a node with unsuitable
> content and you want to prevent this, where do we stop?
> blockdev-snapshot can insert a non-empty overlay, which would result in
> visible data change. Should we therefore only allow snapshots when
> shared writes are allowed? This doesn't work obviously.
> 
> So I'm inclined to say that this is the user's responsibility and we
> don't have to jump through hoops to prevent every possible way that the
> user could mess up. (Which often also result in preventing legitimate
> cases like here a quorum of read-only nodes.)

Well, if you ask the question “where do we stop”, we also have to ask
the question “where do we start”.  If we say the user knows what they’re
doing, we might as well drop the whole can_replace infrastructure
altogether and just assume that you can replace any node by anything.

If the WRITE permission is the problem, then I suppose we can drop that.
 Unsharing CONSISTENT_READ is bad enough that it effectively deters all
other parents anyway.

Max

Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced

Posted by Kevin Wolf 6 years ago

Am 06.02.2020 um 16:21 hat Max Reitz geschrieben:
> On 06.02.20 15:58, Kevin Wolf wrote:
> > Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
> >> On 05.02.20 16:38, Kevin Wolf wrote:
> >>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
> >>>> We will need this to verify that Quorum can let one of its children be
> >>>> replaced without breaking anything else.
> >>>>
> >>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>> ---
> >>>>  block/quorum.c | 25 +++++++++++++++++++++++++
> >>>>  1 file changed, 25 insertions(+)
> >>>>
> >>>> diff --git a/block/quorum.c b/block/quorum.c
> >>>> index 59cd524502..6a7224c9e4 100644
> >>>> --- a/block/quorum.c
> >>>> +++ b/block/quorum.c
> >>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
> >>>>  
> >>>>  typedef struct QuorumChild {
> >>>>      BdrvChild *child;
> >>>> +
> >>>> +    /*
> >>>> +     * If set, check whether this node can be replaced without any
> >>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
> >>>> +     * WRITE permission.
> >>>> +     */
> >>>> +    bool to_be_replaced;
> >>>
> >>> I don't understand these permission changes. How does (preparing for)
> >>> detaching a node from quorum make its content invalid?
> >>
> >> It doesn’t, of course.  What we are preparing for is to replace it by
> >> some other node with some other content.
> >>
> >>> And why do we
> >>> suddenly need WRITE permissions even if the quorum node is only used
> >>> read-only?
> >>>
> >>> The comment is a bit unclear, too. "check whether" implies that both
> >>> outcomes could be true, but it doesn't say what happens in either case.
> >>> Is this really "make sure that"?
> >>
> >> I think the comment is not only unclear, it is the problem.  (Well,
> >> maybe the code is also.)
> >>
> >> This series is about fixing at least some things about replacing nodes
> >> by mirroring.  The original use cases this was introduced for was to fix
> >> broken quorum children: The other children are still intact, so you read
> >> from the quorum node and replace the broken child (which maybe shows
> >> invalid data, or maybe just EIO) by the fixed mirror result.
> >>
> >> Replacing that broken node by the fixed one changes the data that’s
> >> visible on that node.
> > 
> > Hm, yes, that's true. But I wonder if this is really something that the
> > permission system must catch. Like other graph manipulations, it's
> > essentially the user saying "trust me, I know what I'm doing, this node
> > makes sense in this place".
> > 
> > Because if you assume that the user could add a node with unsuitable
> > content and you want to prevent this, where do we stop?
> > blockdev-snapshot can insert a non-empty overlay, which would result in
> > visible data change. Should we therefore only allow snapshots when
> > shared writes are allowed? This doesn't work obviously.
> > 
> > So I'm inclined to say that this is the user's responsibility and we
> > don't have to jump through hoops to prevent every possible way that the
> > user could mess up. (Which often also result in preventing legitimate
> > cases like here a quorum of read-only nodes.)
> 
> Well, if you ask the question “where do we stop”, we also have to ask
> the question “where do we start”.  If we say the user knows what they’re
> doing, we might as well drop the whole can_replace infrastructure
> altogether and just assume that you can replace any node by anything.

Well, I don't actually know if that would be completely unreasonable.
The idea was obviously to keep graph changes restricted to very specific
cases to avoid nasty surprises like triggering latent bugs. Meanwhile we
have quite a few more operations that allow changing the graph.

So if preventing some cases gives us headaches and is probably more work
than dealing with any bugs they might reveal, maybe preventing them is
wrong.

I'm just afraid that we might be overengineering this and waste time on
things that we don't actually get much use from.

> If the WRITE permission is the problem, then I suppose we can drop that.
>  Unsharing CONSISTENT_READ is bad enough that it effectively deters all
> other parents anyway.

WRITE is probably the more practical problem, though it's technically
the correct one to take.

CONSISTENT_READ is already a problem in theory because replacing a child
node with different content doesn't even match its definition:

    /**
     * A user that has the "permission" of consistent reads is guaranteed that
     * their view of the contents of the block device is complete and
     * self-consistent, representing the contents of a disk at a specific
     * point.
     *
     * For most block devices (including their backing files) this is true, but
     * the property cannot be maintained in a few situations like for
     * intermediate nodes of a commit block job.
     */
    BLK_PERM_CONSISTENT_READ    = 0x01,

Replacing an image with a different image means that the node represents
the content of a different disk now, but it's probably still complete
and self-consistent.

Kevin

Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced

Posted by Max Reitz 6 years ago

On 06.02.20 16:51, Kevin Wolf wrote:
> Am 06.02.2020 um 16:21 hat Max Reitz geschrieben:
>> On 06.02.20 15:58, Kevin Wolf wrote:
>>> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
>>>> On 05.02.20 16:38, Kevin Wolf wrote:
>>>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
>>>>>> We will need this to verify that Quorum can let one of its children be
>>>>>> replaced without breaking anything else.
>>>>>>
>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>> ---
>>>>>>  block/quorum.c | 25 +++++++++++++++++++++++++
>>>>>>  1 file changed, 25 insertions(+)
>>>>>>
>>>>>> diff --git a/block/quorum.c b/block/quorum.c
>>>>>> index 59cd524502..6a7224c9e4 100644
>>>>>> --- a/block/quorum.c
>>>>>> +++ b/block/quorum.c
>>>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>>>>>>  
>>>>>>  typedef struct QuorumChild {
>>>>>>      BdrvChild *child;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * If set, check whether this node can be replaced without any
>>>>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
>>>>>> +     * WRITE permission.
>>>>>> +     */
>>>>>> +    bool to_be_replaced;
>>>>>
>>>>> I don't understand these permission changes. How does (preparing for)
>>>>> detaching a node from quorum make its content invalid?
>>>>
>>>> It doesn’t, of course.  What we are preparing for is to replace it by
>>>> some other node with some other content.
>>>>
>>>>> And why do we
>>>>> suddenly need WRITE permissions even if the quorum node is only used
>>>>> read-only?
>>>>>
>>>>> The comment is a bit unclear, too. "check whether" implies that both
>>>>> outcomes could be true, but it doesn't say what happens in either case.
>>>>> Is this really "make sure that"?
>>>>
>>>> I think the comment is not only unclear, it is the problem.  (Well,
>>>> maybe the code is also.)
>>>>
>>>> This series is about fixing at least some things about replacing nodes
>>>> by mirroring.  The original use cases this was introduced for was to fix
>>>> broken quorum children: The other children are still intact, so you read
>>>> from the quorum node and replace the broken child (which maybe shows
>>>> invalid data, or maybe just EIO) by the fixed mirror result.
>>>>
>>>> Replacing that broken node by the fixed one changes the data that’s
>>>> visible on that node.
>>>
>>> Hm, yes, that's true. But I wonder if this is really something that the
>>> permission system must catch. Like other graph manipulations, it's
>>> essentially the user saying "trust me, I know what I'm doing, this node
>>> makes sense in this place".
>>>
>>> Because if you assume that the user could add a node with unsuitable
>>> content and you want to prevent this, where do we stop?
>>> blockdev-snapshot can insert a non-empty overlay, which would result in
>>> visible data change. Should we therefore only allow snapshots when
>>> shared writes are allowed? This doesn't work obviously.
>>>
>>> So I'm inclined to say that this is the user's responsibility and we
>>> don't have to jump through hoops to prevent every possible way that the
>>> user could mess up. (Which often also result in preventing legitimate
>>> cases like here a quorum of read-only nodes.)
>>
>> Well, if you ask the question “where do we stop”, we also have to ask
>> the question “where do we start”.  If we say the user knows what they’re
>> doing, we might as well drop the whole can_replace infrastructure
>> altogether and just assume that you can replace any node by anything.
> 
> Well, I don't actually know if that would be completely unreasonable.
> The idea was obviously to keep graph changes restricted to very specific
> cases to avoid nasty surprises like triggering latent bugs. Meanwhile we
> have quite a few more operations that allow changing the graph.
> 
> So if preventing some cases gives us headaches and is probably more work
> than dealing with any bugs they might reveal, maybe preventing them is
> wrong.
> 
> I'm just afraid that we might be overengineering this and waste time on
> things that we don't actually get much use from.

That’s why I’m asking.

>> If the WRITE permission is the problem, then I suppose we can drop that.
>>  Unsharing CONSISTENT_READ is bad enough that it effectively deters all
>> other parents anyway.
> 
> WRITE is probably the more practical problem, though it's technically
> the correct one to take.
> 
> CONSISTENT_READ is already a problem in theory because replacing a child
> node with different content doesn't even match its definition:
> 
>     /**
>      * A user that has the "permission" of consistent reads is guaranteed that
>      * their view of the contents of the block device is complete and
>      * self-consistent, representing the contents of a disk at a specific
>      * point.
>      *
>      * For most block devices (including their backing files) this is true, but
>      * the property cannot be maintained in a few situations like for
>      * intermediate nodes of a commit block job.
>      */
>     BLK_PERM_CONSISTENT_READ    = 0x01,
> 
> Replacing an image with a different image means that the node represents
> the content of a different disk now, but it's probably still complete
> and self-consistent.

At any point in time yes, but not over the time span of the change.  The
definition doesn’t say that the node represents the contents of a disk
at a specific point, but the view from the parent.

I argue that that view is always over some period of time, so if you
suddenly switch out the whole disk, then it isn’t a self-consistent view.

Alternatively, we could of course also just forego the permission system
here altogether and just check that there are no other parents at all.
(Which is effectively the same as unsharing CONSISTENT_READ.)

Max

Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced

Posted by Kevin Wolf 6 years ago

Am 06.02.2020 um 17:43 hat Max Reitz geschrieben:
> On 06.02.20 16:51, Kevin Wolf wrote:
> > Am 06.02.2020 um 16:21 hat Max Reitz geschrieben:
> >> On 06.02.20 15:58, Kevin Wolf wrote:
> >>> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
> >>>> On 05.02.20 16:38, Kevin Wolf wrote:
> >>>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
> >>>>>> We will need this to verify that Quorum can let one of its children be
> >>>>>> replaced without breaking anything else.
> >>>>>>
> >>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>>>> ---
> >>>>>>  block/quorum.c | 25 +++++++++++++++++++++++++
> >>>>>>  1 file changed, 25 insertions(+)
> >>>>>>
> >>>>>> diff --git a/block/quorum.c b/block/quorum.c
> >>>>>> index 59cd524502..6a7224c9e4 100644
> >>>>>> --- a/block/quorum.c
> >>>>>> +++ b/block/quorum.c
> >>>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
> >>>>>>  
> >>>>>>  typedef struct QuorumChild {
> >>>>>>      BdrvChild *child;
> >>>>>> +
> >>>>>> +    /*
> >>>>>> +     * If set, check whether this node can be replaced without any
> >>>>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
> >>>>>> +     * WRITE permission.
> >>>>>> +     */
> >>>>>> +    bool to_be_replaced;
> >>>>>
> >>>>> I don't understand these permission changes. How does (preparing for)
> >>>>> detaching a node from quorum make its content invalid?
> >>>>
> >>>> It doesn’t, of course.  What we are preparing for is to replace it by
> >>>> some other node with some other content.
> >>>>
> >>>>> And why do we
> >>>>> suddenly need WRITE permissions even if the quorum node is only used
> >>>>> read-only?
> >>>>>
> >>>>> The comment is a bit unclear, too. "check whether" implies that both
> >>>>> outcomes could be true, but it doesn't say what happens in either case.
> >>>>> Is this really "make sure that"?
> >>>>
> >>>> I think the comment is not only unclear, it is the problem.  (Well,
> >>>> maybe the code is also.)
> >>>>
> >>>> This series is about fixing at least some things about replacing nodes
> >>>> by mirroring.  The original use cases this was introduced for was to fix
> >>>> broken quorum children: The other children are still intact, so you read
> >>>> from the quorum node and replace the broken child (which maybe shows
> >>>> invalid data, or maybe just EIO) by the fixed mirror result.
> >>>>
> >>>> Replacing that broken node by the fixed one changes the data that’s
> >>>> visible on that node.
> >>>
> >>> Hm, yes, that's true. But I wonder if this is really something that the
> >>> permission system must catch. Like other graph manipulations, it's
> >>> essentially the user saying "trust me, I know what I'm doing, this node
> >>> makes sense in this place".
> >>>
> >>> Because if you assume that the user could add a node with unsuitable
> >>> content and you want to prevent this, where do we stop?
> >>> blockdev-snapshot can insert a non-empty overlay, which would result in
> >>> visible data change. Should we therefore only allow snapshots when
> >>> shared writes are allowed? This doesn't work obviously.
> >>>
> >>> So I'm inclined to say that this is the user's responsibility and we
> >>> don't have to jump through hoops to prevent every possible way that the
> >>> user could mess up. (Which often also result in preventing legitimate
> >>> cases like here a quorum of read-only nodes.)
> >>
> >> Well, if you ask the question “where do we stop”, we also have to ask
> >> the question “where do we start”.  If we say the user knows what they’re
> >> doing, we might as well drop the whole can_replace infrastructure
> >> altogether and just assume that you can replace any node by anything.
> > 
> > Well, I don't actually know if that would be completely unreasonable.
> > The idea was obviously to keep graph changes restricted to very specific
> > cases to avoid nasty surprises like triggering latent bugs. Meanwhile we
> > have quite a few more operations that allow changing the graph.
> > 
> > So if preventing some cases gives us headaches and is probably more work
> > than dealing with any bugs they might reveal, maybe preventing them is
> > wrong.
> > 
> > I'm just afraid that we might be overengineering this and waste time on
> > things that we don't actually get much use from.
> 
> That’s why I’m asking.

Did I answer your question sufficiently then?

> >> If the WRITE permission is the problem, then I suppose we can drop that.
> >>  Unsharing CONSISTENT_READ is bad enough that it effectively deters all
> >> other parents anyway.
> > 
> > WRITE is probably the more practical problem, though it's technically
> > the correct one to take.
> > 
> > CONSISTENT_READ is already a problem in theory because replacing a child
> > node with different content doesn't even match its definition:
> > 
> >     /**
> >      * A user that has the "permission" of consistent reads is guaranteed that
> >      * their view of the contents of the block device is complete and
> >      * self-consistent, representing the contents of a disk at a specific
> >      * point.
> >      *
> >      * For most block devices (including their backing files) this is true, but
> >      * the property cannot be maintained in a few situations like for
> >      * intermediate nodes of a commit block job.
> >      */
> >     BLK_PERM_CONSISTENT_READ    = 0x01,
> > 
> > Replacing an image with a different image means that the node represents
> > the content of a different disk now, but it's probably still complete
> > and self-consistent.
> 
> At any point in time yes, but not over the time span of the change.  The
> definition doesn’t say that the node represents the contents of a disk
> at a specific point, but the view from the parent.
> 
> I argue that that view is always over some period of time, so if you
> suddenly switch out the whole disk, then it isn’t a self-consistent view.

I think your theory that it's over some period of time conflicts with
the documentation that says "at a specific point".

> Alternatively, we could of course also just forego the permission system
> here altogether and just check that there are no other parents at all.
> (Which is effectively the same as unsharing CONSISTENT_READ.)

This would sidestep all of the artificial permission twiddling, which
sounds good.

It would probably also needlessly restrict the allowed use cases, but
then, who cares about nodes with multiple parents, one of which is a
quorum node?

So I guess I would be fine with either checking that there are no
parents or maybe even just dropping the check completely.

Kevin

Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced

Posted by Max Reitz 6 years ago

On 06.02.20 17:57, Kevin Wolf wrote:
> Am 06.02.2020 um 17:43 hat Max Reitz geschrieben:
>> On 06.02.20 16:51, Kevin Wolf wrote:
>>> Am 06.02.2020 um 16:21 hat Max Reitz geschrieben:
>>>> On 06.02.20 15:58, Kevin Wolf wrote:
>>>>> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
>>>>>> On 05.02.20 16:38, Kevin Wolf wrote:
>>>>>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
>>>>>>>> We will need this to verify that Quorum can let one of its children be
>>>>>>>> replaced without breaking anything else.
>>>>>>>>
>>>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>>>> ---
>>>>>>>>  block/quorum.c | 25 +++++++++++++++++++++++++
>>>>>>>>  1 file changed, 25 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/block/quorum.c b/block/quorum.c
>>>>>>>> index 59cd524502..6a7224c9e4 100644
>>>>>>>> --- a/block/quorum.c
>>>>>>>> +++ b/block/quorum.c
>>>>>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>>>>>>>>  
>>>>>>>>  typedef struct QuorumChild {
>>>>>>>>      BdrvChild *child;
>>>>>>>> +
>>>>>>>> +    /*
>>>>>>>> +     * If set, check whether this node can be replaced without any
>>>>>>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
>>>>>>>> +     * WRITE permission.
>>>>>>>> +     */
>>>>>>>> +    bool to_be_replaced;
>>>>>>>
>>>>>>> I don't understand these permission changes. How does (preparing for)
>>>>>>> detaching a node from quorum make its content invalid?
>>>>>>
>>>>>> It doesn’t, of course.  What we are preparing for is to replace it by
>>>>>> some other node with some other content.
>>>>>>
>>>>>>> And why do we
>>>>>>> suddenly need WRITE permissions even if the quorum node is only used
>>>>>>> read-only?
>>>>>>>
>>>>>>> The comment is a bit unclear, too. "check whether" implies that both
>>>>>>> outcomes could be true, but it doesn't say what happens in either case.
>>>>>>> Is this really "make sure that"?
>>>>>>
>>>>>> I think the comment is not only unclear, it is the problem.  (Well,
>>>>>> maybe the code is also.)
>>>>>>
>>>>>> This series is about fixing at least some things about replacing nodes
>>>>>> by mirroring.  The original use cases this was introduced for was to fix
>>>>>> broken quorum children: The other children are still intact, so you read
>>>>>> from the quorum node and replace the broken child (which maybe shows
>>>>>> invalid data, or maybe just EIO) by the fixed mirror result.
>>>>>>
>>>>>> Replacing that broken node by the fixed one changes the data that’s
>>>>>> visible on that node.
>>>>>
>>>>> Hm, yes, that's true. But I wonder if this is really something that the
>>>>> permission system must catch. Like other graph manipulations, it's
>>>>> essentially the user saying "trust me, I know what I'm doing, this node
>>>>> makes sense in this place".
>>>>>
>>>>> Because if you assume that the user could add a node with unsuitable
>>>>> content and you want to prevent this, where do we stop?
>>>>> blockdev-snapshot can insert a non-empty overlay, which would result in
>>>>> visible data change. Should we therefore only allow snapshots when
>>>>> shared writes are allowed? This doesn't work obviously.
>>>>>
>>>>> So I'm inclined to say that this is the user's responsibility and we
>>>>> don't have to jump through hoops to prevent every possible way that the
>>>>> user could mess up. (Which often also result in preventing legitimate
>>>>> cases like here a quorum of read-only nodes.)
>>>>
>>>> Well, if you ask the question “where do we stop”, we also have to ask
>>>> the question “where do we start”.  If we say the user knows what they’re
>>>> doing, we might as well drop the whole can_replace infrastructure
>>>> altogether and just assume that you can replace any node by anything.
>>>
>>> Well, I don't actually know if that would be completely unreasonable.
>>> The idea was obviously to keep graph changes restricted to very specific
>>> cases to avoid nasty surprises like triggering latent bugs. Meanwhile we
>>> have quite a few more operations that allow changing the graph.
>>>
>>> So if preventing some cases gives us headaches and is probably more work
>>> than dealing with any bugs they might reveal, maybe preventing them is
>>> wrong.
>>>
>>> I'm just afraid that we might be overengineering this and waste time on
>>> things that we don't actually get much use from.
>>
>> That’s why I’m asking.
> 
> Did I answer your question sufficiently then?

No, because “I’m afraid” is a sentiment I fully share, but it doesn’t
answer the question whether we are indeed overengineering this or not. :-)

I suppose my stance now is “This is probably overengineered, but now we
might as well roll with it”.

>>>> If the WRITE permission is the problem, then I suppose we can drop that.
>>>>  Unsharing CONSISTENT_READ is bad enough that it effectively deters all
>>>> other parents anyway.
>>>
>>> WRITE is probably the more practical problem, though it's technically
>>> the correct one to take.
>>>
>>> CONSISTENT_READ is already a problem in theory because replacing a child
>>> node with different content doesn't even match its definition:
>>>
>>>     /**
>>>      * A user that has the "permission" of consistent reads is guaranteed that
>>>      * their view of the contents of the block device is complete and
>>>      * self-consistent, representing the contents of a disk at a specific
>>>      * point.
>>>      *
>>>      * For most block devices (including their backing files) this is true, but
>>>      * the property cannot be maintained in a few situations like for
>>>      * intermediate nodes of a commit block job.
>>>      */
>>>     BLK_PERM_CONSISTENT_READ    = 0x01,
>>>
>>> Replacing an image with a different image means that the node represents
>>> the content of a different disk now, but it's probably still complete
>>> and self-consistent.
>>
>> At any point in time yes, but not over the time span of the change.  The
>> definition doesn’t say that the node represents the contents of a disk
>> at a specific point, but the view from the parent.
>>
>> I argue that that view is always over some period of time, so if you
>> suddenly switch out the whole disk, then it isn’t a self-consistent view.
> 
> I think your theory that it's over some period of time conflicts with
> the documentation that says "at a specific point".

I’d rather not get into a deeper discussion on what CONSISTENT_READ
means again... :-/

I always feel like if you really take only a single point in time, then
anything could be some hypothetical disk.

So to me, unsharing CONSISTENT_READ effectively just means “Don’t touch
this, you don’t want to”.

>> Alternatively, we could of course also just forego the permission system
>> here altogether and just check that there are no other parents at all.
>> (Which is effectively the same as unsharing CONSISTENT_READ.)
> 
> This would sidestep all of the artificial permission twiddling, which
> sounds good.
> 
> It would probably also needlessly restrict the allowed use cases,

Only in theory, though, because in practice basically everything useful
takes CONSISTENT_READ anyway.

> but
> then, who cares about nodes with multiple parents, one of which is a
> quorum node?
> 
> So I guess I would be fine with either checking that there are no
> parents or maybe even just dropping the check completely.

OK, I’ll check the parent list then.  (Except it must be exactly one
parent, namely Quorum.)

Max

Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced

Posted by Kevin Wolf 6 years ago

Am 06.02.2020 um 18:06 hat Max Reitz geschrieben:
> On 06.02.20 17:57, Kevin Wolf wrote:
> > Am 06.02.2020 um 17:43 hat Max Reitz geschrieben:
> >> On 06.02.20 16:51, Kevin Wolf wrote:
> >>> Am 06.02.2020 um 16:21 hat Max Reitz geschrieben:
> >>>> On 06.02.20 15:58, Kevin Wolf wrote:
> >>>>> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
> >>>>>> On 05.02.20 16:38, Kevin Wolf wrote:
> >>>>>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
> >>>>>>>> We will need this to verify that Quorum can let one of its children be
> >>>>>>>> replaced without breaking anything else.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>>>>>> ---
> >>>>>>>>  block/quorum.c | 25 +++++++++++++++++++++++++
> >>>>>>>>  1 file changed, 25 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/block/quorum.c b/block/quorum.c
> >>>>>>>> index 59cd524502..6a7224c9e4 100644
> >>>>>>>> --- a/block/quorum.c
> >>>>>>>> +++ b/block/quorum.c
> >>>>>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
> >>>>>>>>  
> >>>>>>>>  typedef struct QuorumChild {
> >>>>>>>>      BdrvChild *child;
> >>>>>>>> +
> >>>>>>>> +    /*
> >>>>>>>> +     * If set, check whether this node can be replaced without any
> >>>>>>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
> >>>>>>>> +     * WRITE permission.
> >>>>>>>> +     */
> >>>>>>>> +    bool to_be_replaced;
> >>>>>>>
> >>>>>>> I don't understand these permission changes. How does (preparing for)
> >>>>>>> detaching a node from quorum make its content invalid?
> >>>>>>
> >>>>>> It doesn’t, of course.  What we are preparing for is to replace it by
> >>>>>> some other node with some other content.
> >>>>>>
> >>>>>>> And why do we
> >>>>>>> suddenly need WRITE permissions even if the quorum node is only used
> >>>>>>> read-only?
> >>>>>>>
> >>>>>>> The comment is a bit unclear, too. "check whether" implies that both
> >>>>>>> outcomes could be true, but it doesn't say what happens in either case.
> >>>>>>> Is this really "make sure that"?
> >>>>>>
> >>>>>> I think the comment is not only unclear, it is the problem.  (Well,
> >>>>>> maybe the code is also.)
> >>>>>>
> >>>>>> This series is about fixing at least some things about replacing nodes
> >>>>>> by mirroring.  The original use cases this was introduced for was to fix
> >>>>>> broken quorum children: The other children are still intact, so you read
> >>>>>> from the quorum node and replace the broken child (which maybe shows
> >>>>>> invalid data, or maybe just EIO) by the fixed mirror result.
> >>>>>>
> >>>>>> Replacing that broken node by the fixed one changes the data that’s
> >>>>>> visible on that node.
> >>>>>
> >>>>> Hm, yes, that's true. But I wonder if this is really something that the
> >>>>> permission system must catch. Like other graph manipulations, it's
> >>>>> essentially the user saying "trust me, I know what I'm doing, this node
> >>>>> makes sense in this place".
> >>>>>
> >>>>> Because if you assume that the user could add a node with unsuitable
> >>>>> content and you want to prevent this, where do we stop?
> >>>>> blockdev-snapshot can insert a non-empty overlay, which would result in
> >>>>> visible data change. Should we therefore only allow snapshots when
> >>>>> shared writes are allowed? This doesn't work obviously.
> >>>>>
> >>>>> So I'm inclined to say that this is the user's responsibility and we
> >>>>> don't have to jump through hoops to prevent every possible way that the
> >>>>> user could mess up. (Which often also result in preventing legitimate
> >>>>> cases like here a quorum of read-only nodes.)
> >>>>
> >>>> Well, if you ask the question “where do we stop”, we also have to ask
> >>>> the question “where do we start”.  If we say the user knows what they’re
> >>>> doing, we might as well drop the whole can_replace infrastructure
> >>>> altogether and just assume that you can replace any node by anything.
> >>>
> >>> Well, I don't actually know if that would be completely unreasonable.
> >>> The idea was obviously to keep graph changes restricted to very specific
> >>> cases to avoid nasty surprises like triggering latent bugs. Meanwhile we
> >>> have quite a few more operations that allow changing the graph.
> >>>
> >>> So if preventing some cases gives us headaches and is probably more work
> >>> than dealing with any bugs they might reveal, maybe preventing them is
> >>> wrong.
> >>>
> >>> I'm just afraid that we might be overengineering this and waste time on
> >>> things that we don't actually get much use from.
> >>
> >> That’s why I’m asking.
> > 
> > Did I answer your question sufficiently then?
> 
> No, because “I’m afraid” is a sentiment I fully share, but it doesn’t
> answer the question whether we are indeed overengineering this or not. :-)

Well, I guess I can only answer this after seeing the bug reports we
would get after removing the check. :-)

> I suppose my stance now is “This is probably overengineered, but now we
> might as well roll with it”.

Your choice. I'm not opposed to anything that feels like it makes sense.

> >>>> If the WRITE permission is the problem, then I suppose we can drop that.
> >>>>  Unsharing CONSISTENT_READ is bad enough that it effectively deters all
> >>>> other parents anyway.
> >>>
> >>> WRITE is probably the more practical problem, though it's technically
> >>> the correct one to take.
> >>>
> >>> CONSISTENT_READ is already a problem in theory because replacing a child
> >>> node with different content doesn't even match its definition:
> >>>
> >>>     /**
> >>>      * A user that has the "permission" of consistent reads is guaranteed that
> >>>      * their view of the contents of the block device is complete and
> >>>      * self-consistent, representing the contents of a disk at a specific
> >>>      * point.
> >>>      *
> >>>      * For most block devices (including their backing files) this is true, but
> >>>      * the property cannot be maintained in a few situations like for
> >>>      * intermediate nodes of a commit block job.
> >>>      */
> >>>     BLK_PERM_CONSISTENT_READ    = 0x01,
> >>>
> >>> Replacing an image with a different image means that the node represents
> >>> the content of a different disk now, but it's probably still complete
> >>> and self-consistent.
> >>
> >> At any point in time yes, but not over the time span of the change.  The
> >> definition doesn’t say that the node represents the contents of a disk
> >> at a specific point, but the view from the parent.
> >>
> >> I argue that that view is always over some period of time, so if you
> >> suddenly switch out the whole disk, then it isn’t a self-consistent view.
> > 
> > I think your theory that it's over some period of time conflicts with
> > the documentation that says "at a specific point".
> 
> I’d rather not get into a deeper discussion on what CONSISTENT_READ
> means again... :-/
> 
> I always feel like if you really take only a single point in time, then
> anything could be some hypothetical disk.
> 
> So to me, unsharing CONSISTENT_READ effectively just means “Don’t touch
> this, you don’t want to”.

The difference is that with the replace operation we aren't talking
about hypothetical corrupted disks (like we would get when accessing
intermediate nodes of the commit job), but about two actual disk images
that are both valid, though different.

But yes, maybe we should avoid this discussion...

(I mean, what it *really* means is "this is not an intermediate node
of commit". ;-))

> >> Alternatively, we could of course also just forego the permission system
> >> here altogether and just check that there are no other parents at all.
> >> (Which is effectively the same as unsharing CONSISTENT_READ.)
> > 
> > This would sidestep all of the artificial permission twiddling, which
> > sounds good.
> > 
> > It would probably also needlessly restrict the allowed use cases,
> 
> Only in theory, though, because in practice basically everything useful
> takes CONSISTENT_READ anyway.

Oh, compared to taking WRITE and unsharing CONSISTENT_READ it's probably
not more restrictice. I was comparing with the case that drops the
checks altogether.

> > but
> > then, who cares about nodes with multiple parents, one of which is a
> > quorum node?
> > 
> > So I guess I would be fine with either checking that there are no
> > parents or maybe even just dropping the check completely.
> 
> OK, I’ll check the parent list then.  (Except it must be exactly one
> parent, namely Quorum.)

Fine with me.

Kevin

Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced

Posted by Max Reitz 6 years ago

On 06.02.20 17:43, Max Reitz wrote:
> On 06.02.20 16:51, Kevin Wolf wrote:
>> Am 06.02.2020 um 16:21 hat Max Reitz geschrieben:
>>> On 06.02.20 15:58, Kevin Wolf wrote:
>>>> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
>>>>> On 05.02.20 16:38, Kevin Wolf wrote:
>>>>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
>>>>>>> We will need this to verify that Quorum can let one of its children be
>>>>>>> replaced without breaking anything else.
>>>>>>>
>>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>>>>> ---
>>>>>>>  block/quorum.c | 25 +++++++++++++++++++++++++
>>>>>>>  1 file changed, 25 insertions(+)
>>>>>>>
>>>>>>> diff --git a/block/quorum.c b/block/quorum.c
>>>>>>> index 59cd524502..6a7224c9e4 100644
>>>>>>> --- a/block/quorum.c
>>>>>>> +++ b/block/quorum.c
>>>>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
>>>>>>>  
>>>>>>>  typedef struct QuorumChild {
>>>>>>>      BdrvChild *child;
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * If set, check whether this node can be replaced without any
>>>>>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
>>>>>>> +     * WRITE permission.
>>>>>>> +     */
>>>>>>> +    bool to_be_replaced;
>>>>>>
>>>>>> I don't understand these permission changes. How does (preparing for)
>>>>>> detaching a node from quorum make its content invalid?
>>>>>
>>>>> It doesn’t, of course.  What we are preparing for is to replace it by
>>>>> some other node with some other content.
>>>>>
>>>>>> And why do we
>>>>>> suddenly need WRITE permissions even if the quorum node is only used
>>>>>> read-only?
>>>>>>
>>>>>> The comment is a bit unclear, too. "check whether" implies that both
>>>>>> outcomes could be true, but it doesn't say what happens in either case.
>>>>>> Is this really "make sure that"?
>>>>>
>>>>> I think the comment is not only unclear, it is the problem.  (Well,
>>>>> maybe the code is also.)
>>>>>
>>>>> This series is about fixing at least some things about replacing nodes
>>>>> by mirroring.  The original use cases this was introduced for was to fix
>>>>> broken quorum children: The other children are still intact, so you read
>>>>> from the quorum node and replace the broken child (which maybe shows
>>>>> invalid data, or maybe just EIO) by the fixed mirror result.
>>>>>
>>>>> Replacing that broken node by the fixed one changes the data that’s
>>>>> visible on that node.
>>>>
>>>> Hm, yes, that's true. But I wonder if this is really something that the
>>>> permission system must catch. Like other graph manipulations, it's
>>>> essentially the user saying "trust me, I know what I'm doing, this node
>>>> makes sense in this place".
>>>>
>>>> Because if you assume that the user could add a node with unsuitable
>>>> content and you want to prevent this, where do we stop?
>>>> blockdev-snapshot can insert a non-empty overlay, which would result in
>>>> visible data change. Should we therefore only allow snapshots when
>>>> shared writes are allowed? This doesn't work obviously.
>>>>
>>>> So I'm inclined to say that this is the user's responsibility and we
>>>> don't have to jump through hoops to prevent every possible way that the
>>>> user could mess up. (Which often also result in preventing legitimate
>>>> cases like here a quorum of read-only nodes.)
>>>
>>> Well, if you ask the question “where do we stop”, we also have to ask
>>> the question “where do we start”.  If we say the user knows what they’re
>>> doing, we might as well drop the whole can_replace infrastructure
>>> altogether and just assume that you can replace any node by anything.
>>
>> Well, I don't actually know if that would be completely unreasonable.
>> The idea was obviously to keep graph changes restricted to very specific
>> cases to avoid nasty surprises like triggering latent bugs. Meanwhile we
>> have quite a few more operations that allow changing the graph.
>>
>> So if preventing some cases gives us headaches and is probably more work
>> than dealing with any bugs they might reveal, maybe preventing them is
>> wrong.
>>
>> I'm just afraid that we might be overengineering this and waste time on
>> things that we don't actually get much use from.
> 
> That’s why I’m asking.

(One thing to consider here, though, is that this series exists and has
been reviewed by Vladimir in full, so most of the engineering effort has
already been done.  In contrast, writing a new series to drop the whole
can_replace infrastructure with no replacement may actually cost more.)

Max

Re: [PATCH v3 09/21] quorum: Add QuorumChild.to_be_replaced

Posted by Kevin Wolf 6 years ago

Am 06.02.2020 um 17:47 hat Max Reitz geschrieben:
> On 06.02.20 17:43, Max Reitz wrote:
> > On 06.02.20 16:51, Kevin Wolf wrote:
> >> Am 06.02.2020 um 16:21 hat Max Reitz geschrieben:
> >>> On 06.02.20 15:58, Kevin Wolf wrote:
> >>>> Am 06.02.2020 um 11:11 hat Max Reitz geschrieben:
> >>>>> On 05.02.20 16:38, Kevin Wolf wrote:
> >>>>>> Am 30.01.2020 um 22:44 hat Max Reitz geschrieben:
> >>>>>>> We will need this to verify that Quorum can let one of its children be
> >>>>>>> replaced without breaking anything else.
> >>>>>>>
> >>>>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>>>>> ---
> >>>>>>>  block/quorum.c | 25 +++++++++++++++++++++++++
> >>>>>>>  1 file changed, 25 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/block/quorum.c b/block/quorum.c
> >>>>>>> index 59cd524502..6a7224c9e4 100644
> >>>>>>> --- a/block/quorum.c
> >>>>>>> +++ b/block/quorum.c
> >>>>>>> @@ -67,6 +67,13 @@ typedef struct QuorumVotes {
> >>>>>>>  
> >>>>>>>  typedef struct QuorumChild {
> >>>>>>>      BdrvChild *child;
> >>>>>>> +
> >>>>>>> +    /*
> >>>>>>> +     * If set, check whether this node can be replaced without any
> >>>>>>> +     * other parent noticing: Unshare CONSISTENT_READ, and take the
> >>>>>>> +     * WRITE permission.
> >>>>>>> +     */
> >>>>>>> +    bool to_be_replaced;
> >>>>>>
> >>>>>> I don't understand these permission changes. How does (preparing for)
> >>>>>> detaching a node from quorum make its content invalid?
> >>>>>
> >>>>> It doesn’t, of course.  What we are preparing for is to replace it by
> >>>>> some other node with some other content.
> >>>>>
> >>>>>> And why do we
> >>>>>> suddenly need WRITE permissions even if the quorum node is only used
> >>>>>> read-only?
> >>>>>>
> >>>>>> The comment is a bit unclear, too. "check whether" implies that both
> >>>>>> outcomes could be true, but it doesn't say what happens in either case.
> >>>>>> Is this really "make sure that"?
> >>>>>
> >>>>> I think the comment is not only unclear, it is the problem.  (Well,
> >>>>> maybe the code is also.)
> >>>>>
> >>>>> This series is about fixing at least some things about replacing nodes
> >>>>> by mirroring.  The original use cases this was introduced for was to fix
> >>>>> broken quorum children: The other children are still intact, so you read
> >>>>> from the quorum node and replace the broken child (which maybe shows
> >>>>> invalid data, or maybe just EIO) by the fixed mirror result.
> >>>>>
> >>>>> Replacing that broken node by the fixed one changes the data that’s
> >>>>> visible on that node.
> >>>>
> >>>> Hm, yes, that's true. But I wonder if this is really something that the
> >>>> permission system must catch. Like other graph manipulations, it's
> >>>> essentially the user saying "trust me, I know what I'm doing, this node
> >>>> makes sense in this place".
> >>>>
> >>>> Because if you assume that the user could add a node with unsuitable
> >>>> content and you want to prevent this, where do we stop?
> >>>> blockdev-snapshot can insert a non-empty overlay, which would result in
> >>>> visible data change. Should we therefore only allow snapshots when
> >>>> shared writes are allowed? This doesn't work obviously.
> >>>>
> >>>> So I'm inclined to say that this is the user's responsibility and we
> >>>> don't have to jump through hoops to prevent every possible way that the
> >>>> user could mess up. (Which often also result in preventing legitimate
> >>>> cases like here a quorum of read-only nodes.)
> >>>
> >>> Well, if you ask the question “where do we stop”, we also have to ask
> >>> the question “where do we start”.  If we say the user knows what they’re
> >>> doing, we might as well drop the whole can_replace infrastructure
> >>> altogether and just assume that you can replace any node by anything.
> >>
> >> Well, I don't actually know if that would be completely unreasonable.
> >> The idea was obviously to keep graph changes restricted to very specific
> >> cases to avoid nasty surprises like triggering latent bugs. Meanwhile we
> >> have quite a few more operations that allow changing the graph.
> >>
> >> So if preventing some cases gives us headaches and is probably more work
> >> than dealing with any bugs they might reveal, maybe preventing them is
> >> wrong.
> >>
> >> I'm just afraid that we might be overengineering this and waste time on
> >> things that we don't actually get much use from.
> > 
> > That’s why I’m asking.
> 
> (One thing to consider here, though, is that this series exists and has
> been reviewed by Vladimir in full, so most of the engineering effort has
> already been done.  In contrast, writing a new series to drop the whole
> can_replace infrastructure with no replacement may actually cost more.)

Fair enough. If the artificial permission changes didn't feel so wrong,
I think I would just say "let's merge it and forget about it". But they
do feel wrong, so I'm not as sure.

Kevin