block/file-posix: do not fail on unlock bytes

[Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Vladimir Sementsov-Ogievskiy 6 years, 10 months ago

bdrv_replace_child() calls bdrv_check_perm() with error_abort on
loosening permissions. However file-locking operations may fail even
in this case, for example on NFS. And this leads to Qemu crash.

Let's avoid such errors. Note, that we ignore such things anyway on
permission update commit and abort.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/file-posix.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index db4cccbe51..1cf4ee49eb 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
 
     switch (op) {
     case RAW_PL_PREPARE:
+        if ((s->perm | new_perm) == s->perm &&
+            (s->shared_perm & new_shared) == s->shared_perm)
+        {
+            /*
+             * We are going to unlock bytes, it should not fail. If it fail due
+             * to some fs-dependent permission-unrelated reasons (which occurs
+             * sometimes on NFS and leads to abort in bdrv_replace_child) we
+             * can't prevent such errors by any check here. And we ignore them
+             * anyway in ABORT and COMMIT.
+             */
+            return 0;
+        }
         ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
                                    ~s->shared_perm | ~new_shared,
                                    false, errp);
-- 
2.18.0

Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Kevin Wolf 6 years, 10 months ago

Am 29.03.2019 um 12:04 hat Vladimir Sementsov-Ogievskiy geschrieben:
> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
> loosening permissions. However file-locking operations may fail even
> in this case, for example on NFS. And this leads to Qemu crash.
> 
> Let's avoid such errors. Note, that we ignore such things anyway on
> permission update commit and abort.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Thanks, applied to the block branch.

Kevin

Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Max Reitz 6 years, 10 months ago

On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
> loosening permissions. However file-locking operations may fail even
> in this case, for example on NFS. And this leads to Qemu crash.
> 
> Let's avoid such errors. Note, that we ignore such things anyway on
> permission update commit and abort.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/file-posix.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index db4cccbe51..1cf4ee49eb 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>  
>      switch (op) {
>      case RAW_PL_PREPARE:
> +        if ((s->perm | new_perm) == s->perm &&
> +            (s->shared_perm & new_shared) == s->shared_perm)
> +        {
> +            /*
> +             * We are going to unlock bytes, it should not fail. If it fail due
> +             * to some fs-dependent permission-unrelated reasons (which occurs
> +             * sometimes on NFS and leads to abort in bdrv_replace_child) we
> +             * can't prevent such errors by any check here. And we ignore them
> +             * anyway in ABORT and COMMIT.
> +             */
> +            return 0;
> +        }
>          ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>                                     ~s->shared_perm | ~new_shared,
>                                     false, errp);

Help me understand the exact issue, please.  I understand that there are
operations like bdrv_replace_child() that pass &error_abort to
bdrv_check_perm() because they just loosen the permissions, so it should
not fail.

However, if the whole effect really would be to loosen permissions,
raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
@unlock is passed as false, so no bytes will be unlocked.  And if
permissions are just loosened (as your condition checks), it should not
lock any bytes.

So why does it attempt lock any bytes in the first place?  There must be
some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
and s->locked_shared_perm.  How does that occur?

Max

Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Kevin Wolf 6 years, 10 months ago

Am 29.03.2019 um 18:15 hat Max Reitz geschrieben:
> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
> > bdrv_replace_child() calls bdrv_check_perm() with error_abort on
> > loosening permissions. However file-locking operations may fail even
> > in this case, for example on NFS. And this leads to Qemu crash.
> > 
> > Let's avoid such errors. Note, that we ignore such things anyway on
> > permission update commit and abort.
> > 
> > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > ---
> >  block/file-posix.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index db4cccbe51..1cf4ee49eb 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
> >  
> >      switch (op) {
> >      case RAW_PL_PREPARE:
> > +        if ((s->perm | new_perm) == s->perm &&
> > +            (s->shared_perm & new_shared) == s->shared_perm)
> > +        {
> > +            /*
> > +             * We are going to unlock bytes, it should not fail. If it fail due
> > +             * to some fs-dependent permission-unrelated reasons (which occurs
> > +             * sometimes on NFS and leads to abort in bdrv_replace_child) we
> > +             * can't prevent such errors by any check here. And we ignore them
> > +             * anyway in ABORT and COMMIT.
> > +             */
> > +            return 0;
> > +        }
> >          ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
> >                                     ~s->shared_perm | ~new_shared,
> >                                     false, errp);
> 
> Help me understand the exact issue, please.  I understand that there are
> operations like bdrv_replace_child() that pass &error_abort to
> bdrv_check_perm() because they just loosen the permissions, so it should
> not fail.
> 
> However, if the whole effect really would be to loosen permissions,
> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
> @unlock is passed as false, so no bytes will be unlocked.  And if
> permissions are just loosened (as your condition checks), it should not
> lock any bytes.
> 
> So why does it attempt lock any bytes in the first place?  There must be
> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
> and s->locked_shared_perm.  How does that occur?

I suppose raw_check_lock_bytes() is what is failing, not
raw_apply_lock_bytes().

Kevin

Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Max Reitz 6 years, 10 months ago

On 29.03.19 18:24, Kevin Wolf wrote:
> Am 29.03.2019 um 18:15 hat Max Reitz geschrieben:
>> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
>>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
>>> loosening permissions. However file-locking operations may fail even
>>> in this case, for example on NFS. And this leads to Qemu crash.
>>>
>>> Let's avoid such errors. Note, that we ignore such things anyway on
>>> permission update commit and abort.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>  block/file-posix.c | 12 ++++++++++++
>>>  1 file changed, 12 insertions(+)
>>>
>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>> index db4cccbe51..1cf4ee49eb 100644
>>> --- a/block/file-posix.c
>>> +++ b/block/file-posix.c
>>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>>>  
>>>      switch (op) {
>>>      case RAW_PL_PREPARE:
>>> +        if ((s->perm | new_perm) == s->perm &&
>>> +            (s->shared_perm & new_shared) == s->shared_perm)
>>> +        {
>>> +            /*
>>> +             * We are going to unlock bytes, it should not fail. If it fail due
>>> +             * to some fs-dependent permission-unrelated reasons (which occurs
>>> +             * sometimes on NFS and leads to abort in bdrv_replace_child) we
>>> +             * can't prevent such errors by any check here. And we ignore them
>>> +             * anyway in ABORT and COMMIT.
>>> +             */
>>> +            return 0;
>>> +        }
>>>          ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>>>                                     ~s->shared_perm | ~new_shared,
>>>                                     false, errp);
>>
>> Help me understand the exact issue, please.  I understand that there are
>> operations like bdrv_replace_child() that pass &error_abort to
>> bdrv_check_perm() because they just loosen the permissions, so it should
>> not fail.
>>
>> However, if the whole effect really would be to loosen permissions,
>> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
>> @unlock is passed as false, so no bytes will be unlocked.  And if
>> permissions are just loosened (as your condition checks), it should not
>> lock any bytes.
>>
>> So why does it attempt lock any bytes in the first place?  There must be
>> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
>> and s->locked_shared_perm.  How does that occur?
> 
> I suppose raw_check_lock_bytes() is what is failing, not
> raw_apply_lock_bytes().

Hm, maybe in Vladimir's case, but not in e.g.
https://bugzilla.redhat.com/show_bug.cgi?id=1652572 .

Max

Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Vladimir Sementsov-Ogievskiy 6 years, 10 months ago

29.03.2019 20:30, Max Reitz wrote:
> On 29.03.19 18:24, Kevin Wolf wrote:
>> Am 29.03.2019 um 18:15 hat Max Reitz geschrieben:
>>> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
>>>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
>>>> loosening permissions. However file-locking operations may fail even
>>>> in this case, for example on NFS. And this leads to Qemu crash.
>>>>
>>>> Let's avoid such errors. Note, that we ignore such things anyway on
>>>> permission update commit and abort.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>   block/file-posix.c | 12 ++++++++++++
>>>>   1 file changed, 12 insertions(+)
>>>>
>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>> index db4cccbe51..1cf4ee49eb 100644
>>>> --- a/block/file-posix.c
>>>> +++ b/block/file-posix.c
>>>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>>>>   
>>>>       switch (op) {
>>>>       case RAW_PL_PREPARE:
>>>> +        if ((s->perm | new_perm) == s->perm &&
>>>> +            (s->shared_perm & new_shared) == s->shared_perm)
>>>> +        {
>>>> +            /*
>>>> +             * We are going to unlock bytes, it should not fail. If it fail due
>>>> +             * to some fs-dependent permission-unrelated reasons (which occurs
>>>> +             * sometimes on NFS and leads to abort in bdrv_replace_child) we
>>>> +             * can't prevent such errors by any check here. And we ignore them
>>>> +             * anyway in ABORT and COMMIT.
>>>> +             */
>>>> +            return 0;
>>>> +        }
>>>>           ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>>>>                                      ~s->shared_perm | ~new_shared,
>>>>                                      false, errp);
>>>
>>> Help me understand the exact issue, please.  I understand that there are
>>> operations like bdrv_replace_child() that pass &error_abort to
>>> bdrv_check_perm() because they just loosen the permissions, so it should
>>> not fail.
>>>
>>> However, if the whole effect really would be to loosen permissions,
>>> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
>>> @unlock is passed as false, so no bytes will be unlocked.  And if
>>> permissions are just loosened (as your condition checks), it should not
>>> lock any bytes.
>>>
>>> So why does it attempt lock any bytes in the first place?  There must be
>>> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
>>> and s->locked_shared_perm.  How does that occur?
>>
>> I suppose raw_check_lock_bytes() is what is failing, not
>> raw_apply_lock_bytes().
> 
> Hm, maybe in Vladimir's case, but not in e.g.
> https://bugzilla.redhat.com/show_bug.cgi?id=1652572 .
> 

raw_apply_lock_bytes failes in my cases. And it is because it calls fcntl to lock bytes
even on loosening.


-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Kevin Wolf 6 years, 10 months ago

Am 29.03.2019 um 18:30 hat Max Reitz geschrieben:
> On 29.03.19 18:24, Kevin Wolf wrote:
> > Am 29.03.2019 um 18:15 hat Max Reitz geschrieben:
> >> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
> >>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
> >>> loosening permissions. However file-locking operations may fail even
> >>> in this case, for example on NFS. And this leads to Qemu crash.
> >>>
> >>> Let's avoid such errors. Note, that we ignore such things anyway on
> >>> permission update commit and abort.
> >>>
> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >>> ---
> >>>  block/file-posix.c | 12 ++++++++++++
> >>>  1 file changed, 12 insertions(+)
> >>>
> >>> diff --git a/block/file-posix.c b/block/file-posix.c
> >>> index db4cccbe51..1cf4ee49eb 100644
> >>> --- a/block/file-posix.c
> >>> +++ b/block/file-posix.c
> >>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
> >>>  
> >>>      switch (op) {
> >>>      case RAW_PL_PREPARE:
> >>> +        if ((s->perm | new_perm) == s->perm &&
> >>> +            (s->shared_perm & new_shared) == s->shared_perm)
> >>> +        {
> >>> +            /*
> >>> +             * We are going to unlock bytes, it should not fail. If it fail due
> >>> +             * to some fs-dependent permission-unrelated reasons (which occurs
> >>> +             * sometimes on NFS and leads to abort in bdrv_replace_child) we
> >>> +             * can't prevent such errors by any check here. And we ignore them
> >>> +             * anyway in ABORT and COMMIT.
> >>> +             */
> >>> +            return 0;
> >>> +        }
> >>>          ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
> >>>                                     ~s->shared_perm | ~new_shared,
> >>>                                     false, errp);
> >>
> >> Help me understand the exact issue, please.  I understand that there are
> >> operations like bdrv_replace_child() that pass &error_abort to
> >> bdrv_check_perm() because they just loosen the permissions, so it should
> >> not fail.
> >>
> >> However, if the whole effect really would be to loosen permissions,
> >> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
> >> @unlock is passed as false, so no bytes will be unlocked.  And if
> >> permissions are just loosened (as your condition checks), it should not
> >> lock any bytes.
> >>
> >> So why does it attempt lock any bytes in the first place?  There must be
> >> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
> >> and s->locked_shared_perm.  How does that occur?
> > 
> > I suppose raw_check_lock_bytes() is what is failing, not
> > raw_apply_lock_bytes().
> 
> Hm, maybe in Vladimir's case, but not in e.g.
> https://bugzilla.redhat.com/show_bug.cgi?id=1652572 .

This is reported against 3.0, which didn't avoid re-locking permissions
that we already hold, so there raw_apply_lock_bytes() can still fail.

Kevin

Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Max Reitz 6 years, 10 months ago

On 29.03.19 18:40, Kevin Wolf wrote:
> Am 29.03.2019 um 18:30 hat Max Reitz geschrieben:
>> On 29.03.19 18:24, Kevin Wolf wrote:
>>> Am 29.03.2019 um 18:15 hat Max Reitz geschrieben:
>>>> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
>>>>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
>>>>> loosening permissions. However file-locking operations may fail even
>>>>> in this case, for example on NFS. And this leads to Qemu crash.
>>>>>
>>>>> Let's avoid such errors. Note, that we ignore such things anyway on
>>>>> permission update commit and abort.
>>>>>
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>> ---
>>>>>  block/file-posix.c | 12 ++++++++++++
>>>>>  1 file changed, 12 insertions(+)
>>>>>
>>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>>> index db4cccbe51..1cf4ee49eb 100644
>>>>> --- a/block/file-posix.c
>>>>> +++ b/block/file-posix.c
>>>>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>>>>>  
>>>>>      switch (op) {
>>>>>      case RAW_PL_PREPARE:
>>>>> +        if ((s->perm | new_perm) == s->perm &&
>>>>> +            (s->shared_perm & new_shared) == s->shared_perm)
>>>>> +        {
>>>>> +            /*
>>>>> +             * We are going to unlock bytes, it should not fail. If it fail due
>>>>> +             * to some fs-dependent permission-unrelated reasons (which occurs
>>>>> +             * sometimes on NFS and leads to abort in bdrv_replace_child) we
>>>>> +             * can't prevent such errors by any check here. And we ignore them
>>>>> +             * anyway in ABORT and COMMIT.
>>>>> +             */
>>>>> +            return 0;
>>>>> +        }
>>>>>          ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>>>>>                                     ~s->shared_perm | ~new_shared,
>>>>>                                     false, errp);
>>>>
>>>> Help me understand the exact issue, please.  I understand that there are
>>>> operations like bdrv_replace_child() that pass &error_abort to
>>>> bdrv_check_perm() because they just loosen the permissions, so it should
>>>> not fail.
>>>>
>>>> However, if the whole effect really would be to loosen permissions,
>>>> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
>>>> @unlock is passed as false, so no bytes will be unlocked.  And if
>>>> permissions are just loosened (as your condition checks), it should not
>>>> lock any bytes.
>>>>
>>>> So why does it attempt lock any bytes in the first place?  There must be
>>>> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
>>>> and s->locked_shared_perm.  How does that occur?
>>>
>>> I suppose raw_check_lock_bytes() is what is failing, not
>>> raw_apply_lock_bytes().
>>
>> Hm, maybe in Vladimir's case, but not in e.g.
>> https://bugzilla.redhat.com/show_bug.cgi?id=1652572 .
> 
> This is reported against 3.0, which didn't avoid re-locking permissions
> that we already hold, so there raw_apply_lock_bytes() can still fail.

That makes sense.  Which leaves the question why Vladimir still seems to
see the error there...?

Max

Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Vladimir Sementsov-Ogievskiy 6 years, 10 months ago

29.03.2019 20:44, Max Reitz wrote:
> On 29.03.19 18:40, Kevin Wolf wrote:
>> Am 29.03.2019 um 18:30 hat Max Reitz geschrieben:
>>> On 29.03.19 18:24, Kevin Wolf wrote:
>>>> Am 29.03.2019 um 18:15 hat Max Reitz geschrieben:
>>>>> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
>>>>>> loosening permissions. However file-locking operations may fail even
>>>>>> in this case, for example on NFS. And this leads to Qemu crash.
>>>>>>
>>>>>> Let's avoid such errors. Note, that we ignore such things anyway on
>>>>>> permission update commit and abort.
>>>>>>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>> ---
>>>>>>   block/file-posix.c | 12 ++++++++++++
>>>>>>   1 file changed, 12 insertions(+)
>>>>>>
>>>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>>>> index db4cccbe51..1cf4ee49eb 100644
>>>>>> --- a/block/file-posix.c
>>>>>> +++ b/block/file-posix.c
>>>>>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>>>>>>   
>>>>>>       switch (op) {
>>>>>>       case RAW_PL_PREPARE:
>>>>>> +        if ((s->perm | new_perm) == s->perm &&
>>>>>> +            (s->shared_perm & new_shared) == s->shared_perm)
>>>>>> +        {
>>>>>> +            /*
>>>>>> +             * We are going to unlock bytes, it should not fail. If it fail due
>>>>>> +             * to some fs-dependent permission-unrelated reasons (which occurs
>>>>>> +             * sometimes on NFS and leads to abort in bdrv_replace_child) we
>>>>>> +             * can't prevent such errors by any check here. And we ignore them
>>>>>> +             * anyway in ABORT and COMMIT.
>>>>>> +             */
>>>>>> +            return 0;
>>>>>> +        }
>>>>>>           ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>>>>>>                                      ~s->shared_perm | ~new_shared,
>>>>>>                                      false, errp);
>>>>>
>>>>> Help me understand the exact issue, please.  I understand that there are
>>>>> operations like bdrv_replace_child() that pass &error_abort to
>>>>> bdrv_check_perm() because they just loosen the permissions, so it should
>>>>> not fail.
>>>>>
>>>>> However, if the whole effect really would be to loosen permissions,
>>>>> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
>>>>> @unlock is passed as false, so no bytes will be unlocked.  And if
>>>>> permissions are just loosened (as your condition checks), it should not
>>>>> lock any bytes.
>>>>>
>>>>> So why does it attempt lock any bytes in the first place?  There must be
>>>>> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
>>>>> and s->locked_shared_perm.  How does that occur?
>>>>
>>>> I suppose raw_check_lock_bytes() is what is failing, not
>>>> raw_apply_lock_bytes().
>>>
>>> Hm, maybe in Vladimir's case, but not in e.g.
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652572 .
>>
>> This is reported against 3.0, which didn't avoid re-locking permissions
>> that we already hold, so there raw_apply_lock_bytes() can still fail.
> 
> That makes sense.  Which leaves the question why Vladimir still seems to
> see the error there...?
> 

I'm sorry :(. I'm trying to fix bug based on 2.10, and now I see that is already fixed
  upstream. I don't have a reproducer, only old coredumps.

So, now it looks like we don't need this patch, as on permission loosening file-posix
don't call any FS apis, yes?


-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Vladimir Sementsov-Ogievskiy 6 years, 10 months ago

29.03.2019 20:58, Vladimir Sementsov-Ogievskiy wrote:
> 29.03.2019 20:44, Max Reitz wrote:
>> On 29.03.19 18:40, Kevin Wolf wrote:
>>> Am 29.03.2019 um 18:30 hat Max Reitz geschrieben:
>>>> On 29.03.19 18:24, Kevin Wolf wrote:
>>>>> Am 29.03.2019 um 18:15 hat Max Reitz geschrieben:
>>>>>> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
>>>>>>> loosening permissions. However file-locking operations may fail even
>>>>>>> in this case, for example on NFS. And this leads to Qemu crash.
>>>>>>>
>>>>>>> Let's avoid such errors. Note, that we ignore such things anyway on
>>>>>>> permission update commit and abort.
>>>>>>>
>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>> ---
>>>>>>>   block/file-posix.c | 12 ++++++++++++
>>>>>>>   1 file changed, 12 insertions(+)
>>>>>>>
>>>>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>>>>> index db4cccbe51..1cf4ee49eb 100644
>>>>>>> --- a/block/file-posix.c
>>>>>>> +++ b/block/file-posix.c
>>>>>>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>>>>>>>       switch (op) {
>>>>>>>       case RAW_PL_PREPARE:
>>>>>>> +        if ((s->perm | new_perm) == s->perm &&
>>>>>>> +            (s->shared_perm & new_shared) == s->shared_perm)
>>>>>>> +        {
>>>>>>> +            /*
>>>>>>> +             * We are going to unlock bytes, it should not fail. If it fail due
>>>>>>> +             * to some fs-dependent permission-unrelated reasons (which occurs
>>>>>>> +             * sometimes on NFS and leads to abort in bdrv_replace_child) we
>>>>>>> +             * can't prevent such errors by any check here. And we ignore them
>>>>>>> +             * anyway in ABORT and COMMIT.
>>>>>>> +             */
>>>>>>> +            return 0;
>>>>>>> +        }
>>>>>>>           ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>>>>>>>                                      ~s->shared_perm | ~new_shared,
>>>>>>>                                      false, errp);
>>>>>>
>>>>>> Help me understand the exact issue, please.  I understand that there are
>>>>>> operations like bdrv_replace_child() that pass &error_abort to
>>>>>> bdrv_check_perm() because they just loosen the permissions, so it should
>>>>>> not fail.
>>>>>>
>>>>>> However, if the whole effect really would be to loosen permissions,
>>>>>> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
>>>>>> @unlock is passed as false, so no bytes will be unlocked.  And if
>>>>>> permissions are just loosened (as your condition checks), it should not
>>>>>> lock any bytes.
>>>>>>
>>>>>> So why does it attempt lock any bytes in the first place?  There must be
>>>>>> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
>>>>>> and s->locked_shared_perm.  How does that occur?
>>>>>
>>>>> I suppose raw_check_lock_bytes() is what is failing, not
>>>>> raw_apply_lock_bytes().
>>>>
>>>> Hm, maybe in Vladimir's case, but not in e.g.
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652572 .
>>>
>>> This is reported against 3.0, which didn't avoid re-locking permissions
>>> that we already hold, so there raw_apply_lock_bytes() can still fail.
>>
>> That makes sense.  Which leaves the question why Vladimir still seems to
>> see the error there...?
>>
> 
> I'm sorry :(. I'm trying to fix bug based on 2.10, and now I see that is already fixed
>   upstream. I don't have a reproducer, only old coredumps.
> 
> So, now it looks like we don't need this patch, as on permission loosening file-posix
> don't call any FS apis, yes?
> 


Ah, you mentioned, that raw_check_lock_bytes is still buggy.


-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Kevin Wolf 6 years, 10 months ago

Am 29.03.2019 um 19:00 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 29.03.2019 20:58, Vladimir Sementsov-Ogievskiy wrote:
> > 29.03.2019 20:44, Max Reitz wrote:
> >> On 29.03.19 18:40, Kevin Wolf wrote:
> >>> Am 29.03.2019 um 18:30 hat Max Reitz geschrieben:
> >>>> On 29.03.19 18:24, Kevin Wolf wrote:
> >>>>> Am 29.03.2019 um 18:15 hat Max Reitz geschrieben:
> >>>>>> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
> >>>>>>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
> >>>>>>> loosening permissions. However file-locking operations may fail even
> >>>>>>> in this case, for example on NFS. And this leads to Qemu crash.
> >>>>>>>
> >>>>>>> Let's avoid such errors. Note, that we ignore such things anyway on
> >>>>>>> permission update commit and abort.
> >>>>>>>
> >>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >>>>>>> ---
> >>>>>>>   block/file-posix.c | 12 ++++++++++++
> >>>>>>>   1 file changed, 12 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/block/file-posix.c b/block/file-posix.c
> >>>>>>> index db4cccbe51..1cf4ee49eb 100644
> >>>>>>> --- a/block/file-posix.c
> >>>>>>> +++ b/block/file-posix.c
> >>>>>>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
> >>>>>>>       switch (op) {
> >>>>>>>       case RAW_PL_PREPARE:
> >>>>>>> +        if ((s->perm | new_perm) == s->perm &&
> >>>>>>> +            (s->shared_perm & new_shared) == s->shared_perm)
> >>>>>>> +        {
> >>>>>>> +            /*
> >>>>>>> +             * We are going to unlock bytes, it should not fail. If it fail due
> >>>>>>> +             * to some fs-dependent permission-unrelated reasons (which occurs
> >>>>>>> +             * sometimes on NFS and leads to abort in bdrv_replace_child) we
> >>>>>>> +             * can't prevent such errors by any check here. And we ignore them
> >>>>>>> +             * anyway in ABORT and COMMIT.
> >>>>>>> +             */
> >>>>>>> +            return 0;
> >>>>>>> +        }
> >>>>>>>           ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
> >>>>>>>                                      ~s->shared_perm | ~new_shared,
> >>>>>>>                                      false, errp);
> >>>>>>
> >>>>>> Help me understand the exact issue, please.  I understand that there are
> >>>>>> operations like bdrv_replace_child() that pass &error_abort to
> >>>>>> bdrv_check_perm() because they just loosen the permissions, so it should
> >>>>>> not fail.
> >>>>>>
> >>>>>> However, if the whole effect really would be to loosen permissions,
> >>>>>> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
> >>>>>> @unlock is passed as false, so no bytes will be unlocked.  And if
> >>>>>> permissions are just loosened (as your condition checks), it should not
> >>>>>> lock any bytes.
> >>>>>>
> >>>>>> So why does it attempt lock any bytes in the first place?  There must be
> >>>>>> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
> >>>>>> and s->locked_shared_perm.  How does that occur?
> >>>>>
> >>>>> I suppose raw_check_lock_bytes() is what is failing, not
> >>>>> raw_apply_lock_bytes().
> >>>>
> >>>> Hm, maybe in Vladimir's case, but not in e.g.
> >>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652572 .
> >>>
> >>> This is reported against 3.0, which didn't avoid re-locking permissions
> >>> that we already hold, so there raw_apply_lock_bytes() can still fail.
> >>
> >> That makes sense.  Which leaves the question why Vladimir still seems to
> >> see the error there...?
> >>
> > 
> > I'm sorry :(. I'm trying to fix bug based on 2.10, and now I see that is already fixed
> >   upstream. I don't have a reproducer, only old coredumps.
> > 
> > So, now it looks like we don't need this patch, as on permission loosening file-posix
> > don't call any FS apis, yes?
> > 
> 
> 
> Ah, you mentioned, that raw_check_lock_bytes is still buggy.

I haven't tried it out, but from looking at the code it seems so. Maybe
you can reproduce on master just to be sure?

Kevin

Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Vladimir Sementsov-Ogievskiy 6 years, 10 months ago

29.03.2019 22:32, Kevin Wolf wrote:
> Am 29.03.2019 um 19:00 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 29.03.2019 20:58, Vladimir Sementsov-Ogievskiy wrote:
>>> 29.03.2019 20:44, Max Reitz wrote:
>>>> On 29.03.19 18:40, Kevin Wolf wrote:
>>>>> Am 29.03.2019 um 18:30 hat Max Reitz geschrieben:
>>>>>> On 29.03.19 18:24, Kevin Wolf wrote:
>>>>>>> Am 29.03.2019 um 18:15 hat Max Reitz geschrieben:
>>>>>>>> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
>>>>>>>>> loosening permissions. However file-locking operations may fail even
>>>>>>>>> in this case, for example on NFS. And this leads to Qemu crash.
>>>>>>>>>
>>>>>>>>> Let's avoid such errors. Note, that we ignore such things anyway on
>>>>>>>>> permission update commit and abort.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>>>> ---
>>>>>>>>>    block/file-posix.c | 12 ++++++++++++
>>>>>>>>>    1 file changed, 12 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>>>>>>> index db4cccbe51..1cf4ee49eb 100644
>>>>>>>>> --- a/block/file-posix.c
>>>>>>>>> +++ b/block/file-posix.c
>>>>>>>>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>>>>>>>>>        switch (op) {
>>>>>>>>>        case RAW_PL_PREPARE:
>>>>>>>>> +        if ((s->perm | new_perm) == s->perm &&
>>>>>>>>> +            (s->shared_perm & new_shared) == s->shared_perm)
>>>>>>>>> +        {
>>>>>>>>> +            /*
>>>>>>>>> +             * We are going to unlock bytes, it should not fail. If it fail due
>>>>>>>>> +             * to some fs-dependent permission-unrelated reasons (which occurs
>>>>>>>>> +             * sometimes on NFS and leads to abort in bdrv_replace_child) we
>>>>>>>>> +             * can't prevent such errors by any check here. And we ignore them
>>>>>>>>> +             * anyway in ABORT and COMMIT.
>>>>>>>>> +             */
>>>>>>>>> +            return 0;
>>>>>>>>> +        }
>>>>>>>>>            ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>>>>>>>>>                                       ~s->shared_perm | ~new_shared,
>>>>>>>>>                                       false, errp);
>>>>>>>>
>>>>>>>> Help me understand the exact issue, please.  I understand that there are
>>>>>>>> operations like bdrv_replace_child() that pass &error_abort to
>>>>>>>> bdrv_check_perm() because they just loosen the permissions, so it should
>>>>>>>> not fail.
>>>>>>>>
>>>>>>>> However, if the whole effect really would be to loosen permissions,
>>>>>>>> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
>>>>>>>> @unlock is passed as false, so no bytes will be unlocked.  And if
>>>>>>>> permissions are just loosened (as your condition checks), it should not
>>>>>>>> lock any bytes.
>>>>>>>>
>>>>>>>> So why does it attempt lock any bytes in the first place?  There must be
>>>>>>>> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
>>>>>>>> and s->locked_shared_perm.  How does that occur?
>>>>>>>
>>>>>>> I suppose raw_check_lock_bytes() is what is failing, not
>>>>>>> raw_apply_lock_bytes().
>>>>>>
>>>>>> Hm, maybe in Vladimir's case, but not in e.g.
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652572 .
>>>>>
>>>>> This is reported against 3.0, which didn't avoid re-locking permissions
>>>>> that we already hold, so there raw_apply_lock_bytes() can still fail.
>>>>
>>>> That makes sense.  Which leaves the question why Vladimir still seems to
>>>> see the error there...?
>>>>
>>>
>>> I'm sorry :(. I'm trying to fix bug based on 2.10, and now I see that is already fixed
>>>    upstream. I don't have a reproducer, only old coredumps.
>>>
>>> So, now it looks like we don't need this patch, as on permission loosening file-posix
>>> don't call any FS apis, yes?
>>>
>>
>>
>> Ah, you mentioned, that raw_check_lock_bytes is still buggy.
> 
> I haven't tried it out, but from looking at the code it seems so. Maybe
> you can reproduce on master just to be sure?
> 

I don't have a reproducer :(

-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Max Reitz 6 years, 10 months ago

On 01.04.19 09:21, Vladimir Sementsov-Ogievskiy wrote:
> 29.03.2019 22:32, Kevin Wolf wrote:
>> Am 29.03.2019 um 19:00 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> 29.03.2019 20:58, Vladimir Sementsov-Ogievskiy wrote:
>>>> 29.03.2019 20:44, Max Reitz wrote:
>>>>> On 29.03.19 18:40, Kevin Wolf wrote:
>>>>>> Am 29.03.2019 um 18:30 hat Max Reitz geschrieben:
>>>>>>> On 29.03.19 18:24, Kevin Wolf wrote:
>>>>>>>> Am 29.03.2019 um 18:15 hat Max Reitz geschrieben:
>>>>>>>>> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>>>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
>>>>>>>>>> loosening permissions. However file-locking operations may fail even
>>>>>>>>>> in this case, for example on NFS. And this leads to Qemu crash.
>>>>>>>>>>
>>>>>>>>>> Let's avoid such errors. Note, that we ignore such things anyway on
>>>>>>>>>> permission update commit and abort.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>>>>> ---
>>>>>>>>>>    block/file-posix.c | 12 ++++++++++++
>>>>>>>>>>    1 file changed, 12 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>>>>>>>> index db4cccbe51..1cf4ee49eb 100644
>>>>>>>>>> --- a/block/file-posix.c
>>>>>>>>>> +++ b/block/file-posix.c
>>>>>>>>>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>>>>>>>>>>        switch (op) {
>>>>>>>>>>        case RAW_PL_PREPARE:
>>>>>>>>>> +        if ((s->perm | new_perm) == s->perm &&
>>>>>>>>>> +            (s->shared_perm & new_shared) == s->shared_perm)
>>>>>>>>>> +        {
>>>>>>>>>> +            /*
>>>>>>>>>> +             * We are going to unlock bytes, it should not fail. If it fail due
>>>>>>>>>> +             * to some fs-dependent permission-unrelated reasons (which occurs
>>>>>>>>>> +             * sometimes on NFS and leads to abort in bdrv_replace_child) we
>>>>>>>>>> +             * can't prevent such errors by any check here. And we ignore them
>>>>>>>>>> +             * anyway in ABORT and COMMIT.
>>>>>>>>>> +             */
>>>>>>>>>> +            return 0;
>>>>>>>>>> +        }
>>>>>>>>>>            ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>>>>>>>>>>                                       ~s->shared_perm | ~new_shared,
>>>>>>>>>>                                       false, errp);
>>>>>>>>>
>>>>>>>>> Help me understand the exact issue, please.  I understand that there are
>>>>>>>>> operations like bdrv_replace_child() that pass &error_abort to
>>>>>>>>> bdrv_check_perm() because they just loosen the permissions, so it should
>>>>>>>>> not fail.
>>>>>>>>>
>>>>>>>>> However, if the whole effect really would be to loosen permissions,
>>>>>>>>> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
>>>>>>>>> @unlock is passed as false, so no bytes will be unlocked.  And if
>>>>>>>>> permissions are just loosened (as your condition checks), it should not
>>>>>>>>> lock any bytes.
>>>>>>>>>
>>>>>>>>> So why does it attempt lock any bytes in the first place?  There must be
>>>>>>>>> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
>>>>>>>>> and s->locked_shared_perm.  How does that occur?
>>>>>>>>
>>>>>>>> I suppose raw_check_lock_bytes() is what is failing, not
>>>>>>>> raw_apply_lock_bytes().
>>>>>>>
>>>>>>> Hm, maybe in Vladimir's case, but not in e.g.
>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652572 .
>>>>>>
>>>>>> This is reported against 3.0, which didn't avoid re-locking permissions
>>>>>> that we already hold, so there raw_apply_lock_bytes() can still fail.
>>>>>
>>>>> That makes sense.  Which leaves the question why Vladimir still seems to
>>>>> see the error there...?
>>>>>
>>>>
>>>> I'm sorry :(. I'm trying to fix bug based on 2.10, and now I see that is already fixed
>>>>    upstream. I don't have a reproducer, only old coredumps.
>>>>
>>>> So, now it looks like we don't need this patch, as on permission loosening file-posix
>>>> don't call any FS apis, yes?
>>>>
>>>
>>>
>>> Ah, you mentioned, that raw_check_lock_bytes is still buggy.
>>
>> I haven't tried it out, but from looking at the code it seems so. Maybe
>> you can reproduce on master just to be sure?
>>
> 
> I don't have a reproducer :(

I have one, but it only breaks before
2996ffad3acabe890fbb4f84a069cdc325a68108:

First, setup on an NFS mount on /mnt/nfs.  Second:

$ qemu-img create -f qcow2 /mnt/nfs/foo.qcow2 64M
Formatting '/mnt/nfs/foo.qcow2', fmt=qcow2 size=67108864
cluster_size=65536 lazy_refcounts=off refcount_bits=16
$ (sleep 5; echo "{'execute':'qmp_capabilities'}"; \
   echo "{'execute':'blockdev-del','arguments':{'node-name':'fmt'}}";
   echo "{'execute':'quit'}") \
  | x86_64-softmmu/qemu-system-x86_64 -qmp stdio \
    -blockdev node-name=proto,driver=file,filename=/mnt/nfs/foo.qcow2 \
    -blockdev node-name=fmt,driver=qcow2,file=proto
{"QMP": {"version": {"qemu": {"micro": 90, "minor": 0, "major": 3},
"package": "v3.1.0-rc0-71-ga883d6a0bc"}, "capabilities": []}}

Before the sleep is done, stop the service on the NFS host:

$ systemctl stop nfs-service

Once the sleep has run out (you get a {"return": {}} over QMP), start
the service again:

$ systemctl start nfs-service

And then this happens:

Unexpected error in raw_apply_lock_bytes() at block/file-posix.c:705:
Failed to lock byte 100
[1]    30486 done                 ( sleep 5; echo
"{'execute':'qmp_capabilities'}"; echo ; echo ; ) |
       30487 abort (core dumped)  x86_64-softmmu/qemu-system-x86_64 -qmp
stdio -blockdev  -blockdev

It works fine after 2996ffad3acabe890fbb4f84a069cdc325a68108.

Max

Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Max Reitz 6 years, 10 months ago

On 03.04.19 18:41, Max Reitz wrote:
> On 01.04.19 09:21, Vladimir Sementsov-Ogievskiy wrote:
>> 29.03.2019 22:32, Kevin Wolf wrote:
>>> Am 29.03.2019 um 19:00 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>> 29.03.2019 20:58, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 29.03.2019 20:44, Max Reitz wrote:
>>>>>> On 29.03.19 18:40, Kevin Wolf wrote:
>>>>>>> Am 29.03.2019 um 18:30 hat Max Reitz geschrieben:
>>>>>>>> On 29.03.19 18:24, Kevin Wolf wrote:
>>>>>>>>> Am 29.03.2019 um 18:15 hat Max Reitz geschrieben:
>>>>>>>>>> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>>>>>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
>>>>>>>>>>> loosening permissions. However file-locking operations may fail even
>>>>>>>>>>> in this case, for example on NFS. And this leads to Qemu crash.
>>>>>>>>>>>
>>>>>>>>>>> Let's avoid such errors. Note, that we ignore such things anyway on
>>>>>>>>>>> permission update commit and abort.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>>>>>>> ---
>>>>>>>>>>>    block/file-posix.c | 12 ++++++++++++
>>>>>>>>>>>    1 file changed, 12 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>>>>>>>>> index db4cccbe51..1cf4ee49eb 100644
>>>>>>>>>>> --- a/block/file-posix.c
>>>>>>>>>>> +++ b/block/file-posix.c
>>>>>>>>>>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>>>>>>>>>>>        switch (op) {
>>>>>>>>>>>        case RAW_PL_PREPARE:
>>>>>>>>>>> +        if ((s->perm | new_perm) == s->perm &&
>>>>>>>>>>> +            (s->shared_perm & new_shared) == s->shared_perm)
>>>>>>>>>>> +        {
>>>>>>>>>>> +            /*
>>>>>>>>>>> +             * We are going to unlock bytes, it should not fail. If it fail due
>>>>>>>>>>> +             * to some fs-dependent permission-unrelated reasons (which occurs
>>>>>>>>>>> +             * sometimes on NFS and leads to abort in bdrv_replace_child) we
>>>>>>>>>>> +             * can't prevent such errors by any check here. And we ignore them
>>>>>>>>>>> +             * anyway in ABORT and COMMIT.
>>>>>>>>>>> +             */
>>>>>>>>>>> +            return 0;
>>>>>>>>>>> +        }
>>>>>>>>>>>            ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>>>>>>>>>>>                                       ~s->shared_perm | ~new_shared,
>>>>>>>>>>>                                       false, errp);
>>>>>>>>>>
>>>>>>>>>> Help me understand the exact issue, please.  I understand that there are
>>>>>>>>>> operations like bdrv_replace_child() that pass &error_abort to
>>>>>>>>>> bdrv_check_perm() because they just loosen the permissions, so it should
>>>>>>>>>> not fail.
>>>>>>>>>>
>>>>>>>>>> However, if the whole effect really would be to loosen permissions,
>>>>>>>>>> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
>>>>>>>>>> @unlock is passed as false, so no bytes will be unlocked.  And if
>>>>>>>>>> permissions are just loosened (as your condition checks), it should not
>>>>>>>>>> lock any bytes.
>>>>>>>>>>
>>>>>>>>>> So why does it attempt lock any bytes in the first place?  There must be
>>>>>>>>>> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
>>>>>>>>>> and s->locked_shared_perm.  How does that occur?
>>>>>>>>>
>>>>>>>>> I suppose raw_check_lock_bytes() is what is failing, not
>>>>>>>>> raw_apply_lock_bytes().
>>>>>>>>
>>>>>>>> Hm, maybe in Vladimir's case, but not in e.g.
>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1652572 .
>>>>>>>
>>>>>>> This is reported against 3.0, which didn't avoid re-locking permissions
>>>>>>> that we already hold, so there raw_apply_lock_bytes() can still fail.
>>>>>>
>>>>>> That makes sense.  Which leaves the question why Vladimir still seems to
>>>>>> see the error there...?
>>>>>>
>>>>>
>>>>> I'm sorry :(. I'm trying to fix bug based on 2.10, and now I see that is already fixed
>>>>>    upstream. I don't have a reproducer, only old coredumps.
>>>>>
>>>>> So, now it looks like we don't need this patch, as on permission loosening file-posix
>>>>> don't call any FS apis, yes?
>>>>>
>>>>
>>>>
>>>> Ah, you mentioned, that raw_check_lock_bytes is still buggy.
>>>
>>> I haven't tried it out, but from looking at the code it seems so. Maybe
>>> you can reproduce on master just to be sure?
>>>
>>
>> I don't have a reproducer :(
> 
> I have one, but it only breaks before
> 2996ffad3acabe890fbb4f84a069cdc325a68108:
> 
> First, setup on an NFS mount on /mnt/nfs.  Second:
> 
> $ qemu-img create -f qcow2 /mnt/nfs/foo.qcow2 64M
> Formatting '/mnt/nfs/foo.qcow2', fmt=qcow2 size=67108864
> cluster_size=65536 lazy_refcounts=off refcount_bits=16
> $ (sleep 5; echo "{'execute':'qmp_capabilities'}"; \
>    echo "{'execute':'blockdev-del','arguments':{'node-name':'fmt'}}";
>    echo "{'execute':'quit'}") \
>   | x86_64-softmmu/qemu-system-x86_64 -qmp stdio \
>     -blockdev node-name=proto,driver=file,filename=/mnt/nfs/foo.qcow2 \
>     -blockdev node-name=fmt,driver=qcow2,file=proto
> {"QMP": {"version": {"qemu": {"micro": 90, "minor": 0, "major": 3},
> "package": "v3.1.0-rc0-71-ga883d6a0bc"}, "capabilities": []}}
> 
> Before the sleep is done, stop the service on the NFS host:
> 
> $ systemctl stop nfs-service
> 
> Once the sleep has run out (you get a {"return": {}} over QMP), start
> the service again:
> 
> $ systemctl start nfs-service
> 
> And then this happens:
> 
> Unexpected error in raw_apply_lock_bytes() at block/file-posix.c:705:
> Failed to lock byte 100
> [1]    30486 done                 ( sleep 5; echo
> "{'execute':'qmp_capabilities'}"; echo ; echo ; ) |
>        30487 abort (core dumped)  x86_64-softmmu/qemu-system-x86_64 -qmp
> stdio -blockdev  -blockdev
> 
> It works fine after 2996ffad3acabe890fbb4f84a069cdc325a68108.

Now I have a reproducer that breaks before this patch here and works
afterwards: You just need two parents and delete one of them, so some
permissions stay taken.

So, we can do this:

$ (echo "{'execute':'qmp_capabilities'}"; \
   echo "{'execute':'nbd-server-start',
          'arguments':{'addr':{'type':'inet',
              'data':{'host':'0.0.0.0','port':'10809'}}}}"; \
   echo "{'execute':'nbd-server-add',
          'arguments':{'device':'proto'}}"; \
   sleep 5; \
   echo "{'execute':'nbd-server-stop'}"; \
   echo "{'execute':'quit'}") \
  | x86_64-softmmu/qemu-system-x86_64 -qmp stdio \
     -blockdev node-name=proto,driver=file,filename=/mnt/nfs/foo.img \
     -device virtio-blk,drive=proto
{"QMP": {"version": {"qemu": {"micro": 91, "minor": 1, "major": 3},
"package": "v4.0.0-rc1-74-g38e694fcc9"}, "capabilities": ["oob"]}}
{"return": {}}
{"return": {}}
{"return": {}}

Then immediately this on the NFS host:

$ sudo systemctl stop nfs-server; sleep 6; \
  sudo systemctl start nfs-server

And this happens on the client:

Unexpected error in raw_check_lock_bytes() at block/file-posix.c:775:
Failed to get "consistent read" lock
[1]    21289 done                 ( echo
"{'execute':'qmp_capabilities'}"; echo ; echo ; sleep 5; echo ; echo ;  |
       21290 abort (core dumped)  x86_64-softmmu/qemu-system-x86_64 -qmp
stdio -blockdev  -device

No issues after 696aaaed579ac5bf5fa336216909b46d3d8f07a8 (this patch here).

Max

Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Vladimir Sementsov-Ogievskiy 6 years, 10 months ago

29.03.2019 20:15, Max Reitz wrote:
> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
>> loosening permissions. However file-locking operations may fail even
>> in this case, for example on NFS. And this leads to Qemu crash.
>>
>> Let's avoid such errors. Note, that we ignore such things anyway on
>> permission update commit and abort.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/file-posix.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index db4cccbe51..1cf4ee49eb 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>>   
>>       switch (op) {
>>       case RAW_PL_PREPARE:
>> +        if ((s->perm | new_perm) == s->perm &&
>> +            (s->shared_perm & new_shared) == s->shared_perm)
>> +        {
>> +            /*
>> +             * We are going to unlock bytes, it should not fail. If it fail due
>> +             * to some fs-dependent permission-unrelated reasons (which occurs
>> +             * sometimes on NFS and leads to abort in bdrv_replace_child) we
>> +             * can't prevent such errors by any check here. And we ignore them
>> +             * anyway in ABORT and COMMIT.
>> +             */
>> +            return 0;
>> +        }
>>           ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>>                                      ~s->shared_perm | ~new_shared,
>>                                      false, errp);
> 
> Help me understand the exact issue, please.  I understand that there are
> operations like bdrv_replace_child() that pass &error_abort to
> bdrv_check_perm() because they just loosen the permissions, so it should
> not fail.
> 
> However, if the whole effect really would be to loosen permissions,
> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
> @unlock is passed as false, so no bytes will be unlocked.  And if
> permissions are just loosened (as your condition checks), it should not
> lock any bytes.

but it does qemu_lock_fd unconditionally on any locked byte (it may be loosening,
but not of all bytes). and fcntl is called, which may fail.

> 
> So why does it attempt lock any bytes in the first place?  There must be
> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
> and s->locked_shared_perm.  How does that occur?
> 
> Max
> 


-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Max Reitz 6 years, 10 months ago

On 29.03.19 18:31, Vladimir Sementsov-Ogievskiy wrote:
> 29.03.2019 20:15, Max Reitz wrote:
>> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
>>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
>>> loosening permissions. However file-locking operations may fail even
>>> in this case, for example on NFS. And this leads to Qemu crash.
>>>
>>> Let's avoid such errors. Note, that we ignore such things anyway on
>>> permission update commit and abort.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>   block/file-posix.c | 12 ++++++++++++
>>>   1 file changed, 12 insertions(+)
>>>
>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>> index db4cccbe51..1cf4ee49eb 100644
>>> --- a/block/file-posix.c
>>> +++ b/block/file-posix.c
>>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>>>   
>>>       switch (op) {
>>>       case RAW_PL_PREPARE:
>>> +        if ((s->perm | new_perm) == s->perm &&
>>> +            (s->shared_perm & new_shared) == s->shared_perm)
>>> +        {
>>> +            /*
>>> +             * We are going to unlock bytes, it should not fail. If it fail due
>>> +             * to some fs-dependent permission-unrelated reasons (which occurs
>>> +             * sometimes on NFS and leads to abort in bdrv_replace_child) we
>>> +             * can't prevent such errors by any check here. And we ignore them
>>> +             * anyway in ABORT and COMMIT.
>>> +             */
>>> +            return 0;
>>> +        }
>>>           ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>>>                                      ~s->shared_perm | ~new_shared,
>>>                                      false, errp);
>>
>> Help me understand the exact issue, please.  I understand that there are
>> operations like bdrv_replace_child() that pass &error_abort to
>> bdrv_check_perm() because they just loosen the permissions, so it should
>> not fail.
>>
>> However, if the whole effect really would be to loosen permissions,
>> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
>> @unlock is passed as false, so no bytes will be unlocked.  And if
>> permissions are just loosened (as your condition checks), it should not
>> lock any bytes.
> 
> but it does qemu_lock_fd unconditionally on any locked byte (it may be loosening,
> but not of all bytes).

It only locks bytes that aren't already locked ("!(locked_perm & bit)").
 So if (s->perm | new_perm) & ~s->locked_perm == 0 (same for shared), it
won't lock anything.  That's the same as s->perm | new_perm == s->perm
and s->perm == s->locked_perm.  So if this patch prevents qemu_lock_fd()
errors in raw_apply_lock_bytes(), s->perm must be different from
s->locked_perm.

> and fcntl is called, which may fail.

Hm, yeah, and as Kevin said, raw_check_lock_bytes() if new_perm and
new_shared are already covered (though this could be fixed by passing
new_perm & ~s->perm and... new_shared | ~s->shared_perm? to
raw_check_lock_bytes).

It just seems to me that we sometimes do end up in a position where
s->perm and s->locked_perm (or ~s->shared_perm and
s->locked_shared_perm) do not align, which should not be happening: This
would mean that the block layer thinks the permission locks are taken,
but file-posix disagrees.

Max

>> So why does it attempt lock any bytes in the first place?  There must be
>> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
>> and s->locked_shared_perm.  How does that occur?
>>
>> Max
>>
> 
>

Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Vladimir Sementsov-Ogievskiy 6 years, 10 months ago

29.03.2019 20:39, Max Reitz wrote:
> On 29.03.19 18:31, Vladimir Sementsov-Ogievskiy wrote:
>> 29.03.2019 20:15, Max Reitz wrote:
>>> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
>>>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
>>>> loosening permissions. However file-locking operations may fail even
>>>> in this case, for example on NFS. And this leads to Qemu crash.
>>>>
>>>> Let's avoid such errors. Note, that we ignore such things anyway on
>>>> permission update commit and abort.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>> ---
>>>>    block/file-posix.c | 12 ++++++++++++
>>>>    1 file changed, 12 insertions(+)
>>>>
>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>> index db4cccbe51..1cf4ee49eb 100644
>>>> --- a/block/file-posix.c
>>>> +++ b/block/file-posix.c
>>>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>>>>    
>>>>        switch (op) {
>>>>        case RAW_PL_PREPARE:
>>>> +        if ((s->perm | new_perm) == s->perm &&
>>>> +            (s->shared_perm & new_shared) == s->shared_perm)
>>>> +        {
>>>> +            /*
>>>> +             * We are going to unlock bytes, it should not fail. If it fail due
>>>> +             * to some fs-dependent permission-unrelated reasons (which occurs
>>>> +             * sometimes on NFS and leads to abort in bdrv_replace_child) we
>>>> +             * can't prevent such errors by any check here. And we ignore them
>>>> +             * anyway in ABORT and COMMIT.
>>>> +             */
>>>> +            return 0;
>>>> +        }
>>>>            ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>>>>                                       ~s->shared_perm | ~new_shared,
>>>>                                       false, errp);
>>>
>>> Help me understand the exact issue, please.  I understand that there are
>>> operations like bdrv_replace_child() that pass &error_abort to
>>> bdrv_check_perm() because they just loosen the permissions, so it should
>>> not fail.
>>>
>>> However, if the whole effect really would be to loosen permissions,
>>> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
>>> @unlock is passed as false, so no bytes will be unlocked.  And if
>>> permissions are just loosened (as your condition checks), it should not
>>> lock any bytes.
>>
>> but it does qemu_lock_fd unconditionally on any locked byte (it may be loosening,
>> but not of all bytes).
> 
> It only locks bytes that aren't already locked ("!(locked_perm & bit)").
>   So if (s->perm | new_perm) & ~s->locked_perm == 0 (same for shared), it

Oops, missed that. We don't have this code in rhel base..

> won't lock anything.  That's the same as s->perm | new_perm == s->perm
> and s->perm == s->locked_perm.  So if this patch prevents qemu_lock_fd()
> errors in raw_apply_lock_bytes(), s->perm must be different from
> s->locked_perm.
> 
>> and fcntl is called, which may fail.
> 
> Hm, yeah, and as Kevin said, raw_check_lock_bytes() if new_perm and
> new_shared are already covered (though this could be fixed by passing
> new_perm & ~s->perm and... new_shared | ~s->shared_perm? to
> raw_check_lock_bytes).
> 
> It just seems to me that we sometimes do end up in a position where
> s->perm and s->locked_perm (or ~s->shared_perm and
> s->locked_shared_perm) do not align, which should not be happening: This
> would mean that the block layer thinks the permission locks are taken,
> but file-posix disagrees.

So if it is it's possibly not my case, as all my crashes are on old code, which
don't have this additional checking  in raw_apply_lock_bytes

> 
> Max
> 
>>> So why does it attempt lock any bytes in the first place?  There must be
>>> some discrepancy between s->perm and s->locked_perm, or ~s->shared_perm
>>> and s->locked_shared_perm.  How does that occur?
>>>
>>> Max
>>>
>>
>>
> 
> 


-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH v3] block/file-posix: do not fail on unlock bytes

Posted by Max Reitz 6 years, 10 months ago

On 29.03.19 18:54, Vladimir Sementsov-Ogievskiy wrote:
> 29.03.2019 20:39, Max Reitz wrote:
>> On 29.03.19 18:31, Vladimir Sementsov-Ogievskiy wrote:
>>> 29.03.2019 20:15, Max Reitz wrote:
>>>> On 29.03.19 12:04, Vladimir Sementsov-Ogievskiy wrote:
>>>>> bdrv_replace_child() calls bdrv_check_perm() with error_abort on
>>>>> loosening permissions. However file-locking operations may fail even
>>>>> in this case, for example on NFS. And this leads to Qemu crash.
>>>>>
>>>>> Let's avoid such errors. Note, that we ignore such things anyway on
>>>>> permission update commit and abort.
>>>>>
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>> ---
>>>>>    block/file-posix.c | 12 ++++++++++++
>>>>>    1 file changed, 12 insertions(+)
>>>>>
>>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>>> index db4cccbe51..1cf4ee49eb 100644
>>>>> --- a/block/file-posix.c
>>>>> +++ b/block/file-posix.c
>>>>> @@ -815,6 +815,18 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
>>>>>    
>>>>>        switch (op) {
>>>>>        case RAW_PL_PREPARE:
>>>>> +        if ((s->perm | new_perm) == s->perm &&
>>>>> +            (s->shared_perm & new_shared) == s->shared_perm)
>>>>> +        {
>>>>> +            /*
>>>>> +             * We are going to unlock bytes, it should not fail. If it fail due
>>>>> +             * to some fs-dependent permission-unrelated reasons (which occurs
>>>>> +             * sometimes on NFS and leads to abort in bdrv_replace_child) we
>>>>> +             * can't prevent such errors by any check here. And we ignore them
>>>>> +             * anyway in ABORT and COMMIT.
>>>>> +             */
>>>>> +            return 0;
>>>>> +        }
>>>>>            ret = raw_apply_lock_bytes(s, s->fd, s->perm | new_perm,
>>>>>                                       ~s->shared_perm | ~new_shared,
>>>>>                                       false, errp);
>>>>
>>>> Help me understand the exact issue, please.  I understand that there are
>>>> operations like bdrv_replace_child() that pass &error_abort to
>>>> bdrv_check_perm() because they just loosen the permissions, so it should
>>>> not fail.
>>>>
>>>> However, if the whole effect really would be to loosen permissions,
>>>> raw_apply_lock_bytes() wouldn't have failed here in PREPARE anyway:
>>>> @unlock is passed as false, so no bytes will be unlocked.  And if
>>>> permissions are just loosened (as your condition checks), it should not
>>>> lock any bytes.
>>>
>>> but it does qemu_lock_fd unconditionally on any locked byte (it may be loosening,
>>> but not of all bytes).
>>
>> It only locks bytes that aren't already locked ("!(locked_perm & bit)").
>>   So if (s->perm | new_perm) & ~s->locked_perm == 0 (same for shared), it
> 
> Oops, missed that. We don't have this code in rhel base..
> 
>> won't lock anything.  That's the same as s->perm | new_perm == s->perm
>> and s->perm == s->locked_perm.  So if this patch prevents qemu_lock_fd()
>> errors in raw_apply_lock_bytes(), s->perm must be different from
>> s->locked_perm.
>>
>>> and fcntl is called, which may fail.
>>
>> Hm, yeah, and as Kevin said, raw_check_lock_bytes() if new_perm and
>> new_shared are already covered (though this could be fixed by passing
>> new_perm & ~s->perm and... new_shared | ~s->shared_perm? to
>> raw_check_lock_bytes).
>>
>> It just seems to me that we sometimes do end up in a position where
>> s->perm and s->locked_perm (or ~s->shared_perm and
>> s->locked_shared_perm) do not align, which should not be happening: This
>> would mean that the block layer thinks the permission locks are taken,
>> but file-posix disagrees.
> 
> So if it is it's possibly not my case, as all my crashes are on old code, which
> don't have this additional checking  in raw_apply_lock_bytes

OK, that solves the mystery. :-)

Now I wonder whether it would be better to make raw_check_lock_bytes()
skip the bytes that have been locked already instead of skipping both
calls altogether.

OTOH, there should be no functional difference, so why not just keep the
patch that's already here...

Max