file-posix: make lock_fd read-only

[Qemu-devel] [PATCH RFC] file-posix: make lock_fd read-only

Posted by Vladimir Sementsov-Ogievskiy 8 years, 4 months ago

We do not reopen lock_fd on bdrv_reopen which leads to problems on
reopen image RO. So, lets make lock_fd be always RO.
This is correct, because qemu_lock_fd always called with exclusive=false
on lock_fd.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---

Hi all!

We've faced the following problem with our shared-storage migration
scheme. We make an external snapshot and need base image to be reopened
RO. However, bdrv_reopen reopens only .fd of BDRVRawState but not
.lock_fd. So, .lock_fd is left opened RW and this breaks the whole
thing.

The simple fix is here: let's just open lock_fd as RO always. This
looks fine for current code, as we never try to set write locks
(qemu_lock_fd always called with exclusive=false).

However it will not work if we are going to use write locks.


 block/file-posix.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 36ee89e940..5c52df9174 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -514,7 +514,8 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
 
     s->lock_fd = -1;
     if (s->use_lock) {
-        fd = qemu_open(filename, s->open_flags);
+        /* open it read-only, as we do not reopen it on bdrv_reopen */
+        fd = qemu_open(filename, (s->open_flags & ~BDRV_O_RDWR));
         if (fd < 0) {
             ret = -errno;
             error_setg_errno(errp, errno, "Could not open '%s' for locking",
-- 
2.11.1

Re: [Qemu-devel] [PATCH RFC] file-posix: make lock_fd read-only

Posted by Eric Blake 8 years, 4 months ago

On 10/10/2017 08:42 AM, Vladimir Sementsov-Ogievskiy wrote:
> We do not reopen lock_fd on bdrv_reopen which leads to problems on
> reopen image RO. So, lets make lock_fd be always RO.
> This is correct, because qemu_lock_fd always called with exclusive=false
> on lock_fd.

How is that correct?  file-posix.c calls:
            ret = qemu_lock_fd_test(s->lock_fd, off, 1, true);
where exclusive being true in turn sets:
        .l_type   = exclusive ? F_WRLCK : F_RDLCK,

and F_WRLCK requests fail on a RO fd with EBADF.

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> 
> Hi all!
> 
> We've faced the following problem with our shared-storage migration
> scheme. We make an external snapshot and need base image to be reopened
> RO. However, bdrv_reopen reopens only .fd of BDRVRawState but not
> .lock_fd. So, .lock_fd is left opened RW and this breaks the whole
> thing.

What exactly breaks?  Any specific error message, or a backtrace, or
something with more details?

> 
> The simple fix is here: let's just open lock_fd as RO always. This
> looks fine for current code, as we never try to set write locks
> (qemu_lock_fd always called with exclusive=false).

I just argued that we DO try to set write locks in file-posix, and
therefore, we need more details of what the real problem is, because
this does not feel like the right fix.

> 
> However it will not work if we are going to use write locks.
> 
> 
>  block/file-posix.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 36ee89e940..5c52df9174 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -514,7 +514,8 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>  
>      s->lock_fd = -1;
>      if (s->use_lock) {
> -        fd = qemu_open(filename, s->open_flags);
> +        /* open it read-only, as we do not reopen it on bdrv_reopen */
> +        fd = qemu_open(filename, (s->open_flags & ~BDRV_O_RDWR));
>          if (fd < 0) {
>              ret = -errno;
>              error_setg_errno(errp, errno, "Could not open '%s' for locking",
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH RFC] file-posix: make lock_fd read-only

Posted by Denis V. Lunev 8 years, 4 months ago

On 10/10/2017 09:50 PM, Eric Blake wrote:
> On 10/10/2017 08:42 AM, Vladimir Sementsov-Ogievskiy wrote:
>> We do not reopen lock_fd on bdrv_reopen which leads to problems on
>> reopen image RO. So, lets make lock_fd be always RO.
>> This is correct, because qemu_lock_fd always called with exclusive=false
>> on lock_fd.
> How is that correct?  file-posix.c calls:
>             ret = qemu_lock_fd_test(s->lock_fd, off, 1, true);
> where exclusive being true in turn sets:
>         .l_type   = exclusive ? F_WRLCK : F_RDLCK,
>
> and F_WRLCK requests fail on a RO fd with EBADF.
this works fine for us as here we do not get the lock but just
query it.

#include <stdio.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
        int fd = open("file", O_RDONLY | O_CREAT, 0666);
        struct flock fl_rd = {
                .l_whence = SEEK_SET,
                .l_start  = 0,
                .l_len    = 1,
                .l_type   = F_RDLCK,
        };
        struct flock fl_wr = {
                .l_whence = SEEK_SET,
                .l_start  = 0,
                .l_len    = 1,
                .l_type   = F_WRLCK,
        };

        ftruncate(fd, 1024);
        fcntl(fd, F_SETLK, &fl_rd);
        fcntl(fd, F_GETLK, &fl_wr);
        sleep(1000);
        return 0;
}

The first process:
open("file", O_RDONLY|O_CREAT, 0666)    = 3
ftruncate(3, 1024)                      = -1 EINVAL (Invalid argument)
fcntl(3, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=0,
l_len=1}) = 0
fcntl(3, F_GETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0,
l_len=1, l_pid=0}) = 0
nanosleep({1000, 0},

The second process:
open("file", O_RDONLY|O_CREAT, 0666)    = 3
ftruncate(3, 1024)                      = -1 EINVAL (Invalid argument)
fcntl(3, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=0,
l_len=1}) = 0
fcntl(3, F_GETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=0,
l_len=1, l_pid=19540}) = 0
nanosleep({1000, 0},

The key is that in test cod we do not _set_ the lock, but query it! This
is allowed even on
RDONLY file descriptor.

Den

Re: [Qemu-devel] [PATCH RFC] file-posix: make lock_fd read-only

Posted by Eric Blake 8 years, 4 months ago

On 10/10/2017 02:30 PM, Denis V. Lunev wrote:
> On 10/10/2017 09:50 PM, Eric Blake wrote:
>> On 10/10/2017 08:42 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> We do not reopen lock_fd on bdrv_reopen which leads to problems on
>>> reopen image RO. So, lets make lock_fd be always RO.
>>> This is correct, because qemu_lock_fd always called with exclusive=false
>>> on lock_fd.
>> How is that correct?  file-posix.c calls:
>>             ret = qemu_lock_fd_test(s->lock_fd, off, 1, true);
>> where exclusive being true in turn sets:
>>         .l_type   = exclusive ? F_WRLCK : F_RDLCK,
>>
>> and F_WRLCK requests fail on a RO fd with EBADF.
> this works fine for us as here we do not get the lock but just
> query it.
> 

>         ftruncate(fd, 1024);
>         fcntl(fd, F_SETLK, &fl_rd);
>         fcntl(fd, F_GETLK, &fl_wr);

Trying with F_OFD_SETLK (which is more important, because it matches
what we prefer to use) gives the same results as your strace (remember
to #define _GNU_SOURCE 1 before compilation, to get F_OFD_SETLK into scope).

> 
> The key is that in test cod we do not _set_ the lock, but query it! This
> is allowed even on
> RDONLY file descriptor.

Hmm, you're right: file-posix only uses qemu_lock_fd() to set locks
(with exclusive == false), and qemu_lock_fd_test() to query locks
(there, exclusive == true means to probe if a write lock can be grabbed,
but does not actually grab one so it doesn't care whether the fd is RW.
We ).

We'd need a lot more explanation in the commit message (for example, you
still haven't stated an actual backtrace or error message you got when
the lock file is left RW), but you may have just convinced me that the
conversion to always use a RO lock fd is correct, at least for
file-posix' use of fcntl locking.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH RFC] file-posix: make lock_fd read-only

Posted by Kevin Wolf 8 years, 4 months ago

[ Cc: Fam ]

Am 10.10.2017 um 15:42 hat Vladimir Sementsov-Ogievskiy geschrieben:
> We do not reopen lock_fd on bdrv_reopen which leads to problems on
> reopen image RO. So, lets make lock_fd be always RO.
> This is correct, because qemu_lock_fd always called with exclusive=false
> on lock_fd.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> 
> Hi all!
> 
> We've faced the following problem with our shared-storage migration
> scheme. We make an external snapshot and need base image to be reopened
> RO. However, bdrv_reopen reopens only .fd of BDRVRawState but not
> .lock_fd. So, .lock_fd is left opened RW and this breaks the whole
> thing.
> 
> The simple fix is here: let's just open lock_fd as RO always. This
> looks fine for current code, as we never try to set write locks
> (qemu_lock_fd always called with exclusive=false).
> 
> However it will not work if we are going to use write locks.

I was sure that we had discussed this during review, so I just went back
and checked. Indeed, Fam originally had an unconditional O_RDONLY in
some version of the image locking patches, but I actually found a
potential problem with that back then:

> Note that with /dev/fdset there can be cases where we can open a file
> O_RDWR, but not O_RDONLY. Should we better just use the same flags as
> for the s->fd?
https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05107.html

However, I'm now wondering whether we really still need a separate
s->lock_fd or whether we can just use the normal image fd for this. If I
understood the old threads correctly, the original reason for it was
that during bdrv_reopen(), we couldn't safely migrate exclusive locks
from the old fd to the new one. But as we aren't using exclusive locks
any more, this shouldn't be a problem today.

Fam, are there more reasons why we need a separate lock_fd?

Kevin

Re: [Qemu-devel] [PATCH RFC] file-posix: make lock_fd read-only

Posted by Vladimir Sementsov-Ogievskiy 8 years, 4 months ago

11.10.2017 12:22, Kevin Wolf wrote:
> [ Cc: Fam ]
>
> Am 10.10.2017 um 15:42 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> We do not reopen lock_fd on bdrv_reopen which leads to problems on
>> reopen image RO. So, lets make lock_fd be always RO.
>> This is correct, because qemu_lock_fd always called with exclusive=false
>> on lock_fd.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>
>> Hi all!
>>
>> We've faced the following problem with our shared-storage migration
>> scheme. We make an external snapshot and need base image to be reopened
>> RO. However, bdrv_reopen reopens only .fd of BDRVRawState but not
>> .lock_fd. So, .lock_fd is left opened RW and this breaks the whole
>> thing.
>>
>> The simple fix is here: let's just open lock_fd as RO always. This
>> looks fine for current code, as we never try to set write locks
>> (qemu_lock_fd always called with exclusive=false).
>>
>> However it will not work if we are going to use write locks.
> I was sure that we had discussed this during review, so I just went back
> and checked. Indeed, Fam originally had an unconditional O_RDONLY in
> some version of the image locking patches, but I actually found a
> potential problem with that back then:
>
>> Note that with /dev/fdset there can be cases where we can open a file
>> O_RDWR, but not O_RDONLY. Should we better just use the same flags as
>> for the s->fd?
> https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05107.html
>
> However, I'm now wondering whether we really still need a separate
> s->lock_fd or whether we can just use the normal image fd for this. If I
> understood the old threads correctly, the original reason for it was
> that during bdrv_reopen(), we couldn't safely migrate exclusive locks
> from the old fd to the new one. But as we aren't using exclusive locks
> any more, this shouldn't be a problem today.
>
> Fam, are there more reasons why we need a separate lock_fd?
>
> Kevin

If I understand correctly, posix lock will be lost on fd close anyway, 
so other app will have an opportunity of taking this lock, so it's unsafe.

-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH RFC] file-posix: make lock_fd read-only

Posted by Kevin Wolf 8 years, 4 months ago

Am 11.10.2017 um 11:38 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 11.10.2017 12:22, Kevin Wolf wrote:
> > [ Cc: Fam ]
> > 
> > Am 10.10.2017 um 15:42 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > We do not reopen lock_fd on bdrv_reopen which leads to problems on
> > > reopen image RO. So, lets make lock_fd be always RO.
> > > This is correct, because qemu_lock_fd always called with exclusive=false
> > > on lock_fd.
> > > 
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > > ---
> > > 
> > > Hi all!
> > > 
> > > We've faced the following problem with our shared-storage migration
> > > scheme. We make an external snapshot and need base image to be reopened
> > > RO. However, bdrv_reopen reopens only .fd of BDRVRawState but not
> > > .lock_fd. So, .lock_fd is left opened RW and this breaks the whole
> > > thing.
> > > 
> > > The simple fix is here: let's just open lock_fd as RO always. This
> > > looks fine for current code, as we never try to set write locks
> > > (qemu_lock_fd always called with exclusive=false).
> > > 
> > > However it will not work if we are going to use write locks.
> > I was sure that we had discussed this during review, so I just went back
> > and checked. Indeed, Fam originally had an unconditional O_RDONLY in
> > some version of the image locking patches, but I actually found a
> > potential problem with that back then:
> > 
> > > Note that with /dev/fdset there can be cases where we can open a file
> > > O_RDWR, but not O_RDONLY. Should we better just use the same flags as
> > > for the s->fd?
> > https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05107.html
> > 
> > However, I'm now wondering whether we really still need a separate
> > s->lock_fd or whether we can just use the normal image fd for this. If I
> > understood the old threads correctly, the original reason for it was
> > that during bdrv_reopen(), we couldn't safely migrate exclusive locks
> > from the old fd to the new one. But as we aren't using exclusive locks
> > any more, this shouldn't be a problem today.
> > 
> > Fam, are there more reasons why we need a separate lock_fd?
> > 
> > Kevin
> 
> If I understand correctly, posix lock will be lost on fd close anyway, so
> other app will have an opportunity of taking this lock, so it's unsafe.

With the OFD locks we're using, you just need to take the lock on the
new fd before you close the old fd, then it should be safe.

With normal POSIX locks, bdrv_reopen() is hopeless anyway, you will
always lose the lock, even with a separate lock_fd. This is why we only
make use of POSIX locks if OFD isn't available, if locking=on is
explicitly requested and only after printing a warning.

Kevin

Re: [Qemu-devel] [PATCH RFC] file-posix: make lock_fd read-only

Posted by Fam Zheng 8 years, 3 months ago

On Wed, 10/11 11:48, Kevin Wolf wrote:
> Am 11.10.2017 um 11:38 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > 11.10.2017 12:22, Kevin Wolf wrote:
> > > [ Cc: Fam ]

Sorry for being slow, now finally getting my hands on email backlog after
holiday and preparing for KVM Forum presentation.

> > > 
> > > Am 10.10.2017 um 15:42 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > > > We do not reopen lock_fd on bdrv_reopen which leads to problems on
> > > > reopen image RO. So, lets make lock_fd be always RO.
> > > > This is correct, because qemu_lock_fd always called with exclusive=false
> > > > on lock_fd.
> > > > 
> > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > > > ---
> > > > 
> > > > Hi all!
> > > > 
> > > > We've faced the following problem with our shared-storage migration
> > > > scheme. We make an external snapshot and need base image to be reopened
> > > > RO. However, bdrv_reopen reopens only .fd of BDRVRawState but not
> > > > .lock_fd. So, .lock_fd is left opened RW and this breaks the whole
> > > > thing.
> > > > 
> > > > The simple fix is here: let's just open lock_fd as RO always. This
> > > > looks fine for current code, as we never try to set write locks
> > > > (qemu_lock_fd always called with exclusive=false).
> > > > 
> > > > However it will not work if we are going to use write locks.
> > > I was sure that we had discussed this during review, so I just went back
> > > and checked. Indeed, Fam originally had an unconditional O_RDONLY in
> > > some version of the image locking patches, but I actually found a
> > > potential problem with that back then:
> > > 
> > > > Note that with /dev/fdset there can be cases where we can open a file
> > > > O_RDWR, but not O_RDONLY. Should we better just use the same flags as
> > > > for the s->fd?
> > > https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg05107.html
> > > 
> > > However, I'm now wondering whether we really still need a separate
> > > s->lock_fd or whether we can just use the normal image fd for this. If I
> > > understood the old threads correctly, the original reason for it was
> > > that during bdrv_reopen(), we couldn't safely migrate exclusive locks
> > > from the old fd to the new one. But as we aren't using exclusive locks
> > > any more, this shouldn't be a problem today.
> > > 
> > > Fam, are there more reasons why we need a separate lock_fd?

I think your reasoning is right. We needed lock_fd in earlier revisions of the
image locking patch because we cannot move exclusive lock from one fd to another
transactionally. We don't need that now.

Fam