Questions about losing the write lock of raw-format disks after migration

Peng Liang via posted 1 patch 2 years, 5 months ago
Failed in applying to current master (apply log)
Questions about losing the write lock of raw-format disks after migration
Posted by Peng Liang via 2 years, 5 months ago
Hi folks,

When we test migration with raw-format disk, we found that the QEMU
process in the dst will lose the write lock after migration.  However,
the QEMU process in the dst will still hold the write lock for
qcow2-format disk.

After reading some block layer's code, I found that the first
blk_set_perm in blk_root_activate will set blk->shared_perm to
BLK_PERM_ALL (disable all shared permissions?).  Then in
blk_vm_state_changed, blk_set_perm will set shared_perm to
blk->shared_perm, which is BLK_PERM_ALL.  And it makes
raw_handle_perm_lock not to get the write lock.

So I try the following patch and it will fix the problem:
diff --git a/block/block-backend.c b/block/block-backend.c
index 12ef80ea17..96518fd1f0 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -197,13 +197,6 @@ static void blk_root_activate(BdrvChild *child,
Error **errp)

     blk->disable_perm = false;

-    blk_set_perm(blk, blk->perm, BLK_PERM_ALL, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        blk->disable_perm = true;
-        return;
-    }
-
     if (runstate_check(RUN_STATE_INMIGRATE)) {
         /* Activation can happen when migration process is still
active, for
          * example when nbd_server_add is called during non-shared storage

I'm new to the block layer and I'm not sure that it's a right fix to the
problem.  Any idea about the problem and the patch?

Thanks,
Peng

Re: Questions about losing the write lock of raw-format disks after migration
Posted by Hanna Reitz 2 years, 5 months ago
On 24.11.21 13:56, Peng Liang via wrote:
> Hi folks,
>
> When we test migration with raw-format disk, we found that the QEMU
> process in the dst will lose the write lock after migration.  However,
> the QEMU process in the dst will still hold the write lock for
> qcow2-format disk.
>
> After reading some block layer's code, I found that the first
> blk_set_perm in blk_root_activate will set blk->shared_perm to
> BLK_PERM_ALL (disable all shared permissions?).  Then in
> blk_vm_state_changed, blk_set_perm will set shared_perm to
> blk->shared_perm, which is BLK_PERM_ALL.  And it makes
> raw_handle_perm_lock not to get the write lock.
>
> So I try the following patch and it will fix the problem:
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 12ef80ea17..96518fd1f0 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -197,13 +197,6 @@ static void blk_root_activate(BdrvChild *child,
> Error **errp)
>
>       blk->disable_perm = false;
>
> -    blk_set_perm(blk, blk->perm, BLK_PERM_ALL, &local_err);
> -    if (local_err) {
> -        error_propagate(errp, local_err);
> -        blk->disable_perm = true;
> -        return;
> -    }
> -
>       if (runstate_check(RUN_STATE_INMIGRATE)) {
>           /* Activation can happen when migration process is still
> active, for
>            * example when nbd_server_add is called during non-shared storage
>
> I'm new to the block layer and I'm not sure that it's a right fix to the
> problem.  Any idea about the problem and the patch?

Hi Peng,

Thanks for your report!  I can reproduce this problem.

It appears to me the problem is that blk_set_perm(), well, sets 
blk->perm and blk->shared_perm.  So by once calling it with 
BLK_PERM_ALL, we override blk->shared_perm to from then on be 
BLK_PERM_ALL, even though the guest device has not set that at all. We 
later call blk_set_perm(blk->blk_perm, blk->shared_perm) (in 
blk_vm_state_changed()), however, blk->shared_perm is now BLK_PERM_ALL, 
so this is a no-op.  That means that any restrictions the guest device 
has imposed (like the default share-rw=off) is not reflected in the 
block device’s permissions.

This is not apparent with qcow2, because the qcow2 format imposes its 
own restrictions in addition to the guest device.

I think the right way to fix this is to save blk->shared_perm somewhere 
and then restore it after the blk_set_perm(BLK_PERM_ALL) call.  I’ll 
send a patch (with a test case).

Hanna