[PATCH] block: Fix bdrv_activate() not to fail without medium

Markus Armbruster posted 1 patch 2 weeks, 6 days ago
block.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH] block: Fix bdrv_activate() not to fail without medium
Posted by Markus Armbruster 2 weeks, 6 days ago
bdrv_activate() returns failure without setting an error when
!bs->drv.  This is suspicious.  Turns out it used to succeed then,
until commit 5416645fcf82 changed it to return -ENOMEDIUM.

Return zero instead.

Fixes: 5416645fcf82 (block: return error-code from bdrv_invalidate_cache)
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 0ece805e41..9855c102de 100644
--- a/block.c
+++ b/block.c
@@ -6860,7 +6860,7 @@ int bdrv_activate(BlockDriverState *bs, Error **errp)
     GRAPH_RDLOCK_GUARD_MAINLOOP();
 
     if (!bs->drv)  {
-        return -ENOMEDIUM;
+        return 0;
     }
 
     QLIST_FOREACH(child, &bs->children, next) {
-- 
2.48.1
Re: [PATCH] block: Fix bdrv_activate() not to fail without medium
Posted by Kevin Wolf 2 weeks, 6 days ago
Am 12.03.2025 um 15:37 hat Markus Armbruster geschrieben:
> bdrv_activate() returns failure without setting an error when
> !bs->drv.  This is suspicious.  Turns out it used to succeed then,
> until commit 5416645fcf82 changed it to return -ENOMEDIUM.
> 
> Return zero instead.
> 
> Fixes: 5416645fcf82 (block: return error-code from bdrv_invalidate_cache)
> Signed-off-by: Markus Armbruster <armbru@redhat.com>

The commit message sounds more theoretical. Did you find this only by
code inspection? Do we know what the effect on real-world cases is, so
we could add a sentence about it to the commit message? Maybe we could
even have a qemu-iotests case to show the effect?

I absolutely agree that returning -ENOMEDIUM while not setting errp is
wrong. But without an example of what is affected, it's not obvious to
me which part of it needs to be fixed.

Kevin
Re: [PATCH] block: Fix bdrv_activate() not to fail without medium
Posted by Markus Armbruster 2 weeks, 6 days ago
Kevin Wolf <kwolf@redhat.com> writes:

> Am 12.03.2025 um 15:37 hat Markus Armbruster geschrieben:
>> bdrv_activate() returns failure without setting an error when
>> !bs->drv.  This is suspicious.  Turns out it used to succeed then,
>> until commit 5416645fcf82 changed it to return -ENOMEDIUM.
>> 
>> Return zero instead.
>> 
>> Fixes: 5416645fcf82 (block: return error-code from bdrv_invalidate_cache)
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>
> The commit message sounds more theoretical. Did you find this only by
> code inspection? Do we know what the effect on real-world cases is, so
> we could add a sentence about it to the commit message? Maybe we could
> even have a qemu-iotests case to show the effect?
>
> I absolutely agree that returning -ENOMEDIUM while not setting errp is
> wrong. But without an example of what is affected, it's not obvious to
> me which part of it needs to be fixed.

Code inspection.

Here's my somewhat extended rationale for my fix.

The code of interest used to be in bdrv_invalidate_cache().  Have a look
at commit 5a8a30db477:

    commit 5a8a30db4771675480829d7d3bf35a138e9c35f1
    Author: Kevin Wolf <kwolf@redhat.com>
    Date:   Wed Mar 12 15:59:16 2014 +0100

        block: Add error handling to bdrv_invalidate_cache()

        If it returns an error, the migrated VM will not be started, but qemu
        exits with an error message.

        Signed-off-by: Kevin Wolf <kwolf@redhat.com>
        Reviewed-by: Juan Quintela <quintela@redhat.com>
        Reviewed-by: Eric Blake <eblake@redhat.com>
        Reviewed-by: Benoit Canet <benoit@irqsave.net>

    [...]

    diff --git a/block.c b/block.c
    index 53f5b44fbb..acb70fde3d 100644
    --- a/block.c
    +++ b/block.c
    @@ -4781,27 +4781,43 @@ flush_parent:
         return bdrv_co_flush(bs->file);
     }

    -void bdrv_invalidate_cache(BlockDriverState *bs)
    +void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
     {
    +    Error *local_err = NULL;
    +    int ret;
    +
         if (!bs->drv)  {
             return;
         }

         if (bs->drv->bdrv_invalidate_cache) {
    -        bs->drv->bdrv_invalidate_cache(bs);
    +        bs->drv->bdrv_invalidate_cache(bs, &local_err);
         } else if (bs->file) {
    -        bdrv_invalidate_cache(bs->file);
    +        bdrv_invalidate_cache(bs->file, &local_err);
    +    }
    +    if (local_err) {
    +        error_propagate(errp, local_err);
    +        return;
         }

    -    refresh_total_sectors(bs, bs->total_sectors);
    +    ret = refresh_total_sectors(bs, bs->total_sectors);
    +    if (ret < 0) {
    +        error_setg_errno(errp, -ret, "Could not refresh total sector count");
    +        return;
    +    }
     }

    [...]

Not failing the function on !bs->drv is clearly intentional.

Behavior stayed this way for more than six years.  Then commit
5416645fcf8 (block: return error-code from bdrv_invalidate_cache)
changed the function to return zero on success, a negative errno on
failure.  According to the commit message, the patch is mere cleanup,
and not supposed to change behavior.

Since the first return was a success before the patch (no error set),
the correct value to return was zero.  The patch used -ENOMEDIUM
instead.  This is a clear regression.

My patch restores previous behavior.
Re: [PATCH] block: Fix bdrv_activate() not to fail without medium
Posted by Kevin Wolf 1 week, 6 days ago
Am 13.03.2025 um 12:53 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 12.03.2025 um 15:37 hat Markus Armbruster geschrieben:
> >> bdrv_activate() returns failure without setting an error when
> >> !bs->drv.  This is suspicious.  Turns out it used to succeed then,
> >> until commit 5416645fcf82 changed it to return -ENOMEDIUM.
> >> 
> >> Return zero instead.
> >> 
> >> Fixes: 5416645fcf82 (block: return error-code from bdrv_invalidate_cache)
> >> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> >
> > The commit message sounds more theoretical. Did you find this only by
> > code inspection? Do we know what the effect on real-world cases is, so
> > we could add a sentence about it to the commit message? Maybe we could
> > even have a qemu-iotests case to show the effect?
> >
> > I absolutely agree that returning -ENOMEDIUM while not setting errp is
> > wrong. But without an example of what is affected, it's not obvious to
> > me which part of it needs to be fixed.
> 
> Code inspection.
> 
> Here's my somewhat extended rationale for my fix.
> [...]

> Not failing the function on !bs->drv is clearly intentional.
> 
> Behavior stayed this way for more than six years.  Then commit
> 5416645fcf8 (block: return error-code from bdrv_invalidate_cache)
> changed the function to return zero on success, a negative errno on
> failure.  According to the commit message, the patch is mere cleanup,
> and not supposed to change behavior.
> 
> Since the first return was a success before the patch (no error set),
> the correct value to return was zero.  The patch used -ENOMEDIUM
> instead.  This is a clear regression.
> 
> My patch restores previous behavior.

I understand your rationale and don't disagree with your patch.
But I would still like the commit message to explain the practical
consequences of the bug and if possible a test case.

If you tried to find the practical consequences and couldn't find any
way to trigger a bug as a user, that is worth documenting, too.

Kevin
Re: [PATCH] block: Fix bdrv_activate() not to fail without medium
Posted by Markus Armbruster 2 weeks, 6 days ago
Kevin Wolf <kwolf@redhat.com> writes:

> Am 12.03.2025 um 15:37 hat Markus Armbruster geschrieben:
>> bdrv_activate() returns failure without setting an error when
>> !bs->drv.  This is suspicious.  Turns out it used to succeed then,
>> until commit 5416645fcf82 changed it to return -ENOMEDIUM.
>> 
>> Return zero instead.
>> 
>> Fixes: 5416645fcf82 (block: return error-code from bdrv_invalidate_cache)
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>
> The commit message sounds more theoretical. Did you find this only by
> code inspection? Do we know what the effect on real-world cases is, so
> we could add a sentence about it to the commit message? Maybe we could
> even have a qemu-iotests case to show the effect?
>
> I absolutely agree that returning -ENOMEDIUM while not setting errp is
> wrong. But without an example of what is affected, it's not obvious to
> me which part of it needs to be fixed.

Code inspection.

Here's my somewhat extended rationale for my fix.

The code of interest used to be in bdrv_invalidate_cache().  Have a look
at commit 5a8a30db477:

    commit 5a8a30db4771675480829d7d3bf35a138e9c35f1
    Author: Kevin Wolf <kwolf@redhat.com>
    Date:   Wed Mar 12 15:59:16 2014 +0100

        block: Add error handling to bdrv_invalidate_cache()

        If it returns an error, the migrated VM will not be started, but qemu
        exits with an error message.

        Signed-off-by: Kevin Wolf <kwolf@redhat.com>
        Reviewed-by: Juan Quintela <quintela@redhat.com>
        Reviewed-by: Eric Blake <eblake@redhat.com>
        Reviewed-by: Benoit Canet <benoit@irqsave.net>

    [...]
    diff --git a/include/block/block.h b/include/block/block.h
    index bd34d14109..1ed55d839a 100644
    --- a/include/block/block.h
    +++ b/include/block/block.h
    @@ -329,8 +329,8 @@ BlockDriverAIOCB *bdrv_aio_ioctl(BlockDriverState *bs,
             BlockDriverCompletionFunc *cb, void *opaque);

     /* Invalidate any cached metadata used by image formats */
    -void bdrv_invalidate_cache(BlockDriverState *bs);
    -void bdrv_invalidate_cache_all(void);
    +void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp);
    +void bdrv_invalidate_cache_all(Error **errp);

     void bdrv_clear_incoming_migration_all(void);

    diff --git a/include/block/block_int.h b/include/block/block_int.h
    index 4fc5ea8a65..cd5bc7308a 100644
    --- a/include/block/block_int.h
    +++ b/include/block/block_int.h
    @@ -153,7 +153,7 @@ struct BlockDriver {
         /*
          * Invalidate any cached meta-data.
          */
    -    void (*bdrv_invalidate_cache)(BlockDriverState *bs);
    +    void (*bdrv_invalidate_cache)(BlockDriverState *bs, Error **errp);

         /*
          * Flushes all data that was already written to the OS all the way down to
    diff --git a/block.c b/block.c
    index 53f5b44fbb..acb70fde3d 100644
    --- a/block.c
    +++ b/block.c
    @@ -4781,27 +4781,43 @@ flush_parent:
         return bdrv_co_flush(bs->file);
     }

    -void bdrv_invalidate_cache(BlockDriverState *bs)
    +void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
     {
    +    Error *local_err = NULL;
    +    int ret;
    +
         if (!bs->drv)  {
             return;
         }

         if (bs->drv->bdrv_invalidate_cache) {
    -        bs->drv->bdrv_invalidate_cache(bs);
    +        bs->drv->bdrv_invalidate_cache(bs, &local_err);
         } else if (bs->file) {
    -        bdrv_invalidate_cache(bs->file);
    +        bdrv_invalidate_cache(bs->file, &local_err);
    +    }
    +    if (local_err) {
    +        error_propagate(errp, local_err);
    +        return;
         }

    -    refresh_total_sectors(bs, bs->total_sectors);
    +    ret = refresh_total_sectors(bs, bs->total_sectors);
    +    if (ret < 0) {
    +        error_setg_errno(errp, -ret, "Could not refresh total sector count");
    +        return;
    +    }
     }

    -void bdrv_invalidate_cache_all(void)
    +void bdrv_invalidate_cache_all(Error **errp)
     {
         BlockDriverState *bs;
    +    Error *local_err = NULL;

         QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
    -        bdrv_invalidate_cache(bs);
    +        bdrv_invalidate_cache(bs, &local_err);
    +        if (local_err) {
    +            error_propagate(errp, local_err);
    +            return;
    +        }
         }
     }
Re: [PATCH] block: Fix bdrv_activate() not to fail without medium
Posted by Markus Armbruster 2 weeks, 6 days ago
Butterfingers, please ignore this one.  I'll send a real reply soon.