[Qemu-devel] [PATCH-for-4.1 v6 0/5] hw/block/pflash_cfi01: Add DeviceReset() handler

Philippe Mathieu-Daudé posted 5 patches 4 years, 9 months ago
Test asan passed
Test docker-clang@ubuntu passed
Test docker-mingw@fedora passed
Test FreeBSD passed
Test s390x passed
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20190716221555.11145-1-philmd@redhat.com
Maintainers: "Philippe Mathieu-Daudé" <philmd@redhat.com>, Max Reitz <mreitz@redhat.com>, Kevin Wolf <kwolf@redhat.com>
There is a newer version of this series
hw/block/pflash_cfi01.c | 77 +++++++++++++++++++++--------------------
hw/block/trace-events   |  1 +
2 files changed, 41 insertions(+), 37 deletions(-)
[Qemu-devel] [PATCH-for-4.1 v6 0/5] hw/block/pflash_cfi01: Add DeviceReset() handler
Posted by Philippe Mathieu-Daudé 4 years, 9 months ago
Hello it's me again, insisting with this series because there are at
least 2 different report of guests bricked on reset due to the bug
fixed by patch #5:
https://bugzilla.redhat.com/show_bug.cgi?id=1678713
https://bugzilla.redhat.com/show_bug.cgi?id=1704584

Patches missing review: 2 and 3

The pflash device lacks a reset() function.
When a machine is resetted, the flash might be in an
inconsistent state, leading to unexpected behavior.
Resolve this issue by adding a DeviceReset() handler.

Since v1: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg00962.html
- addressed Laszlo review comments

Since v2: https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg00395.html
- consider migration (Laszlo, Peter)

Since v3: https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01668.html
- more reliable migration (Dave)
- dropped patches 6-9 not required for next release

Since v4: https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg02785.html
- document why using READ_ARRAY value 0x00 for migration is safe

Since v5: https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg03366.html
- avoid trying to be spec-compliant and messing with migration. KISS.
  review/test tags reset, sorry.

$ git backport-diff -u v5
Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/5:[----] [-C] 'hw/block/pflash_cfi01: Removed an unused timer'
002/5:[down] 'hw/block/pflash_cfi01: Document use of non-CFI compliant command '0x00''
003/5:[0031] [FC] 'hw/block/pflash_cfi01: Extract pflash_mode_read_array()'
004/5:[down] 'hw/block/pflash_cfi01: Rename 'reset_flash' label as 'mode_read_array''
005/5:[----] [--] 'hw/block/pflash_cfi01: Add the DeviceReset() handler'

Regards,

Phil.

Philippe Mathieu-Daudé (5):
  hw/block/pflash_cfi01: Removed an unused timer
  hw/block/pflash_cfi01: Document use of non-CFI compliant command
    '0x00'
  hw/block/pflash_cfi01: Extract pflash_mode_read_array()
  hw/block/pflash_cfi01: Rename 'reset_flash' label as 'mode_read_array'
  hw/block/pflash_cfi01: Add the DeviceReset() handler

 hw/block/pflash_cfi01.c | 77 +++++++++++++++++++++--------------------
 hw/block/trace-events   |  1 +
 2 files changed, 41 insertions(+), 37 deletions(-)

-- 
2.20.1


Re: [Qemu-devel] [PATCH-for-4.1 v6 0/5] hw/block/pflash_cfi01: Add DeviceReset() handler
Posted by Laszlo Ersek 4 years, 9 months ago
Hi Phil,

On 07/17/19 00:15, Philippe Mathieu-Daudé wrote:
> Hello it's me again, insisting with this series because there are at
> least 2 different report of guests bricked on reset due to the bug
> fixed by patch #5:
> https://bugzilla.redhat.com/show_bug.cgi?id=1678713
> https://bugzilla.redhat.com/show_bug.cgi?id=1704584
> 
> Patches missing review: 2 and 3
> 
> The pflash device lacks a reset() function.
> When a machine is resetted, the flash might be in an
> inconsistent state, leading to unexpected behavior.
> Resolve this issue by adding a DeviceReset() handler.
> 
> Since v1: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg00962.html
> - addressed Laszlo review comments
> 
> Since v2: https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg00395.html
> - consider migration (Laszlo, Peter)
> 
> Since v3: https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01668.html
> - more reliable migration (Dave)
> - dropped patches 6-9 not required for next release
> 
> Since v4: https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg02785.html
> - document why using READ_ARRAY value 0x00 for migration is safe
> 
> Since v5: https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg03366.html
> - avoid trying to be spec-compliant and messing with migration. KISS.
>   review/test tags reset, sorry.

I have a number of questions / requests.


(1) The last I recall from the v5 discussion is Markus asking about some
risky cases (corner cases?) related to migration.

So what is the new avenue taken in v6? What does "continue ignoring spec
compliance" mean, with regard to migration?

My vague understanding is that we're not trying to answer Markus's
questions; instead, we're side-stepping them, by doing something else.
That works for me, but can we please summarize in a bit more detail?
Like, "in v6, we're not mapping 0x00 vs. 0xff across migration because..."

Yes, I could inter-diff v5 vs. v6, but I know way too little about
pflash. I'd miss how our *dropping* of the special 0xff mapping impacts
our conformance to the data sheet.

I'm not asking for commit message updates, just a bit more explanation
in this free-form discussion. (I looked for it under v5, and couldn't
find it.)


(2) Has someone regression-tested v6 for migration specifically?

Or, is v6 not "risky" wrt. migration any longer?


(3) I'm fine regression testing v6 too (without migration, again).
Please ping me separately once the reviews have converged and the series
is otherwise ready for merging.

Thanks!
Laszlo


> $ git backport-diff -u v5
> Key:
> [----] : patches are identical
> [####] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
> 
> 001/5:[----] [-C] 'hw/block/pflash_cfi01: Removed an unused timer'
> 002/5:[down] 'hw/block/pflash_cfi01: Document use of non-CFI compliant command '0x00''
> 003/5:[0031] [FC] 'hw/block/pflash_cfi01: Extract pflash_mode_read_array()'
> 004/5:[down] 'hw/block/pflash_cfi01: Rename 'reset_flash' label as 'mode_read_array''
> 005/5:[----] [--] 'hw/block/pflash_cfi01: Add the DeviceReset() handler'
> 
> Regards,
> 
> Phil.
> 
> Philippe Mathieu-Daudé (5):
>   hw/block/pflash_cfi01: Removed an unused timer
>   hw/block/pflash_cfi01: Document use of non-CFI compliant command
>     '0x00'
>   hw/block/pflash_cfi01: Extract pflash_mode_read_array()
>   hw/block/pflash_cfi01: Rename 'reset_flash' label as 'mode_read_array'
>   hw/block/pflash_cfi01: Add the DeviceReset() handler
> 
>  hw/block/pflash_cfi01.c | 77 +++++++++++++++++++++--------------------
>  hw/block/trace-events   |  1 +
>  2 files changed, 41 insertions(+), 37 deletions(-)
> 


Re: [Qemu-devel] [PATCH-for-4.1 v6 0/5] hw/block/pflash_cfi01: Add DeviceReset() handler
Posted by Philippe Mathieu-Daudé 4 years, 9 months ago
Hi Laszlo,

On 7/17/19 2:24 PM, Laszlo Ersek wrote:
> Hi Phil,
> 
> On 07/17/19 00:15, Philippe Mathieu-Daudé wrote:
>> Hello it's me again, insisting with this series because there are at
>> least 2 different report of guests bricked on reset due to the bug
>> fixed by patch #5:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1678713
>> https://bugzilla.redhat.com/show_bug.cgi?id=1704584
>>
>> Patches missing review: 2 and 3
>>
>> The pflash device lacks a reset() function.
>> When a machine is resetted, the flash might be in an
>> inconsistent state, leading to unexpected behavior.
>> Resolve this issue by adding a DeviceReset() handler.
>>
>> Since v1: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg00962.html
>> - addressed Laszlo review comments
>>
>> Since v2: https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg00395.html
>> - consider migration (Laszlo, Peter)
>>
>> Since v3: https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01668.html
>> - more reliable migration (Dave)
>> - dropped patches 6-9 not required for next release
>>
>> Since v4: https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg02785.html
>> - document why using READ_ARRAY value 0x00 for migration is safe
>>
>> Since v5: https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg03366.html
>> - avoid trying to be spec-compliant and messing with migration. KISS.
>>   review/test tags reset, sorry.
> 
> I have a number of questions / requests.
> 
> 
> (1) The last I recall from the v5 discussion is Markus asking about some
> risky cases (corner cases?) related to migration.
> 
> So what is the new avenue taken in v6? What does "continue ignoring spec
> compliance" mean, with regard to migration?
> 
> My vague understanding is that we're not trying to answer Markus's
> questions; instead, we're side-stepping them, by doing something else.

I'd love to keep the v5 series and address Markus issues, but as shown
by the quantity of respins, I don't understand the migration feature
enough to fix it in time for the next release. I plan to address them
(and other issues reported by Markus in other reviews) during the next
dev cycle.

> That works for me, but can we please summarize in a bit more detail?
> Like, "in v6, we're not mapping 0x00 vs. 0xff across migration because..."

Since it is very late in the release process and this series intend to
fix a guest corruption bug worthwhile for release, the approch taken by
v6 is "try to change the strict minimum regarding to migration, do not
worry about spec issues". I even tried to make no migration change at
all, but as explained in patch 6 "Extract pflash_mode_read_array" there
is still one.

I could make no migration change, and in patch 6 not call
memory_region_rom_device_set_romd() in pflash_mode_read_array() [and
call it in other places instead], but then we still have the undefined
behavior described in the patch. We always lived with this UNDEF, so...
I could work on a simpler v7.

> Yes, I could inter-diff v5 vs. v6, but I know way too little about
> pflash. I'd miss how our *dropping* of the special 0xff mapping impacts
> our conformance to the data sheet.

To reassure you, it never worked well but nobody cared, I noticed while
converting DPRINTF to trace-events and adding more, then follow the
model state machine. While it's probably not worth fixing, it makes
debugging slighly harder when looking at the CFI spec workflow. Now I'm
used to it.

> I'm not asking for commit message updates, just a bit more explanation
> in this free-form discussion. (I looked for it under v5, and couldn't
> find it.)

I tried to be verbose in the patch description, so for reference:

0xff on v5:
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg03368.html

not 0xff on v6:
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg03931.html

migration impact on v6 [1]:
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg03932.html

> (2) Has someone regression-tested v6 for migration specifically?

No, neither were tested the previous versions.

> 
> Or, is v6 not "risky" wrt. migration any longer?

v6 should be way less risky than previous versions (still one issue
noted in [1]).

> (3) I'm fine regression testing v6 too (without migration, again).
> Please ping me separately once the reviews have converged and the series
> is otherwise ready for merging.

Yes, I know your testing takes very long, so let's wait first.

Thanks for having a look.

Phil.

> 
> Thanks!
> Laszlo
> 
> 
>> $ git backport-diff -u v5
>> Key:
>> [----] : patches are identical
>> [####] : number of functional differences between upstream/downstream patch
>> [down] : patch is downstream-only
>> The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
>>
>> 001/5:[----] [-C] 'hw/block/pflash_cfi01: Removed an unused timer'
>> 002/5:[down] 'hw/block/pflash_cfi01: Document use of non-CFI compliant command '0x00''
>> 003/5:[0031] [FC] 'hw/block/pflash_cfi01: Extract pflash_mode_read_array()'
>> 004/5:[down] 'hw/block/pflash_cfi01: Rename 'reset_flash' label as 'mode_read_array''
>> 005/5:[----] [--] 'hw/block/pflash_cfi01: Add the DeviceReset() handler'
>>
>> Regards,
>>
>> Phil.
>>
>> Philippe Mathieu-Daudé (5):
>>   hw/block/pflash_cfi01: Removed an unused timer
>>   hw/block/pflash_cfi01: Document use of non-CFI compliant command
>>     '0x00'
>>   hw/block/pflash_cfi01: Extract pflash_mode_read_array()
>>   hw/block/pflash_cfi01: Rename 'reset_flash' label as 'mode_read_array'
>>   hw/block/pflash_cfi01: Add the DeviceReset() handler
>>
>>  hw/block/pflash_cfi01.c | 77 +++++++++++++++++++++--------------------
>>  hw/block/trace-events   |  1 +
>>  2 files changed, 41 insertions(+), 37 deletions(-)
>>
>