[RFC 1/4] Add qapi for block error injection

Tony Asleson posted 4 patches 6 years, 4 months ago
Maintainers: Markus Armbruster <armbru@redhat.com>, John Snow <jsnow@redhat.com>, Max Reitz <mreitz@redhat.com>, Eric Blake <eblake@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Keith Busch <keith.busch@intel.com>, Kevin Wolf <kwolf@redhat.com>, Fam Zheng <fam@euphon.net>
[RFC 1/4] Add qapi for block error injection
Posted by Tony Asleson 6 years, 4 months ago
Proof of concept code to dynamically create/delete media errors
for block devices using the qapi interface.  This is useful for testing
the OS and the block and FS layers for error handling.  Utilizing this
in a VM allows it to be OS agnostic and test the OS at it's lowest levels
of hardware interaction.

Signed-off-by: Tony Asleson <tasleson@redhat.com>
---
 block/Makefile.objs  |   2 +-
 block/error_inject.c | 179 +++++++++++++++++++++++++++++++++++++++++++
 block/error_inject.h |  43 +++++++++++
 block/qapi.c         |  18 +++++
 qapi/block.json      |  36 +++++++++
 5 files changed, 277 insertions(+), 1 deletion(-)
 create mode 100644 block/error_inject.c
 create mode 100644 block/error_inject.h

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 35f3bca4d9..c8fcbc276b 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,4 +1,4 @@
-block-obj-y += raw-format.o vmdk.o vpc.o
+block-obj-y += raw-format.o vmdk.o vpc.o error_inject.o
 block-obj-$(CONFIG_QCOW1) += qcow.o
 block-obj-$(CONFIG_VDI) += vdi.o
 block-obj-$(CONFIG_CLOOP) += cloop.o
diff --git a/block/error_inject.c b/block/error_inject.c
new file mode 100644
index 0000000000..26a47dfb8c
--- /dev/null
+++ b/block/error_inject.c
@@ -0,0 +1,179 @@
+/*
+ * Error injection code for block devices
+ *
+ * Copyright (c) 2019 Red Hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "error_inject.h"
+
+#include <gmodule.h>
+
+#include "qemu/thread.h"
+
+static QemuMutex error_inject_lock;
+static GHashTable *error_inject_data;
+
+
+static void delete_lba_entries(void *entry)
+{
+    GSequence *e = (GSequence *) entry;
+    g_sequence_free(e);
+}
+
+struct value {
+    uint64_t error_lba;
+
+    /*
+     * TODO Actually do something with behavior
+     */
+    MediaErrorBehavior behavior;
+    /*
+     * TODO Add data for generating bitrot, when we do change free function
+     */
+};
+
+static int key_compare(gconstpointer a, gconstpointer b, gpointer data)
+{
+    uint64_t left = ((struct value *)a)->error_lba;
+    uint64_t right = ((struct value *)b)->error_lba;
+
+    if (left < right) {
+        return -1;
+    } else if (left > right) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+
+static uint64_t error_lba_get(GSequenceIter *iter)
+{
+    gpointer tmp = g_sequence_get(iter);
+    return ((struct value *)tmp)->error_lba;
+}
+
+void media_error_create(const char *device_id, uint64_t lba,
+                        MediaErrorBehavior behavior)
+{
+    qemu_mutex_lock(&error_inject_lock);
+
+    GSequence *block_device = g_hash_table_lookup(error_inject_data, device_id);
+    if (!block_device) {
+        block_device = g_sequence_new(g_free);
+        char *key = strdup(device_id);
+        g_hash_table_insert(error_inject_data, key, block_device);
+    }
+
+    struct value lookup = {lba, MEDIA_ERROR_BEHAVIOR__MAX};
+    if (!g_sequence_lookup(block_device, &lookup, key_compare, NULL)) {
+        struct value *val = g_new(struct value, 1);
+        val->error_lba = lba;
+        val->behavior = behavior;
+
+        g_sequence_insert_sorted(block_device, val, key_compare, NULL);
+    }
+
+    qemu_mutex_unlock(&error_inject_lock);
+}
+
+void media_error_delete(const char *device_id, uint64_t lba)
+{
+    qemu_mutex_lock(&error_inject_lock);
+
+    GSequence *block_device = g_hash_table_lookup(error_inject_data, device_id);
+    if (block_device) {
+        struct value find = { lba, MEDIA_ERROR_BEHAVIOR__MAX};
+        GSequenceIter *found = g_sequence_lookup(block_device, &find,
+                                                 key_compare, NULL);
+        if (found) {
+            g_sequence_remove(found);
+        }
+    }
+
+    qemu_mutex_unlock(&error_inject_lock);
+}
+
+int error_in_read(const char *device_id, uint64_t lba, uint64_t len,
+                  uint64_t *error_lba)
+{
+    uint64_t error_sector = 0;
+    const uint64_t transfer_end = lba + len;
+    int ec = 0;
+    *error_lba = 0xFFFFFFFFFFFFFFFF;
+
+    qemu_mutex_lock(&error_inject_lock);
+
+    GSequence *block_device = g_hash_table_lookup(error_inject_data, device_id);
+    if (block_device && g_sequence_get_length(block_device) != 0) {
+        struct value find = {lba, MEDIA_ERROR_BEHAVIOR__MAX};
+        GSequenceIter *iter = g_sequence_search(block_device, &find,
+                                                key_compare, NULL);
+
+        /*
+         * g_sequence_seach returns where the item would be inserted.
+         * In the case of a direct match, it's spot is inserted after the
+         * existing, thus we need to check the one immediately before
+         * the insertion point to see if it is the one we are looking for.
+         */
+        GSequenceIter *prev = g_sequence_iter_prev(iter);
+        error_sector = error_lba_get(prev);
+
+        if (error_sector >= lba && error_sector < transfer_end) {
+            *error_lba = error_sector;
+            ec = 1;
+        } else {
+            /*
+             * Lets look at next until we find one in our transfer or bail
+             * if the error(s) logical block address are greater than the
+             * end of our transfer.
+             */
+            while (!g_sequence_iter_is_end(iter)) {
+                error_sector = error_lba_get(iter);
+
+                if (error_sector >= transfer_end) {
+                    break;
+                }
+                if (error_sector >= lba && error_sector < transfer_end) {
+                    *error_lba = error_sector;
+                    ec = 1;
+                    break;
+                } else {
+                    iter = g_sequence_iter_next(iter);
+                }
+            }
+        }
+    }
+
+    qemu_mutex_unlock(&error_inject_lock);
+
+    return ec;
+}
+
+
+static void __attribute__((__constructor__)) error_inject_init(void)
+{
+    qemu_mutex_init(&error_inject_lock);
+
+    error_inject_data = g_hash_table_new_full(g_str_hash,
+                                              g_str_equal,
+                                              g_free,
+                                              delete_lba_entries);
+}
diff --git a/block/error_inject.h b/block/error_inject.h
new file mode 100644
index 0000000000..5f2d143c90
--- /dev/null
+++ b/block/error_inject.h
@@ -0,0 +1,43 @@
+/*
+ * Error injection code for block devices
+ *
+ * Copyright (c) 2019 Red Hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef ERROR_INJECT
+#define ERROR_INJECT
+#include "qemu/osdep.h"
+
+#include <stdint.h>
+
+#include "qapi/qapi-commands-block.h"
+#include "qapi/qapi-types-block.h"
+
+
+void media_error_create(const char *device_id, uint64_t lba,
+                        MediaErrorBehavior behavior);
+void media_error_delete(const char *device_id, uint64_t lba);
+
+
+int error_in_read(const char *device_id, uint64_t lba, uint64_t len,
+                  uint64_t *error_lba);
+
+#endif
diff --git a/block/qapi.c b/block/qapi.c
index 15f1030264..d66201a831 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -26,11 +26,14 @@
 #include "qemu-common.h"
 #include "block/qapi.h"
 #include "block/block_int.h"
+#include "block/error_inject.h"
 #include "block/throttle-groups.h"
 #include "block/write-threshold.h"
 #include "qapi/error.h"
+#include "qapi/qapi-commands-block.h"
 #include "qapi/qapi-commands-block-core.h"
 #include "qapi/qobject-output-visitor.h"
+#include "qapi/qapi-types-block.h"
 #include "qapi/qapi-visit-block-core.h"
 #include "qapi/qmp/qbool.h"
 #include "qapi/qmp/qdict.h"
@@ -841,3 +844,18 @@ void bdrv_image_info_dump(ImageInfo *info)
         bdrv_image_info_specific_dump(info->format_specific);
     }
 }
+
+void qmp_media_error_create(const char *device_id, uint64_t lba,
+        MediaErrorBehavior behavior, Error **errp)
+{
+    /*
+     * We could validate the device_id and lba for existence and range, but we
+     * want to be able to add entries before a device is hot plugged too.
+     */
+    media_error_create(device_id, lba, behavior);
+}
+
+void qmp_media_error_delete(const char *device_id, uint64_t lba, Error **errp)
+{
+    media_error_delete(device_id, lba);
+}
diff --git a/qapi/block.json b/qapi/block.json
index 145c268bb6..2b11954b44 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -453,3 +453,39 @@
 { 'event': 'QUORUM_REPORT_BAD',
   'data': { 'type': 'QuorumOpType', '*error': 'str', 'node-name': 'str',
             'sector-num': 'int', 'sectors-count': 'int' } }
+
+##
+# @MediaErrorBehavior:
+#
+# Enumerated type for classifying type of read error behavior.
+#
+##
+{ 'enum': 'MediaErrorBehavior', 'data': ['hard', 'fixed_on_write', 'flakey'] }
+
+##
+# @media-error-create:
+#
+# Example:
+# -> {"execute": "media-error-create",
+#     "arguments": {"device_id": "12345678, "lba" : 10000, "behavior" : "hard"}}
+# <- { "return": {} }
+#
+# Create a synthetic media error for the specified logical block address when
+# it is read by the guest OS.
+#
+##
+{ 'command':'media-error-create',
+  'data': {'device_id': 'str', 'lba': 'uint64', 'behavior': 'MediaErrorBehavior'}}
+
+##
+# @media-error-delete:
+#
+# Delete a synthetic media error for the specified logical block address.
+#
+# # Example:
+# -> {"execute": "media-error-delete",
+#     "arguments": {"device_id": "01020304", "lba" : 10000}
+# <- { "return": {} }
+##
+{ 'command':'media-error-delete',
+  'data': {'device_id': 'str', 'lba': 'uint64'}}
-- 
2.21.0


Re: [RFC 1/4] Add qapi for block error injection
Posted by Philippe Mathieu-Daudé 6 years, 4 months ago
Hi Tony,

On 9/19/19 9:48 PM, Tony Asleson wrote:
> Proof of concept code to dynamically create/delete media errors
> for block devices using the qapi interface.  This is useful for testing
> the OS and the block and FS layers for error handling.  Utilizing this
> in a VM allows it to be OS agnostic and test the OS at it's lowest levels
> of hardware interaction.
> 
> Signed-off-by: Tony Asleson <tasleson@redhat.com>
> ---
>  block/Makefile.objs  |   2 +-
>  block/error_inject.c | 179 +++++++++++++++++++++++++++++++++++++++++++
>  block/error_inject.h |  43 +++++++++++
>  block/qapi.c         |  18 +++++
>  qapi/block.json      |  36 +++++++++
>  5 files changed, 277 insertions(+), 1 deletion(-)
>  create mode 100644 block/error_inject.c
>  create mode 100644 block/error_inject.h
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 35f3bca4d9..c8fcbc276b 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -1,4 +1,4 @@
> -block-obj-y += raw-format.o vmdk.o vpc.o
> +block-obj-y += raw-format.o vmdk.o vpc.o error_inject.o
>  block-obj-$(CONFIG_QCOW1) += qcow.o
>  block-obj-$(CONFIG_VDI) += vdi.o
>  block-obj-$(CONFIG_CLOOP) += cloop.o
> diff --git a/block/error_inject.c b/block/error_inject.c
> new file mode 100644
> index 0000000000..26a47dfb8c
> --- /dev/null
> +++ b/block/error_inject.c
> @@ -0,0 +1,179 @@
> +/*
> + * Error injection code for block devices
> + *
> + * Copyright (c) 2019 Red Hat, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "error_inject.h"
> +
> +#include <gmodule.h>
> +
> +#include "qemu/thread.h"
> +
> +static QemuMutex error_inject_lock;
> +static GHashTable *error_inject_data;
> +
> +
> +static void delete_lba_entries(void *entry)
> +{
> +    GSequence *e = (GSequence *) entry;
> +    g_sequence_free(e);
> +}
> +
> +struct value {
> +    uint64_t error_lba;
> +
> +    /*
> +     * TODO Actually do something with behavior
> +     */
> +    MediaErrorBehavior behavior;
> +    /*
> +     * TODO Add data for generating bitrot, when we do change free function
> +     */
> +};
> +
> +static int key_compare(gconstpointer a, gconstpointer b, gpointer data)
> +{
> +    uint64_t left = ((struct value *)a)->error_lba;
> +    uint64_t right = ((struct value *)b)->error_lba;
> +
> +    if (left < right) {
> +        return -1;
> +    } else if (left > right) {
> +        return 1;
> +    } else {
> +        return 0;
> +    }
> +}
> +
> +static uint64_t error_lba_get(GSequenceIter *iter)
> +{
> +    gpointer tmp = g_sequence_get(iter);
> +    return ((struct value *)tmp)->error_lba;
> +}
> +
> +void media_error_create(const char *device_id, uint64_t lba,
> +                        MediaErrorBehavior behavior)
> +{
> +    qemu_mutex_lock(&error_inject_lock);
> +
> +    GSequence *block_device = g_hash_table_lookup(error_inject_data, device_id);
> +    if (!block_device) {
> +        block_device = g_sequence_new(g_free);
> +        char *key = strdup(device_id);
> +        g_hash_table_insert(error_inject_data, key, block_device);
> +    }
> +
> +    struct value lookup = {lba, MEDIA_ERROR_BEHAVIOR__MAX};
> +    if (!g_sequence_lookup(block_device, &lookup, key_compare, NULL)) {
> +        struct value *val = g_new(struct value, 1);
> +        val->error_lba = lba;
> +        val->behavior = behavior;
> +
> +        g_sequence_insert_sorted(block_device, val, key_compare, NULL);
> +    }
> +
> +    qemu_mutex_unlock(&error_inject_lock);
> +}
> +
> +void media_error_delete(const char *device_id, uint64_t lba)

Can you explain the goal of this function? I don't understand it from
the hardware PoV. Or is it a simple code cleanup function?

Once hw starts to break, it is unlikely it recovers magically to a
previous sane state...

In real world, block hardware starts to fail, controller notice
inconsistency and set some bits, the controller driver react.
Eventually the hw will still be used, with the failing block avoided.

What do you think? How is your current use case setup?

Thanks,

Phil.

> +{
> +    qemu_mutex_lock(&error_inject_lock);
> +
> +    GSequence *block_device = g_hash_table_lookup(error_inject_data, device_id);
> +    if (block_device) {
> +        struct value find = { lba, MEDIA_ERROR_BEHAVIOR__MAX};
> +        GSequenceIter *found = g_sequence_lookup(block_device, &find,
> +                                                 key_compare, NULL);
> +        if (found) {
> +            g_sequence_remove(found);
> +        }
> +    }
> +
> +    qemu_mutex_unlock(&error_inject_lock);
> +}
> +
> +int error_in_read(const char *device_id, uint64_t lba, uint64_t len,
> +                  uint64_t *error_lba)
> +{
> +    uint64_t error_sector = 0;
> +    const uint64_t transfer_end = lba + len;
> +    int ec = 0;
> +    *error_lba = 0xFFFFFFFFFFFFFFFF;
> +
> +    qemu_mutex_lock(&error_inject_lock);
> +
> +    GSequence *block_device = g_hash_table_lookup(error_inject_data, device_id);
> +    if (block_device && g_sequence_get_length(block_device) != 0) {
> +        struct value find = {lba, MEDIA_ERROR_BEHAVIOR__MAX};
> +        GSequenceIter *iter = g_sequence_search(block_device, &find,
> +                                                key_compare, NULL);
> +
> +        /*
> +         * g_sequence_seach returns where the item would be inserted.
> +         * In the case of a direct match, it's spot is inserted after the
> +         * existing, thus we need to check the one immediately before
> +         * the insertion point to see if it is the one we are looking for.
> +         */
> +        GSequenceIter *prev = g_sequence_iter_prev(iter);
> +        error_sector = error_lba_get(prev);
> +
> +        if (error_sector >= lba && error_sector < transfer_end) {
> +            *error_lba = error_sector;
> +            ec = 1;
> +        } else {
> +            /*
> +             * Lets look at next until we find one in our transfer or bail
> +             * if the error(s) logical block address are greater than the
> +             * end of our transfer.
> +             */
> +            while (!g_sequence_iter_is_end(iter)) {
> +                error_sector = error_lba_get(iter);
> +
> +                if (error_sector >= transfer_end) {
> +                    break;
> +                }
> +                if (error_sector >= lba && error_sector < transfer_end) {
> +                    *error_lba = error_sector;
> +                    ec = 1;
> +                    break;
> +                } else {
> +                    iter = g_sequence_iter_next(iter);
> +                }
> +            }
> +        }
> +    }
> +
> +    qemu_mutex_unlock(&error_inject_lock);
> +
> +    return ec;
> +}
> +
> +
> +static void __attribute__((__constructor__)) error_inject_init(void)
> +{
> +    qemu_mutex_init(&error_inject_lock);
> +
> +    error_inject_data = g_hash_table_new_full(g_str_hash,
> +                                              g_str_equal,
> +                                              g_free,
> +                                              delete_lba_entries);
> +}
> diff --git a/block/error_inject.h b/block/error_inject.h
> new file mode 100644
> index 0000000000..5f2d143c90
> --- /dev/null
> +++ b/block/error_inject.h
> @@ -0,0 +1,43 @@
> +/*
> + * Error injection code for block devices
> + *
> + * Copyright (c) 2019 Red Hat, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#ifndef ERROR_INJECT
> +#define ERROR_INJECT
> +#include "qemu/osdep.h"
> +
> +#include <stdint.h>
> +
> +#include "qapi/qapi-commands-block.h"
> +#include "qapi/qapi-types-block.h"
> +
> +
> +void media_error_create(const char *device_id, uint64_t lba,
> +                        MediaErrorBehavior behavior);
> +void media_error_delete(const char *device_id, uint64_t lba);
> +
> +
> +int error_in_read(const char *device_id, uint64_t lba, uint64_t len,
> +                  uint64_t *error_lba);
> +
> +#endif
> diff --git a/block/qapi.c b/block/qapi.c
> index 15f1030264..d66201a831 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -26,11 +26,14 @@
>  #include "qemu-common.h"
>  #include "block/qapi.h"
>  #include "block/block_int.h"
> +#include "block/error_inject.h"
>  #include "block/throttle-groups.h"
>  #include "block/write-threshold.h"
>  #include "qapi/error.h"
> +#include "qapi/qapi-commands-block.h"
>  #include "qapi/qapi-commands-block-core.h"
>  #include "qapi/qobject-output-visitor.h"
> +#include "qapi/qapi-types-block.h"
>  #include "qapi/qapi-visit-block-core.h"
>  #include "qapi/qmp/qbool.h"
>  #include "qapi/qmp/qdict.h"
> @@ -841,3 +844,18 @@ void bdrv_image_info_dump(ImageInfo *info)
>          bdrv_image_info_specific_dump(info->format_specific);
>      }
>  }
> +
> +void qmp_media_error_create(const char *device_id, uint64_t lba,
> +        MediaErrorBehavior behavior, Error **errp)
> +{
> +    /*
> +     * We could validate the device_id and lba for existence and range, but we
> +     * want to be able to add entries before a device is hot plugged too.
> +     */
> +    media_error_create(device_id, lba, behavior);
> +}
> +
> +void qmp_media_error_delete(const char *device_id, uint64_t lba, Error **errp)
> +{
> +    media_error_delete(device_id, lba);
> +}
> diff --git a/qapi/block.json b/qapi/block.json
> index 145c268bb6..2b11954b44 100644
> --- a/qapi/block.json
> +++ b/qapi/block.json
> @@ -453,3 +453,39 @@
>  { 'event': 'QUORUM_REPORT_BAD',
>    'data': { 'type': 'QuorumOpType', '*error': 'str', 'node-name': 'str',
>              'sector-num': 'int', 'sectors-count': 'int' } }
> +
> +##
> +# @MediaErrorBehavior:
> +#
> +# Enumerated type for classifying type of read error behavior.
> +#
> +##
> +{ 'enum': 'MediaErrorBehavior', 'data': ['hard', 'fixed_on_write', 'flakey'] }
> +
> +##
> +# @media-error-create:
> +#
> +# Example:
> +# -> {"execute": "media-error-create",
> +#     "arguments": {"device_id": "12345678, "lba" : 10000, "behavior" : "hard"}}
> +# <- { "return": {} }
> +#
> +# Create a synthetic media error for the specified logical block address when
> +# it is read by the guest OS.
> +#
> +##
> +{ 'command':'media-error-create',
> +  'data': {'device_id': 'str', 'lba': 'uint64', 'behavior': 'MediaErrorBehavior'}}
> +
> +##
> +# @media-error-delete:
> +#
> +# Delete a synthetic media error for the specified logical block address.
> +#
> +# # Example:
> +# -> {"execute": "media-error-delete",
> +#     "arguments": {"device_id": "01020304", "lba" : 10000}
> +# <- { "return": {} }
> +##
> +{ 'command':'media-error-delete',
> +  'data': {'device_id': 'str', 'lba': 'uint64'}}
> 

Re: [RFC 1/4] Add qapi for block error injection
Posted by Tony Asleson 6 years, 4 months ago
On 9/20/19 4:03 AM, Philippe Mathieu-Daudé wrote:
> Hi Tony,

Hi Phil

> 
> On 9/19/19 9:48 PM, Tony Asleson wrote:
...
>> +void media_error_delete(const char *device_id, uint64_t lba)
> 
> Can you explain the goal of this function? I don't understand it from
> the hardware PoV. Or is it a simple code cleanup function?

It's just a way to do clean up.  I actually used it during development
as the Linux kernel can get very unresponsive when doing error retries ,
so I removed the error to allow things to proceed again.

> Once hw starts to break, it is unlikely it recovers magically to a
> previous sane state...
> 
> In real world, block hardware starts to fail, controller notice
> inconsistency and set some bits, the controller driver react.
> Eventually the hw will still be used, with the failing block avoided.

This is probably the most common case that people are familiar with, a
sector that goes bad and persists.

In rotating magnetic media devices, you can have a media error if a
write was interrupted by a power failure.  The sector has part of the
new write and part of the old write, CRC check fails and ECC is unable
to correct.  This can be corrected by having the host re-write the sector.

You can also have an intermittent read error if you have a transient
media problem, eg. a small particulate is moving around on the disk
surface (ransient thermal asperity, instead of persistent).

Depending on disk type (SCSI/SATA) and firmware features, the disk may
auto-reassign the block when it discovers that it had to go through lots
of recovery to retrieve the data or it may require the initiator to
issue a reassign blocks command.  In both cases the lba replacement is
ready to use.

> What do you think? How is your current use case setup?

Eventually I think it would be good to model some or all of these other
error scenarios I've outlined above.

> 
> Thanks,

Thank you!

-Tony