[PATCH] module: fix early stop for module loading function

Hector Cao posted 1 patch 1 month, 2 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20260416134842.1726474-1-hector.cao@canonical.com
util/module.c | 34 +++++++++++++++++++---------------
1 file changed, 19 insertions(+), 15 deletions(-)
[PATCH] module: fix early stop for module loading function
Posted by Hector Cao 1 month, 2 weeks ago
The commit c551fb0b53d introduces a regression on the behavior
of the module load logic. Historically, if QEMU cannot load a
particular module in a folder, it keeps going and look for the
module in the remaining folders. The mentioned commit make QEMU
stop if the module exists but cannot be loaded (build mismatch
reason for example).

This causes QEMU fail to load the right module and this failure
happens usually when QEMU is upgraded and therefore QEMU has
several folders to look for the modules.

In addition to the regression on the behavior, this commit
introduces also a leak in the function module_load_dso()
that does not remove the entries from the list dso_init_list
when the the module load fails for build mismatch.

This commit has 2 folds:
 - restores the correct legacy behavior
 - fix the memory leak in module_load_dso()

Fixes: c551fb0b53d ("module: add Error arguments to module_load and module_load_qom")
Signed-off-by: Hector Cao <hector.cao@canonical.com>
---
 util/module.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/util/module.c b/util/module.c
index 32e263163c7..4d5b74fa6e4 100644
--- a/util/module.c
+++ b/util/module.c
@@ -184,6 +184,11 @@ static bool module_load_dso(const char *fname, bool export_symbols,
                 "Only modules from the same build can be loaded.\n");
         }
         g_module_close(g_module);
+        /* Clean up any dso init entries added during DSO initialization */
+        QTAILQ_FOREACH_SAFE(e, &dso_init_list, node, next) {
+            QTAILQ_REMOVE(&dso_init_list, e, node);
+            g_free(e);
+        }
         return false;
     }
 
@@ -279,29 +284,28 @@ int module_load(const char *prefix, const char *name, Error **errp)
         }
     }
 
-    for (i = 0; i < n_dirs; i++) {
+    /*
+     * Try to load module in given folders until we load it successfully or no
+     * more to look.
+     * If no module can be loaded successfully, errp is set to the last encountered error
+     */
+    rv = 0; /* module not found */
+    for (i = 0; (i < n_dirs) && (rv != 1); i++) {
         char *fname = g_strdup_printf("%s/%s%s",
                                       dirs[i], module_name, CONFIG_HOST_DSOSUF);
         int ret = access(fname, F_OK);
-        if (ret != 0 && (errno == ENOENT || errno == ENOTDIR)) {
-            /*
-             * if we don't find the module in this dir, try the next one.
-             * If we don't find it in any dir, that can be fine too: user
-             * did not install the module. We will return 0 in this case
-             * with no error set.
-             */
-            g_free(fname);
-            continue;
-        } else if (ret != 0) {
+        if ((ret == 0) && module_load_dso(fname, export_symbols, errp)) {
+            rv = 1; /* module successfully loaded */
+        }
+
+        /* The file exists but cannot be accessed -> report an error and continue */
+        if ((ret != 0) && (errno != ENOENT) && (errno != ENOTDIR)) {
             /* most common is EACCES here */
             error_setg_errno(errp, errno, "error trying to access %s", fname);
-        } else if (module_load_dso(fname, export_symbols, errp)) {
-            rv = 1; /* module successfully loaded */
         }
+
         g_free(fname);
-        goto out;
     }
-    rv = 0; /* module not found */
 
 out:
     if (rv <= 0) {
-- 
2.43.0
Re: [PATCH] module: fix early stop for module loading function
Posted by Gabriel Hartmann 1 month, 1 week ago
On Wed, Apr 16, 2026 at 01:48:42PM +0000, Hector Cao wrote:
> The commit c551fb0b53d introduces a regression on the behavior
> of the module load logic. Historically, if QEMU cannot load a
> particular module in a folder, it keeps going and look for the
> module in the remaining folders. The mentioned commit make QEMU
> stop if the module exists but cannot be loaded (build mismatch
> reason for example).

Thanks for the patch, Hector. I'm the original reporter of this issue
and have been debugging it on our OpenStack/Ceph production environment.
The analysis and both fixes look correct to me. One comment:

> +    rv = 0; /* module not found */
> +    for (i = 0; (i < n_dirs) && (rv != 1); i++) {
>          char *fname = g_strdup_printf("%s/%s%s",
>                                        dirs[i], module_name,
>  CONFIG_HOST_DSOSUF);
>          int ret = access(fname, F_OK);
> -        if (ret != 0 && (errno == ENOENT || errno == ENOTDIR)) {
> -            /*
> -             * if we don't find the module in this dir, try the next one.
> -             * If we don't find it in any dir, that can be fine too: user
> -             * did not install the module. We will return 0 in this case
> -             * with no error set.
> -             */
> -            g_free(fname);
> -            continue;
> -        } else if (ret != 0) {
> +        if ((ret == 0) && module_load_dso(fname, export_symbols, errp)) {
> +            rv = 1; /* module successfully loaded */
> +        }
> +
> +        /* The file exists but cannot be accessed -> report an error and
> continue */
> +        if ((ret != 0) && (errno != ENOENT) && (errno != ENOTDIR)) {
>              /* most common is EACCES here */
>              error_setg_errno(errp, errno, "error trying to access %s", fname);
> -        } else if (module_load_dso(fname, export_symbols, errp)) {
> -            rv = 1; /* module successfully loaded */
>          }

The loop can trigger assert(*errp == NULL) inside error_setv() when
errors occur in multiple iterations.

Consider: directory 1 contains the module but with a build mismatch.
module_load_dso() returns false and sets *errp. The loop continues to
directory 2, which either also has a mismatched module (calling
module_load_dso() -> error_setg() again) or has an access error
(calling error_setg_errno()). In both cases, *errp is already set,
and error_setv() will hit its assert(*errp == NULL).

The error needs to be freed between iterations, for example:

    rv = 0;
    for (i = 0; (i < n_dirs) && (rv != 1); i++) {
        char *fname = g_strdup_printf("%s/%s%s",
                                      dirs[i], module_name, CONFIG_HOST_DSOSUF);
        int ret = access(fname, F_OK);
        if ((ret == 0) && module_load_dso(fname, export_symbols, errp)) {
            rv = 1;
        }

        if ((ret != 0) && (errno != ENOENT) && (errno != ENOTDIR)) {
            error_setg_errno(errp, errno, "error trying to access %s", fname);
        }

        /* Clear error from this iteration before trying next directory */
        if ((rv != 1) && *errp) {
            error_free(*errp);
            *errp = NULL;
        }

        g_free(fname);
    }

    /* If we exhausted all dirs without success, set a meaningful error */
    if (rv != 1 && !*errp) {
        error_setg(errp, "could not load module '%s'", module_name);
    }

Alternatively, using a local Error *local_err = NULL and only
propagating the last error at the end would follow the common QEMU
pattern more closely.

-- 
Gabriel Hartmann
Senior Systems Engineer

NETWAYS Managed Services GmbH | Deutschherrnstr. 15-19 | D-90429 Nuernberg
Tel: +49 911 92885-0 | Fax: +49 911 92885-77
CEO: Julian Hein, Bernd Erk, Sebastian Saemann | AG Nuernberg HRB25207
https://www.netways.de | gabriel.hartmann@netways.de

** stackconf 2026 - April | Munich - https://stackconf.eu **
** Open Source Monitoring Conference 2026 - November | Nuremberg - https://osmc.de **
** NETWAYS Web Services - https://nws.netways.de **
** NETWAYS Trainings - https://netways.de/trainings **