[PATCH v2 5/5] qom/object: Limit type names to alphanumerical and some few special characters

Thomas Huth posted 5 patches 1 year ago
Maintainers: Alistair Francis <alistair@alistair23.me>, "Edgar E. Iglesias" <edgar.iglesias@gmail.com>, Peter Maydell <peter.maydell@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>, David Hildenbrand <david@redhat.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Francisco Iglesias <francisco.iglesias@amd.com>, "Daniel P. Berrangé" <berrange@redhat.com>, Eduardo Habkost <eduardo@habkost.net>
There is a newer version of this series
[PATCH v2 5/5] qom/object: Limit type names to alphanumerical and some few special characters
Posted by Thomas Huth 1 year ago
QOM names currently don't have any enforced naming rules. This
can be problematic, e.g. when they are used on the command line
for the "-device" option (where the comma is used to separate
properties). To keep such problematic type names from creeping in
again, let's restrict the set of acceptable characters during type
registration.

Ideally, we'd apply here the same rules as for QAPI, i.e. all type
names should begin with a letter, and contain only ASCII letters,
digits, hyphen, and underscore. However, we already have so many
pre-existing types like:

    486-x86_64-cpu
    cfi.pflash01
    power5+_v2.1-spapr-cpu-core
    virt-2.6-machine
    pc-i440fx-3.0-machine

... so that we have to allow "." and "+" for now, too. While the
dot is used in a lot of places, the "+" can fortunately be limited
to two classes of legacy names ("power" and "Sun-UltraSparc" CPUs).

We also cannot enforce the rule that names must start with a letter
yet, since there are a lot of types that start with a digit. Still,
at least limiting the first characters to the alphanumerical range
should be way better than nothing.

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 qom/object.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/qom/object.c b/qom/object.c
index 95c0dc8285..571ef68950 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -138,9 +138,50 @@ static TypeImpl *type_new(const TypeInfo *info)
     return ti;
 }
 
+static bool type_name_is_valid(const char *name)
+{
+    const int slen = strlen(name);
+
+    g_assert(slen > 1);
+
+    /*
+     * Ideally, the name should start with a letter - however, we've got
+     * too many names starting with a digit already, so allow digits here,
+     * too (except '0' which is not used yet)
+     */
+    if (!g_ascii_isalnum(name[0]) || name[0] == '0') {
+        return false;
+    }
+
+    for (int i = 1; i < slen; i++) {
+        if (name[i] != '-' && name[i] != '_' && name[i] != '.' &&
+            !g_ascii_isalnum(name[i])) {
+            if (name[i] == '+') {
+                if (i == 6 && !strncmp(name, "power", 5)) {
+                    /* It's a legacy name like "power5+" */
+                    continue;
+                }
+                if (i >= 17 && !strncmp(name, "Sun-UltraSparc", 14)) {
+                    /* It's a legacy name like "Sun-UltraSparc-IV+" */
+                    continue;
+                }
+            }
+            return false;
+        }
+    }
+
+    return true;
+}
+
 static TypeImpl *type_register_internal(const TypeInfo *info)
 {
     TypeImpl *ti;
+
+    if (!type_name_is_valid(info->name)) {
+        fprintf(stderr, "Registering '%s' with illegal type name\n", info->name);
+        abort();
+    }
+
     ti = type_new(info);
 
     type_table_add(ti);
-- 
2.41.0
Re: [PATCH v2 5/5] qom/object: Limit type names to alphanumerical and some few special characters
Posted by Daniel P. Berrangé 1 year ago
On Thu, Nov 16, 2023 at 02:14:54PM +0100, Thomas Huth wrote:
> QOM names currently don't have any enforced naming rules. This
> can be problematic, e.g. when they are used on the command line
> for the "-device" option (where the comma is used to separate
> properties). To avoid that such problematic type names come in
> again, let's restrict the set of acceptable characters during the
> type registration.
> 
> Ideally, we'd apply here the same rules as for QAPI, i.e. all type
> names should begin with a letter, and contain only ASCII letters,
> digits, hyphen, and underscore. However, we already have so many
> pre-existing types like:
> 
>     486-x86_64-cpu
>     cfi.pflash01
>     power5+_v2.1-spapr-cpu-core
>     virt-2.6-machine
>     pc-i440fx-3.0-machine
> 
> ... so that we have to allow "." and "+" for now, too. While the
> dot is used in a lot of places, the "+" can fortunately be limited
> to two classes of legacy names ("power" and "Sun-UltraSparc" CPUs).
> 
> We also cannot enforce the rule that names must start with a letter
> yet, since there are lot of types that start with a digit. Still,
> at least limiting the first characters to the alphanumerical range
> should be way better than nothing.
> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  qom/object.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/qom/object.c b/qom/object.c
> index 95c0dc8285..571ef68950 100644
> --- a/qom/object.c
> +++ b/qom/object.c
> @@ -138,9 +138,50 @@ static TypeImpl *type_new(const TypeInfo *info)
>      return ti;
>  }
>  
> +static bool type_name_is_valid(const char *name)
> +{
> +    const int slen = strlen(name);
> +
> +    g_assert(slen > 1);
> +
> +    /*
> +     * Ideally, the name should start with a letter - however, we've got
> +     * too many names starting with a digit already, so allow digits here,
> +     * too (except '0' which is not used yet)
> +     */
> +    if (!g_ascii_isalnum(name[0]) || name[0] == '0') {
> +        return false;
> +    }
> +
> +    for (int i = 1; i < slen; i++) {
> +        if (name[i] != '-' && name[i] != '_' && name[i] != '.' &&
> +            !g_ascii_isalnum(name[i])) {
> +            if (name[i] == '+') {
> +                if (i == 6 && !strncmp(name, "power", 5)) {
> +                    /* It's a legacy name like "power5+" */
> +                    continue;
> +                }
> +                if (i >= 17 && !strncmp(name, "Sun-UltraSparc", 14)) {
> +                    /* It's a legacy name like "Sun-UltraSparc-IV+" */
> +                    continue;
> +                }
> +            }
> +            return false;
> +        }
> +    }

Replace this big loop with strspn, which has an asm optimized impl
in glibc

      #define ALPHA_LC "abcdefghijklmnopqrstuvwxyz"
      #define ALPHA_UC "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
      #define OTHER "0123456789-_."

      return (strspn(name, ALPHA_UC ALPHA_LC OTHER) == slen) ||
          (g_str_has_prefix(name, "power") && slen > 6 && name[6] == '+') ||
          (g_str_has_prefix(name, "Sun-UltraSparc") && slen > 17 && name[17] == '+');
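
A fuller version of this idea needs a little more than a single return
statement if it is to keep the first-character rule and still check
whatever follows a legacy '+'. The following is only a sketch under those
assumptions, not code from this series: it uses plain libc instead of GLib
so that it stands alone, and the names type_name_is_valid_sketch(),
starts_with() and BASE_SET are made up for illustration. Unlike the patch,
it returns false for names shorter than two characters instead of
asserting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ALPHA_LC "abcdefghijklmnopqrstuvwxyz"
#define ALPHA_UC "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
#define BASE_SET ALPHA_LC ALPHA_UC "0123456789-_."

/* Hypothetical helper, standing in for g_str_has_prefix() */
static bool starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Hypothetical name; follows the rules described in the commit message */
static bool type_name_is_valid_sketch(const char *name)
{
    size_t len, pos = 0;
    bool is_power, is_sparc;

    assert(name != NULL);
    len = strlen(name);

    /* The patch asserts slen > 1; this sketch rejects short names instead */
    if (len < 2) {
        return false;
    }

    /* First character: a letter or a digit other than '0' */
    if (strspn(name, ALPHA_LC ALPHA_UC "123456789") == 0) {
        return false;
    }

    is_power = starts_with(name, "power");
    is_sparc = starts_with(name, "Sun-UltraSparc");

    while (pos < len) {
        /* Skip a run of ordinary characters in one call */
        pos += strspn(name + pos, BASE_SET);
        if (pos == len) {
            break;
        }
        /* The only other character tolerated is '+', in legacy names */
        if (name[pos] != '+') {
            return false;
        }
        if (!(is_power && pos == 6) && !(is_sparc && pos >= 17)) {
            return false;
        }
        pos++;
    }

    return true;
}
```

Each strspn() call skips a whole run of ordinary characters, so the loop
body only runs once per '+' in the name.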


With regards,
Daniel
Re: [PATCH v2 5/5] qom/object: Limit type names to alphanumerical and some few special characters
Posted by Thomas Huth 1 year ago
On 16/11/2023 14.24, Daniel P. Berrangé wrote:
> On Thu, Nov 16, 2023 at 02:14:54PM +0100, Thomas Huth wrote:
>> QOM names currently don't have any enforced naming rules. This
>> can be problematic, e.g. when they are used on the command line
>> for the "-device" option (where the comma is used to separate
>> properties). To avoid that such problematic type names come in
>> again, let's restrict the set of acceptable characters during the
>> type registration.
>>
>> Ideally, we'd apply here the same rules as for QAPI, i.e. all type
>> names should begin with a letter, and contain only ASCII letters,
>> digits, hyphen, and underscore. However, we already have so many
>> pre-existing types like:
>>
>>      486-x86_64-cpu
>>      cfi.pflash01
>>      power5+_v2.1-spapr-cpu-core
>>      virt-2.6-machine
>>      pc-i440fx-3.0-machine
>>
>> ... so that we have to allow "." and "+" for now, too. While the
>> dot is used in a lot of places, the "+" can fortunately be limited
>> to two classes of legacy names ("power" and "Sun-UltraSparc" CPUs).
>>
>> We also cannot enforce the rule that names must start with a letter
>> yet, since there are lot of types that start with a digit. Still,
>> at least limiting the first characters to the alphanumerical range
>> should be way better than nothing.
>>
>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>> ---
>>   qom/object.c | 41 +++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 41 insertions(+)
>>
>> diff --git a/qom/object.c b/qom/object.c
>> index 95c0dc8285..571ef68950 100644
>> --- a/qom/object.c
>> +++ b/qom/object.c
>> @@ -138,9 +138,50 @@ static TypeImpl *type_new(const TypeInfo *info)
>>       return ti;
>>   }
>>   
>> +static bool type_name_is_valid(const char *name)
>> +{
>> +    const int slen = strlen(name);
>> +
>> +    g_assert(slen > 1);
>> +
>> +    /*
>> +     * Ideally, the name should start with a letter - however, we've got
>> +     * too many names starting with a digit already, so allow digits here,
>> +     * too (except '0' which is not used yet)
>> +     */
>> +    if (!g_ascii_isalnum(name[0]) || name[0] == '0') {
>> +        return false;
>> +    }
>> +
>> +    for (int i = 1; i < slen; i++) {
>> +        if (name[i] != '-' && name[i] != '_' && name[i] != '.' &&
>> +            !g_ascii_isalnum(name[i])) {
>> +            if (name[i] == '+') {
>> +                if (i == 6 && !strncmp(name, "power", 5)) {
>> +                    /* It's a legacy name like "power5+" */
>> +                    continue;
>> +                }
>> +                if (i >= 17 && !strncmp(name, "Sun-UltraSparc", 14)) {
>> +                    /* It's a legacy name like "Sun-UltraSparc-IV+" */
>> +                    continue;
>> +                }
>> +            }
>> +            return false;
>> +        }
>> +    }
> 
> Replace this big loop with strspn, which has an asm optimized impl
> in glibc
> 
>        ALPHA_LC "abcdefghijklmnopqrstuvwxyz"
>        ALPHA_UC "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
>        OTHER "0123456789-_."
> 
>        return (strspn(name, ALPHA_UC ALPHA_LC OTHER) == slen) ||
>            (g_str_has_prefix(name, "power") && slen > 6 && name[6] == '+') ||
> 	  (g_str_has_prefix(name, "Sun-UltraSparc") && slen > 17 && name[17] == '+');

It's quite hard to believe that a function that has to check each and every
character in a string of acceptable characters is faster than a function
that uses something like g_ascii_isalnum(), which can check a range of
characters in one go...

So I gave it a try, wrote two test programs, one with my implementation and 
one with yours, and looped on the function 1000000000 times. And indeed, for 
short strings (less than 30 characters), my function is about three times 
faster than the one with strspn() (mine takes ~ 13 seconds, the strspn() one 
takes ~ 39 seconds).

Interestingly, for larger strings (more than 140 characters), the strspn()
implementation starts to perform better. They indeed must have an
optimization that kicks in for larger strings.
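
For reference, a comparison like the one described above could be set up
roughly as follows. This is not the test program used for the numbers
quoted here, just a minimal sketch: check_loop(), check_strspn() and
time_validator() are hypothetical names, the '+' special cases are left
out, and the explicit ASCII range comparisons stand in for
g_ascii_isalnum() so that the example needs no GLib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define ALLOWED "abcdefghijklmnopqrstuvwxyz" \
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ" \
                "0123456789-_."

typedef bool (*validator_fn)(const char *name);

/* Stand-in for g_ascii_isalnum(): plain ASCII range checks */
static bool ascii_isalnum(char c)
{
    return (c >= '0' && c <= '9') ||
           (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z');
}

/* Variant 1: walk the string and test each character by range */
static bool check_loop(const char *name)
{
    for (size_t i = 0; name[i] != '\0'; i++) {
        char c = name[i];

        if (!ascii_isalnum(c) && c != '-' && c != '_' && c != '.') {
            return false;
        }
    }
    return true;
}

/* Variant 2: let strspn() find the first character outside the set */
static bool check_strspn(const char *name)
{
    return name[strspn(name, ALLOWED)] == '\0';
}

/*
 * Call fn(name) `iterations` times. Returns the CPU time in seconds and
 * stores the number of calls that returned true in *accepted.
 */
static double time_validator(validator_fn fn, const char *name,
                             unsigned long iterations,
                             unsigned long *accepted)
{
    /*
     * Reading the pointer through a volatile object each round is meant
     * to discourage the compiler from hoisting the call out of the loop.
     */
    const char *volatile current = name;
    unsigned long hits = 0;
    clock_t start, end;

    assert(fn != NULL && name != NULL && accepted != NULL);

    start = clock();
    for (unsigned long i = 0; i < iterations; i++) {
        hits += fn(current) ? 1 : 0;
    }
    end = clock();

    *accepted = hits;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_validator() once per function with the same name and
iteration count gives the two timings to compare; the results will depend
on the compiler, the optimization level and the libc in use.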

Now while my implementation seems to be a little bit faster for the strings 
that we are using in QEMU, we certainly don't have 1000000000 different 
types in QEMU, but rather only 1300 or so, so the performance shouldn't 
really matter that much here. And I have to admit that your code is indeed 
more compact to read, so I'll give it a try.

  Thomas