QOM type names currently don't have any enforced naming rules. This
can be problematic, e.g. when they are used on the command line
for the "-device" option (where the comma is used to separate
properties). To keep such problematic type names from being
introduced again, let's restrict the set of acceptable characters
during type registration.

Ideally, we'd apply the same rules here as for QAPI, i.e. all type
names should begin with a letter, and contain only ASCII letters,
digits, hyphens, and underscores. However, we already have so many
pre-existing types like:

    486-x86_64-cpu
    cfi.pflash01
    power5+_v2.1-spapr-cpu-core
    virt-2.6-machine
    pc-i440fx-3.0-machine

... that we have to allow "." and "+" for now, too. While the
dot is used in a lot of places, the "+" can fortunately be limited
to two classes of legacy names ("power" and "Sun-UltraSparc" CPUs).

We also cannot yet enforce the rule that names must start with a
letter, since there are a lot of types that start with a digit. Still,
at least limiting the first character to the alphanumeric range
should be way better than nothing.

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
qom/object.c | 41 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/qom/object.c b/qom/object.c
index 95c0dc8285..571ef68950 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -138,9 +138,50 @@ static TypeImpl *type_new(const TypeInfo *info)
     return ti;
 }

+static bool type_name_is_valid(const char *name)
+{
+    const int slen = strlen(name);
+
+    g_assert(slen > 1);
+
+    /*
+     * Ideally, the name should start with a letter - however, we've got
+     * too many names starting with a digit already, so allow digits here,
+     * too (except '0' which is not used yet)
+     */
+    if (!g_ascii_isalnum(name[0]) || name[0] == '0') {
+        return false;
+    }
+
+    for (int i = 1; i < slen; i++) {
+        if (name[i] != '-' && name[i] != '_' && name[i] != '.' &&
+            !g_ascii_isalnum(name[i])) {
+            if (name[i] == '+') {
+                if (i == 6 && !strncmp(name, "power", 5)) {
+                    /* It's a legacy name like "power5+" */
+                    continue;
+                }
+                if (i >= 17 && !strncmp(name, "Sun-UltraSparc", 14)) {
+                    /* It's a legacy name like "Sun-UltraSparc-IV+" */
+                    continue;
+                }
+            }
+            return false;
+        }
+    }
+
+    return true;
+}
+
 static TypeImpl *type_register_internal(const TypeInfo *info)
 {
     TypeImpl *ti;
+
+    if (!type_name_is_valid(info->name)) {
+        fprintf(stderr, "Registering '%s' with illegal type name\n", info->name);
+        abort();
+    }
+
     ti = type_new(info);
     type_table_add(ti);
--
2.41.0
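
For readers who want to try the rule outside of QEMU, the check in the patch above can be sketched without glib roughly as follows. This is only an illustrative port, not the code that was submitted: ascii_isalnum() and type_name_is_valid_loop() are hypothetical names, and assert() stands in for g_assert().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* ASCII-only alphanumeric test, standing in for g_ascii_isalnum() */
static bool ascii_isalnum(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z');
}

/* Hypothetical glib-free port of the patch's type_name_is_valid() */
static bool type_name_is_valid_loop(const char *name)
{
    const size_t slen = strlen(name);

    assert(slen > 1);

    /* First character: a letter or a digit, but not '0' */
    if (!ascii_isalnum(name[0]) || name[0] == '0') {
        return false;
    }

    for (size_t i = 1; i < slen; i++) {
        if (name[i] == '-' || name[i] == '_' || name[i] == '.' ||
            ascii_isalnum(name[i])) {
            continue;
        }
        /* '+' is tolerated only in the two legacy name families */
        if (name[i] == '+' &&
            ((i == 6 && !strncmp(name, "power", 5)) ||
             (i >= 17 && !strncmp(name, "Sun-UltraSparc", 14)))) {
            continue;
        }
        return false;
    }
    return true;
}
```

With this sketch, the names listed in the commit message ("486-x86_64-cpu", "cfi.pflash01", "power5+_v2.1-spapr-cpu-core", ...) are accepted, while a name containing a comma, such as the made-up "virtio,net", is rejected.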
On Thu, Nov 16, 2023 at 02:14:54PM +0100, Thomas Huth wrote:
> [...]
> +static bool type_name_is_valid(const char *name)
> +{
> +    const int slen = strlen(name);
> +
> +    g_assert(slen > 1);
> +
> +    /*
> +     * Ideally, the name should start with a letter - however, we've got
> +     * too many names starting with a digit already, so allow digits here,
> +     * too (except '0' which is not used yet)
> +     */
> +    if (!g_ascii_isalnum(name[0]) || name[0] == '0') {
> +        return false;
> +    }
> +
> +    for (int i = 1; i < slen; i++) {
> +        if (name[i] != '-' && name[i] != '_' && name[i] != '.' &&
> +            !g_ascii_isalnum(name[i])) {
> +            if (name[i] == '+') {
> +                if (i == 6 && !strncmp(name, "power", 5)) {
> +                    /* It's a legacy name like "power5+" */
> +                    continue;
> +                }
> +                if (i >= 17 && !strncmp(name, "Sun-UltraSparc", 14)) {
> +                    /* It's a legacy name like "Sun-UltraSparc-IV+" */
> +                    continue;
> +                }
> +            }
> +            return false;
> +        }
> +    }

Replace this big loop with strspn, which has an asm optimized impl
in glibc

   ALPHA_LC "abcdefghijklmnopqrstuvwxyz"
   ALPHA_UC "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
   OTHER "0123456789-_."

   return (strspn(name, ALPHA_UC ALPHA_LC OTHER) == slen) ||
      (g_str_has_prefix(name, "power") && slen > 6 && name[6] == '+') ||
      (g_str_has_prefix(name, "Sun-UltraSparc") && slen > 17 && name[17] == '+');

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
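
Read literally, the two legacy clauses in the snippet above would seem to accept any characters that follow the '+', since they are OR-ed with the strspn() test rather than combined with it. One possible way to keep the strspn() idea while still validating the rest of the name is sketched below. It is a glib-free illustration under stated assumptions, not the code that was eventually merged: type_name_is_valid_strspn() is a hypothetical name, assert() and strncmp() stand in for g_assert() and g_str_has_prefix(), and the sketch assumes that at most one '+' needs to be tolerated per name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ALPHA_LC "abcdefghijklmnopqrstuvwxyz"
#define ALPHA_UC "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
#define OTHER    "0123456789-_."

/* Hypothetical strspn()-based check; name and structure are illustrative */
static bool type_name_is_valid_strspn(const char *name)
{
    const size_t slen = strlen(name);
    size_t plen;

    assert(slen > 1);

    /* Same first-character rule as the patch: alphanumeric, but not '0' */
    if (strspn(name, ALPHA_UC ALPHA_LC "123456789") == 0) {
        return false;
    }

    /* Length of the leading run made only of always-allowed characters */
    plen = strspn(name, ALPHA_UC ALPHA_LC OTHER);

    /* Step over one legacy '+' if it sits where the patch permits it */
    if (name[plen] == '+' &&
        ((plen == 6 && !strncmp(name, "power", 5)) ||
         (plen >= 17 && !strncmp(name, "Sun-UltraSparc", 14)))) {
        plen++;
        /* Assumption: only one '+' per name, so validate the tail again */
        plen += strspn(name + plen, ALPHA_UC ALPHA_LC OTHER);
    }

    /* Valid only if the accepted prefix covers the whole string */
    return plen == slen;
}
```

The design choice here is to let strspn() report how far the valid prefix reaches and to compare that length against the string length at the very end, so that a legacy '+' never short-circuits the check of the characters after it.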
On 16/11/2023 14.24, Daniel P. Berrangé wrote:
> On Thu, Nov 16, 2023 at 02:14:54PM +0100, Thomas Huth wrote:
>> [...]
>
> Replace this big loop with strspn, which has an asm optimized impl
> in glibc
>
>    ALPHA_LC "abcdefghijklmnopqrstuvwxyz"
>    ALPHA_UC "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
>    OTHER "0123456789-_."
>
>    return (strspn(name, ALPHA_UC ALPHA_LC OTHER) == slen) ||
>       (g_str_has_prefix(name, "power") && slen > 6 && name[6] == '+') ||
>       (g_str_has_prefix(name, "Sun-UltraSparc") && slen > 17 && name[17] == '+');

It's quite hard to believe that a function that has to check each and
every character against a string of acceptable characters is faster than
a function that uses something like g_ascii_isalnum(), which can check a
range of characters in one go...

So I gave it a try: I wrote two test programs, one with my implementation
and one with yours, and looped on the function 1000000000 times. And
indeed, for short strings (less than 30 characters), my function is about
three times faster than the one with strspn() (mine takes ~ 13 seconds,
the strspn() one takes ~ 39 seconds). Interestingly, for larger strings
(more than 140 characters), the strspn() implementation starts to perform
better. They must indeed have an optimization that kicks in for larger
strings.

Now, while my implementation seems to be a little bit faster for the
strings that we are using in QEMU, we certainly don't have 1000000000
different types in QEMU, but rather only 1300 or so, so the performance
shouldn't really matter that much here. And I have to admit that your
code is indeed more compact to read, so I'll give it a try.

 Thomas
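
The test programs themselves were not posted, so the following is only a guess at what such a comparison could look like. It is a minimal sketch: check_loop(), check_strspn() and time_check() are hypothetical names, both checks are simplified stand-ins that omit the first-character rule and the legacy '+' handling, and clock() measures CPU time rather than wall-clock time.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

#define ALLOWED "abcdefghijklmnopqrstuvwxyz" \
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_."

/* Simplified per-character range check (stand-in for the loop variant) */
static bool check_loop(const char *s)
{
    for (; *s; s++) {
        bool ok = (*s >= '0' && *s <= '9') || (*s >= 'a' && *s <= 'z') ||
                  (*s >= 'A' && *s <= 'Z') ||
                  *s == '-' || *s == '_' || *s == '.';
        if (!ok) {
            return false;
        }
    }
    return true;
}

/* Simplified accept-set check (stand-in for the strspn() variant) */
static bool check_strspn(const char *s)
{
    return s[strspn(s, ALLOWED)] == '\0';
}

/* Calls fn(name) iters times and returns the CPU seconds consumed */
static double time_check(bool (*fn)(const char *), const char *name,
                         unsigned long iters, unsigned long *accepted)
{
    /* volatile pointer discourages the compiler from hoisting the call */
    const char *volatile input = name;
    unsigned long hits = 0;
    clock_t start;

    assert(fn && name && accepted);

    start = clock();
    for (unsigned long i = 0; i < iters; i++) {
        hits += fn(input);
    }
    /* Report the number of accepted calls so the result is actually used */
    *accepted = hits;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_check(check_loop, "pc-i440fx-3.0-machine", 1000000000UL, &n) and the same with check_strspn would give two timings to compare. The absolute numbers depend heavily on the compiler, optimization level and libc, so they should not be expected to match the figures quoted above.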