[PATCH v2 13/19] qapi/schema: split "checked" field into "checking" and "checked"

John Snow posted 19 patches 10 months ago
Maintainers: Markus Armbruster <armbru@redhat.com>, Michael Roth <michael.roth@amd.com>
There is a newer version of this series
[PATCH v2 13/19] qapi/schema: split "checked" field into "checking" and "checked"
Posted by John Snow 10 months ago
differentiate between "actively in the process of checking" and
"checking has completed". This allows us to clean up the types of some
internal fields such as QAPISchemaObjectType's members field which
currently uses "None" as a test for determining if check has been run
already or not.

This simplifies the typing from a cumbersome Optional[List[T]] to merely
a List[T], which is more pythonic: it is safe to iterate over an empty
list with "for x in []" whereas with an Optional[List[T]] you have to
rely on the more cumbersome "if L: for x in L: ..."

Note that it is valid to have an empty members list, see the internal
q_empty object as an example.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/schema.py | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
index eefa58a798b..0d9a70ab4cb 100644
--- a/scripts/qapi/schema.py
+++ b/scripts/qapi/schema.py
@@ -20,7 +20,7 @@
 from collections import OrderedDict
 import os
 import re
-from typing import List, Optional
+from typing import List, Optional, cast
 
 from .common import (
     POINTER_SUFFIX,
@@ -457,22 +457,24 @@ def __init__(self, name, info, doc, ifcond, features,
         self.base = None
         self.local_members = local_members
         self.variants = variants
-        self.members = None
+        self.members: List[QAPISchemaObjectTypeMember] = []
+        self._checking = False
 
     def check(self, schema):
         # This calls another type T's .check() exactly when the C
         # struct emitted by gen_object() contains that T's C struct
         # (pointers don't count).
-        if self.members is not None:
-            # A previous .check() completed: nothing to do
-            return
-        if self._checked:
+        if self._checking:
             # Recursed: C struct contains itself
             raise QAPISemError(self.info,
                                "object %s contains itself" % self.name)
+        if self._checked:
+            # A previous .check() completed: nothing to do
+            return
 
+        self._checking = True
         super().check(schema)
-        assert self._checked and self.members is None
+        assert self._checked and not self.members
 
         seen = OrderedDict()
         if self._base_name:
@@ -489,13 +491,17 @@ def check(self, schema):
         for m in self.local_members:
             m.check(schema)
             m.check_clash(self.info, seen)
-        members = seen.values()
+
+        # check_clash is abstract, but local_members is asserted to be
+        # List[QAPISchemaObjectTypeMember]. Cast to the narrower type.
+        members = cast(List[QAPISchemaObjectTypeMember], list(seen.values()))
 
         if self.variants:
             self.variants.check(schema, seen)
             self.variants.check_clash(self.info, seen)
 
-        self.members = members  # mark completed
+        self.members = members
+        self._checking = False  # mark completed
 
     # Check that the members of this type do not cause duplicate JSON members,
     # and update seen to track the members seen so far. Report any errors
-- 
2.43.0
Re: [PATCH v2 13/19] qapi/schema: split "checked" field into "checking" and "checked"
Posted by Markus Armbruster 10 months ago
John Snow <jsnow@redhat.com> writes:

> differentiate between "actively in the process of checking" and
> "checking has completed". This allows us to clean up the types of some
> internal fields such as QAPISchemaObjectType's members field which
> currently uses "None" as a test for determining if check has been run
> already or not.
>
> This simplifies the typing from a cumbersome Optional[List[T]] to merely
> a List[T], which is more pythonic: it is safe to iterate over an empty
> list with "for x in []" whereas with an Optional[List[T]] you have to
> rely on the more cumbersome "if L: for x in L: ..."

Does this cumbersome form exist?

> Note that it is valid to have an empty members list, see the internal
> q_empty object as an example.

Yes.

.members becomes valid only in .check().  Before the patch, .__init__()
initializes it to None, and .check() sets it to the real value.  We use
assert .members is not None to catch invalid use.  We can also hope
invalid use without an assert would crash.  for m in .members would.

We've seen this pattern before: PATCH 4+5.  There, we change .__init__()
to declare the attribute without initializing it.  Use before it becomes
valid now certainly crashes, which is an improvement.  Why can't we do
the same here?

> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  scripts/qapi/schema.py | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
> index eefa58a798b..0d9a70ab4cb 100644
> --- a/scripts/qapi/schema.py
> +++ b/scripts/qapi/schema.py
> @@ -20,7 +20,7 @@
>  from collections import OrderedDict
>  import os
>  import re
> -from typing import List, Optional
> +from typing import List, Optional, cast
>  
>  from .common import (
>      POINTER_SUFFIX,
> @@ -457,22 +457,24 @@ def __init__(self, name, info, doc, ifcond, features,
>          self.base = None
>          self.local_members = local_members
>          self.variants = variants
> -        self.members = None
> +        self.members: List[QAPISchemaObjectTypeMember] = []
> +        self._checking = False
>  
>      def check(self, schema):
>          # This calls another type T's .check() exactly when the C
>          # struct emitted by gen_object() contains that T's C struct
>          # (pointers don't count).
> -        if self.members is not None:
> -            # A previous .check() completed: nothing to do
> -            return
> -        if self._checked:
> +        if self._checking:
>              # Recursed: C struct contains itself
>              raise QAPISemError(self.info,
>                                 "object %s contains itself" % self.name)
> +        if self._checked:
> +            # A previous .check() completed: nothing to do
> +            return
>  
> +        self._checking = True
>          super().check(schema)
> -        assert self._checked and self.members is None
> +        assert self._checked and not self.members
>  
>          seen = OrderedDict()
>          if self._base_name:
> @@ -489,13 +491,17 @@ def check(self, schema):
>          for m in self.local_members:
>              m.check(schema)
>              m.check_clash(self.info, seen)
> -        members = seen.values()
> +
> +        # check_clash is abstract, but local_members is asserted to be
> +        # List[QAPISchemaObjectTypeMember]. Cast to the narrower type.
> +        members = cast(List[QAPISchemaObjectTypeMember], list(seen.values()))
>  
>          if self.variants:
>              self.variants.check(schema, seen)
>              self.variants.check_clash(self.info, seen)
>  
> -        self.members = members  # mark completed
> +        self.members = members
> +        self._checking = False  # mark completed
>  
>      # Check that the members of this type do not cause duplicate JSON members,
>      # and update seen to track the members seen so far. Report any errors

I think you missed these:

       def is_empty(self):
           assert self.members is not None
           return not self.members and not self.variants

       def has_conditional_members(self):
           assert self.members is not None
           return any(m.ifcond.is_present() for m in self.members)

The assertions no longer work.  I figure you want to assert .checked
instead.

Consider splitting the patch: first add .checking, and replace use of
.members by use of .checking where possible.  Then change .members.  The
split may or may not be easier to describe and digest.  Use your
judgement.
Re: [PATCH v2 13/19] qapi/schema: split "checked" field into "checking" and "checked"
Posted by John Snow 9 months, 2 weeks ago
On Tue, Jan 16, 2024 at 9:58 AM Markus Armbruster <armbru@redhat.com> wrote:
>
> John Snow <jsnow@redhat.com> writes:
>
> > differentiate between "actively in the process of checking" and
> > "checking has completed". This allows us to clean up the types of some
> > internal fields such as QAPISchemaObjectType's members field which
> > currently uses "None" as a test for determining if check has been run
> > already or not.
> >
> > This simplifies the typing from a cumbersome Optional[List[T]] to merely
> > a List[T], which is more pythonic: it is safe to iterate over an empty
> > list with "for x in []" whereas with an Optional[List[T]] you have to
> > rely on the more cumbersome "if L: for x in L: ..."
>
> Does this cumbersome form exist?

I thought it should, but I suppose it shouldn't given that you have
been relying on None to cause a crash.

Here's where that pattern *would* be used if None was legitimate:

qapi/schema.py:604: error: Item "None" of
"Optional[List[QAPISchemaObjectTypeMember]]" has no attribute
"__iter__" (not iterable)  [union-attr]
qapi/schema.py:626: error: Item "None" of
"Optional[List[QAPISchemaObjectTypeMember]]" has no attribute
"__iter__" (not iterable)  [union-attr]
qapi/gen.py:122: error: Item "None" of
"Optional[List[QAPISchemaObjectTypeMember]]" has no attribute
"__iter__" (not iterable)  [union-attr]
qapi/commands.py:68: error: Item "None" of
"Optional[List[QAPISchemaObjectTypeMember]]" has no attribute
"__iter__" (not iterable)  [union-attr]
qapi/events.py:60: error: Item "None" of
"Optional[List[QAPISchemaObjectTypeMember]]" has no attribute
"__iter__" (not iterable)  [union-attr]

but, I suppose your argument is that it just literally never is. So
never mind the callsite argument - though mypy still wants its
guarantee that it will never be None here.

>
> > Note that it is valid to have an empty members list, see the internal
> > q_empty object as an example.
>
> Yes.
>
> .members becomes valid only in .check().  Before the patch, .__init__()
> initializes it to None, and .check() sets it to the real value.  We use
> assert .members is not None to catch invalid use.  We can also hope
> invalid use without an assert would crash.  for m in .members would.
>

Mmm, I see. (I very often just literally don't account for you relying
on invalid types being load-bearing ... Seeing a stack trace where
you're told that "None" does not have such-and-such property usually
feels about as helpful as getting kicked out of a moving car on the
freeway.)

> We've seen this pattern before: PATCH 4+5.  There, we change .__init__()
> to declare the attribute without initializing it.  Use before it becomes
> valid now certainly crashes, which is an improvement.  Why can't we do
> the same here?

Didn't occur to me to add a crash on purpose... In the other cases, I
think I didn't have any suitable value to add at all, but in this case
I can use an empty list.

(I have a kind of distaste for relying on Python-level exceptions like
AttributeError to indicate that a stateful error has occurred - i.e.
that this attribute doesn't exist yet, but it will. I usually aim for
having all of the attributes defined at initialization time. The error
is misleading, semantically.

In our case, full initialization directly at init time is not ...
quite possible without some vigorous reworking of the known universe,
so I capitulated and allowed some "very late initialization" which
causes some friction for static typing between pre- and post- "delayed
initialization" callsites. It's still very much something I look at
and regard as a code smell. The entire point of declaring all state in
the init method is to centralize that state so it's finite and
knowable, both to static analysis tools and to humans. Punting the
initialization to arbitrary points later in the object's lifetime
feels like lying: we promise this value is here after initialization,
except not really, have a nice day.)

*cough* that said, I can also use that same trick here. I just want to
whine about it. (I guess I don't want to teach folks what I consider
to be a bad habit just because it makes the linters shut up, but
realize the sausage has to get made.)

... I *just* now saw you had more ideas on how to approach this in the
last series, I will go back and read them. I didn't mean to skip past
them. Lemme investigate.

>
> > Signed-off-by: John Snow <jsnow@redhat.com>
> > ---
> >  scripts/qapi/schema.py | 24 +++++++++++++++---------
> >  1 file changed, 15 insertions(+), 9 deletions(-)
> >
> > diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
> > index eefa58a798b..0d9a70ab4cb 100644
> > --- a/scripts/qapi/schema.py
> > +++ b/scripts/qapi/schema.py
> > @@ -20,7 +20,7 @@
> >  from collections import OrderedDict
> >  import os
> >  import re
> > -from typing import List, Optional
> > +from typing import List, Optional, cast
> >
> >  from .common import (
> >      POINTER_SUFFIX,
> > @@ -457,22 +457,24 @@ def __init__(self, name, info, doc, ifcond, features,
> >          self.base = None
> >          self.local_members = local_members
> >          self.variants = variants
> > -        self.members = None
> > +        self.members: List[QAPISchemaObjectTypeMember] = []
> > +        self._checking = False
> >
> >      def check(self, schema):
> >          # This calls another type T's .check() exactly when the C
> >          # struct emitted by gen_object() contains that T's C struct
> >          # (pointers don't count).
> > -        if self.members is not None:
> > -            # A previous .check() completed: nothing to do
> > -            return
> > -        if self._checked:
> > +        if self._checking:
> >              # Recursed: C struct contains itself
> >              raise QAPISemError(self.info,
> >                                 "object %s contains itself" % self.name)
> > +        if self._checked:
> > +            # A previous .check() completed: nothing to do
> > +            return
> >
> > +        self._checking = True
> >          super().check(schema)
> > -        assert self._checked and self.members is None
> > +        assert self._checked and not self.members
> >
> >          seen = OrderedDict()
> >          if self._base_name:
> > @@ -489,13 +491,17 @@ def check(self, schema):
> >          for m in self.local_members:
> >              m.check(schema)
> >              m.check_clash(self.info, seen)
> > -        members = seen.values()
> > +
> > +        # check_clash is abstract, but local_members is asserted to be
> > +        # List[QAPISchemaObjectTypeMember]. Cast to the narrower type.
> > +        members = cast(List[QAPISchemaObjectTypeMember], list(seen.values()))
> >
> >          if self.variants:
> >              self.variants.check(schema, seen)
> >              self.variants.check_clash(self.info, seen)
> >
> > -        self.members = members  # mark completed
> > +        self.members = members
> > +        self._checking = False  # mark completed
> >
> >      # Check that the members of this type do not cause duplicate JSON members,
> >      # and update seen to track the members seen so far. Report any errors
>
> I think you missed these:
>
>        def is_empty(self):
>            assert self.members is not None
>            return not self.members and not self.variants
>
>        def has_conditional_members(self):
>            assert self.members is not None
>            return any(m.ifcond.is_present() for m in self.members)
>
> The assertions no longer work.  I figure you want to assert .checked
> instead.
>
> Consider splitting the patch: first add .checking, and replace use of
> .members by use of .checking where possible.  Then change .members.  The
> split may or may not be easier to describe and digest.  Use your
> judgement.

Ah, oops.

(I wish mypy would bark about this, actually: if members is a List,
then surely it can't ever be None, right? It's definitely a smell. I
wonder if there is a reason why it doesn't, or if this is a valid
feature request.)

--js
Re: [PATCH v2 13/19] qapi/schema: split "checked" field into "checking" and "checked"
Posted by John Snow 9 months, 2 weeks ago
On Thu, Feb 1, 2024 at 2:41 PM John Snow <jsnow@redhat.com> wrote:
>
> On Tue, Jan 16, 2024 at 9:58 AM Markus Armbruster <armbru@redhat.com> wrote:
> >
> > John Snow <jsnow@redhat.com> writes:
> >
> > > differentiate between "actively in the process of checking" and
> > > "checking has completed". This allows us to clean up the types of some
> > > internal fields such as QAPISchemaObjectType's members field which
> > > currently uses "None" as a test for determining if check has been run
> > > already or not.
> > >
> > > This simplifies the typing from a cumbersome Optional[List[T]] to merely
> > > a List[T], which is more pythonic: it is safe to iterate over an empty
> > > list with "for x in []" whereas with an Optional[List[T]] you have to
> > > rely on the more cumbersome "if L: for x in L: ..."
> >
> > Does this cumbersome form exist?
>
> I thought it should, but I suppose it shouldn't given that you have
> been relying on None to cause a crash.
>
> Here's where that pattern *would* be used if None was legitimate:
>
> qapi/schema.py:604: error: Item "None" of
> "Optional[List[QAPISchemaObjectTypeMember]]" has no attribute
> "__iter__" (not iterable)  [union-attr]
> qapi/schema.py:626: error: Item "None" of
> "Optional[List[QAPISchemaObjectTypeMember]]" has no attribute
> "__iter__" (not iterable)  [union-attr]
> qapi/gen.py:122: error: Item "None" of
> "Optional[List[QAPISchemaObjectTypeMember]]" has no attribute
> "__iter__" (not iterable)  [union-attr]
> qapi/commands.py:68: error: Item "None" of
> "Optional[List[QAPISchemaObjectTypeMember]]" has no attribute
> "__iter__" (not iterable)  [union-attr]
> qapi/events.py:60: error: Item "None" of
> "Optional[List[QAPISchemaObjectTypeMember]]" has no attribute
> "__iter__" (not iterable)  [union-attr]
>
> but, I suppose your argument is that it just literally never is. So
> never mind the callsite argument - though mypy still wants its
> guarantee that it will never be None here.
>
> >
> > > Note that it is valid to have an empty members list, see the internal
> > > q_empty object as an example.
> >
> > Yes.
> >
> > .members becomes valid only in .check().  Before the patch, .__init__()
> > initializes it to None, and .check() sets it to the real value.  We use
> > assert .members is not None to catch invalid use.  We can also hope
> > invalid use without an assert would crash.  for m in .members would.
> >
>
> Mmm, I see. (I very often just literally don't account for you relying
> on invalid types being load-bearing ... Seeing a stack trace where
> you're told that "None" does not have such-and-such property usually
> feels about as helpful as getting kicked out of a moving car on the
> freeway.)
>
> > We've seen this pattern before: PATCH 4+5.  There, we change .__init__()
> > to declare the attribute without initializing it.  Use before it becomes
> > valid now certainly crashes, which is an improvement.  Why can't we do
> > the same here?
>
> Didn't occur to me to add a crash on purpose... In the other cases, I
> think I didn't have any suitable value to add at all, but in this case
> I can use an empty list.
>
> (I have a kind of distaste for relying on Python-level exceptions like
> AttributeError to indicate that a stateful error has occurred - i.e.
> that this attribute doesn't exist yet, but it will. I usually aim for
> having all of the attributes defined at initialization time. The error
> is misleading, semantically.
>
> In our case, full initialization directly at init time is not ...
> quite possible without some vigorous reworking of the known universe,
> so I capitulated and allowed some "very late initialization" which
> causes some friction for static typing between pre- and post- "delayed
> initialization" callsites. It's still very much something I look at
> and regard as a code smell. The entire point of declaring all state in
> the init method is to centralize that state so it's finite and
> knowable, both to static analysis tools and to humans. Punting the
> initialization to arbitrary points later in the object's lifetime
> feels like lying: we promise this value is here after initialization,
> except not really, have a nice day.)
>
> *cough* that said, I can also use that same trick here. I just want to
> whine about it. (I guess I don't want to teach folks what I consider
> to be a bad habit just because it makes the linters shut up, but
> realize the sausage has to get made.)
>
> ... I *just* now saw you had more ideas on how to approach this in the
> last series, I will go back and read them. I didn't mean to skip past
> them. Lemme investigate.
>
> >
> > > Signed-off-by: John Snow <jsnow@redhat.com>
> > > ---
> > >  scripts/qapi/schema.py | 24 +++++++++++++++---------
> > >  1 file changed, 15 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
> > > index eefa58a798b..0d9a70ab4cb 100644
> > > --- a/scripts/qapi/schema.py
> > > +++ b/scripts/qapi/schema.py
> > > @@ -20,7 +20,7 @@
> > >  from collections import OrderedDict
> > >  import os
> > >  import re
> > > -from typing import List, Optional
> > > +from typing import List, Optional, cast
> > >
> > >  from .common import (
> > >      POINTER_SUFFIX,
> > > @@ -457,22 +457,24 @@ def __init__(self, name, info, doc, ifcond, features,
> > >          self.base = None
> > >          self.local_members = local_members
> > >          self.variants = variants
> > > -        self.members = None
> > > +        self.members: List[QAPISchemaObjectTypeMember] = []
> > > +        self._checking = False
> > >
> > >      def check(self, schema):
> > >          # This calls another type T's .check() exactly when the C
> > >          # struct emitted by gen_object() contains that T's C struct
> > >          # (pointers don't count).
> > > -        if self.members is not None:
> > > -            # A previous .check() completed: nothing to do
> > > -            return
> > > -        if self._checked:
> > > +        if self._checking:
> > >              # Recursed: C struct contains itself
> > >              raise QAPISemError(self.info,
> > >                                 "object %s contains itself" % self.name)
> > > +        if self._checked:
> > > +            # A previous .check() completed: nothing to do
> > > +            return
> > >
> > > +        self._checking = True
> > >          super().check(schema)
> > > -        assert self._checked and self.members is None
> > > +        assert self._checked and not self.members
> > >
> > >          seen = OrderedDict()
> > >          if self._base_name:
> > > @@ -489,13 +491,17 @@ def check(self, schema):
> > >          for m in self.local_members:
> > >              m.check(schema)
> > >              m.check_clash(self.info, seen)
> > > -        members = seen.values()
> > > +
> > > +        # check_clash is abstract, but local_members is asserted to be
> > > +        # List[QAPISchemaObjectTypeMember]. Cast to the narrower type.
> > > +        members = cast(List[QAPISchemaObjectTypeMember], list(seen.values()))
> > >
> > >          if self.variants:
> > >              self.variants.check(schema, seen)
> > >              self.variants.check_clash(self.info, seen)
> > >
> > > -        self.members = members  # mark completed
> > > +        self.members = members
> > > +        self._checking = False  # mark completed
> > >
> > >      # Check that the members of this type do not cause duplicate JSON members,
> > >      # and update seen to track the members seen so far. Report any errors
> >
> > I think you missed these:
> >
> >        def is_empty(self):
> >            assert self.members is not None
> >            return not self.members and not self.variants
> >
> >        def has_conditional_members(self):
> >            assert self.members is not None
> >            return any(m.ifcond.is_present() for m in self.members)
> >
> > The assertions no longer work.  I figure you want to assert .checked
> > instead.
> >
> > Consider splitting the patch: first add .checking, and replace use of
> > .members by use of .checking where possible.  Then change .members.  The
> > split may or may not be easier to describe and digest.  Use your
> > judgement.
>
> Ah, oops.

Oh, I got it in the assertions patch, but have since moved it forward.

>
> (I wish mypy would bark about this, actually: if members is a List,
> then surely it can't ever be None, right? It's definitely a smell. I
> wonder if there is a reason why it doesn't, or if this is a valid
> feature request.)
>
> --js