[v1] qapi: statically type schema.py

[PATCH 12/19] qapi/schema: split "checked" field into "checking" and "checked"

Posted by John Snow 1 year ago

Differentiate between "actively in the process of checking" and
"checking has completed". This allows us to clean up the types of some
internal fields such as QAPISchemaObjectType's members field which
currently uses "None" as a canary for determining if check has
completed.

This simplifies the typing from a cumbersome Optional[List[T]] to merely
a List[T], which is more pythonic: it is safe to iterate over an empty
list with "for x in []" whereas with an Optional[List[T]] you have to
rely on the more cumbersome "if L: for x in L: ..."

RFC: are we guaranteed to have members here? can we just use "if
members" without adding the new field?

Signed-off-by: John Snow <jsnow@redhat.com>
---
 scripts/qapi/schema.py | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
index 164d86c4064..200bc0730d6 100644
--- a/scripts/qapi/schema.py
+++ b/scripts/qapi/schema.py
@@ -18,7 +18,7 @@
 from collections import OrderedDict
 import os
 import re
-from typing import List, Optional
+from typing import List, Optional, cast
 
 from .common import (
     POINTER_SUFFIX,
@@ -447,22 +447,24 @@ def __init__(self, name, info, doc, ifcond, features,
         self.base = None
         self.local_members = local_members
         self.variants = variants
-        self.members = None
+        self.members: List[QAPISchemaObjectTypeMember] = []
+        self._checking = False
 
     def check(self, schema):
         # This calls another type T's .check() exactly when the C
         # struct emitted by gen_object() contains that T's C struct
         # (pointers don't count).
-        if self.members is not None:
-            # A previous .check() completed: nothing to do
-            return
-        if self._checked:
+        if self._checking:
             # Recursed: C struct contains itself
             raise QAPISemError(self.info,
                                "object %s contains itself" % self.name)
+        if self._checked:
+            # A previous .check() completed: nothing to do
+            return
 
+        self._checking = True
         super().check(schema)
-        assert self._checked and self.members is None
+        assert self._checked and not self.members
 
         seen = OrderedDict()
         if self._base_name:
@@ -479,13 +481,17 @@ def check(self, schema):
         for m in self.local_members:
             m.check(schema)
             m.check_clash(self.info, seen)
-        members = seen.values()
+
+        # check_clash is abstract, but local_members is asserted to be
+        # List[QAPISchemaObjectTypeMember]. Cast to the narrower type.
+        members = cast(List[QAPISchemaObjectTypeMember], list(seen.values()))
 
         if self.variants:
             self.variants.check(schema, seen)
             self.variants.check_clash(self.info, seen)
 
-        self.members = members  # mark completed
+        self.members = members
+        self._checking = False  # mark completed
 
     # Check that the members of this type do not cause duplicate JSON members,
     # and update seen to track the members seen so far. Report any errors
-- 
2.41.0

Re: [PATCH 12/19] qapi/schema: split "checked" field into "checking" and "checked"

Posted by Markus Armbruster 1 year ago

John Snow <jsnow@redhat.com> writes:

> Differentiate between "actively in the process of checking" and
> "checking has completed". This allows us to clean up the types of some
> internal fields such as QAPISchemaObjectType's members field which
> currently uses "None" as a canary for determining if check has
> completed.

Certain members become valid only after .check().  Two ways to code
that:

1. Assign to such members only in .check().  If you try to use them
before .check(), AttributeError.  Nice.  Drawback: pylint is unhappy,
W0201 attribute-defined-outside-init.

2. Assign None in .__init__(), and the real value in .check().  If you
try to use them before .check(), you get None, which hopefully leads to
an error.  Meh, but pylint is happy.

I picked 2. because pylint's warning made me go "when in Rome..."

With type hints, we can declare in .__init__(), and assign in .check().
Gives me the AttributeError I like, and pylint remains happy.  What do
you think?

Splitting .checked feels like a separate change, though.  I can't quite
see why we need it.  Help me out: what problem does it solve?

> This simplifies the typing from a cumbersome Optional[List[T]] to merely
> a List[T], which is more pythonic: it is safe to iterate over an empty
> list with "for x in []" whereas with an Optional[List[T]] you have to
> rely on the more cumbersome "if L: for x in L: ..."

Yes, but when L is None, it's *invalid*, and for i in L *should* fail
when L is invalid.

I think the actual problem is something else.  By adding the type hint
Optional[List[T]], the valid uses of L become type errors.  We really
want L to be a List[T].  Doesn't mean we have to (or want to) make uses
of invalid L "work".

> RFC: are we guaranteed to have members here? can we just use "if
> members" without adding the new field?

I'm afraid I don't understand these questions.

> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  scripts/qapi/schema.py | 24 +++++++++++++++---------
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
> index 164d86c4064..200bc0730d6 100644
> --- a/scripts/qapi/schema.py
> +++ b/scripts/qapi/schema.py
> @@ -18,7 +18,7 @@
>  from collections import OrderedDict
>  import os
>  import re
> -from typing import List, Optional
> +from typing import List, Optional, cast
>  
>  from .common import (
>      POINTER_SUFFIX,
> @@ -447,22 +447,24 @@ def __init__(self, name, info, doc, ifcond, features,
>          self.base = None
>          self.local_members = local_members
>          self.variants = variants
> -        self.members = None
> +        self.members: List[QAPISchemaObjectTypeMember] = []

Can we do

           self.members: List[QAPISchemaObjectTypeMember]

?

> +        self._checking = False
>  
>      def check(self, schema):
>          # This calls another type T's .check() exactly when the C
>          # struct emitted by gen_object() contains that T's C struct
>          # (pointers don't count).
> -        if self.members is not None:
> -            # A previous .check() completed: nothing to do
> -            return
> -        if self._checked:
> +        if self._checking:
>              # Recursed: C struct contains itself
>              raise QAPISemError(self.info,
>                                 "object %s contains itself" % self.name)
> +        if self._checked:
> +            # A previous .check() completed: nothing to do
> +            return
>  
> +        self._checking = True
>          super().check(schema)
> -        assert self._checked and self.members is None
> +        assert self._checked and not self.members

If we assign only in .check(), we can't read self.members to find out
whether it's valid.  We could perhaps mess with .__dict__() instead.
Not sure it's worthwhile.  Dumb down the assertion?

>  
>          seen = OrderedDict()
>          if self._base_name:
> @@ -479,13 +481,17 @@ def check(self, schema):
>          for m in self.local_members:
>              m.check(schema)
>              m.check_clash(self.info, seen)
> -        members = seen.values()
> +
> +        # check_clash is abstract, but local_members is asserted to be
> +        # List[QAPISchemaObjectTypeMember]. Cast to the narrower type.
> +        members = cast(List[QAPISchemaObjectTypeMember], list(seen.values()))
>  
>          if self.variants:
>              self.variants.check(schema, seen)
>              self.variants.check_clash(self.info, seen)
>  
> -        self.members = members  # mark completed
> +        self.members = members
> +        self._checking = False  # mark completed
>  
>      # Check that the members of this type do not cause duplicate JSON members,
>      # and update seen to track the members seen so far. Report any errors

Re: [PATCH 12/19] qapi/schema: split "checked" field into "checking" and "checked"

Posted by John Snow 10 months, 3 weeks ago

On Wed, Nov 22, 2023, 9:02 AM Markus Armbruster <armbru@redhat.com> wrote:

> John Snow <jsnow@redhat.com> writes:
>
> > Differentiate between "actively in the process of checking" and
> > "checking has completed". This allows us to clean up the types of some
> > internal fields such as QAPISchemaObjectType's members field which
> > currently uses "None" as a canary for determining if check has
> > completed.
>
> Certain members become valid only after .check().  Two ways to code
> that:
>
> 1. Assign to such members only in .check().  If you try to use them
> before .check(), AttributeError.  Nice.  Drawback: pylint is unhappy,
> W0201 attribute-defined-outside-init.
>

Can be overcome by declaring the field in __init__, which satisfies both
the linter and my developer usability sense (Classes should be easy to have
their properties enumerated by looking in one well known place.)

> 2. Assign None in .__init__(), and the real value in .check().  If you
> try to use them before .check(), you get None, which hopefully leads to
> an error.  Meh, but pylint is happy.
>
> I picked 2. because pylint's warning made me go "when in Rome..."
>

Yep, this is perfectly cromulent dynamically typed Python. It's not the
Roman's fault I'm packing us up to go to another empire.

> With type hints, we can declare in .__init__(), and assign in .check().
> Gives me the AttributeError I like, and pylint remains happy.  What do
> you think?
>

Sounds good to me in general, I already changed this for 2/3 of my other
uses of @property.

(I'm only reluctant because I don't really like that it's a "lie", but in
this case, without potentially significant rewrites, it's a reasonable type
band-aid. Once we're type checked, we can refactor more confidently if we
so desire, to make certain patterns less prominent or landmine-y.)

> Splitting .checked feels like a separate change, though.  I can't quite
> see why we need it.  Help me out: what problem does it solve?
>

Mechanically, I wanted to eliminate the Optional type from the members
field, but you have conditionals in the code that use the presence or
absence of this field as a query to determine if we had run check or not
yet.

So I did the stupidest possible thing and added a "checked" variable to
explicitly represent it.

> > This simplifies the typing from a cumbersome Optional[List[T]] to merely
> > a List[T], which is more pythonic: it is safe to iterate over an empty
> > list with "for x in []" whereas with an Optional[List[T]] you have to
> > rely on the more cumbersome "if L: for x in L: ..."
>
> Yes, but when L is None, it's *invalid*, and for i in L *should* fail
> when L is invalid.
>

Sure, but it's so invalid that it causes static typing errors.

You can't write "for x in None" in a way that makes mypy happy, None is not
iterable.

If you want to preserve the behavior of "iterating an empty collection is
an Assertion", you need a custom iterator that throws an exception when the
collection is empty. I can do that, if you'd like, but I think it's
actually fine to just allow the collection to be empty and to just catch
the error in check() with either an assertion (oops, something went wrong
and the list is empty, this should never happen) or a QAPISemError (oops,
you didn't specify any members, which is illegal.)

I'd prefer to catch this in check and just allow the iterator to permit
empty iterators at the callsite, knowing it'll never happen.

> I think the actual problem is something else.  By adding the type hint
> Optional[List[T]], the valid uses of L become type errors.  We really
> want L to be a List[T].  Doesn't mean we have to (or want to) make uses
> of invalid L "work".
>

I didn't think I did allow for invalid uses to work - if the list should
semantically never be empty, I think it's fine to enforce that in schema.py
during construction of the schema object and to assume all uses of "for x
in L: ..." are inherently valid.

> > RFC: are we guaranteed to have members here? can we just use "if
> > members" without adding the new field?
>
> I'm afraid I don't understand these questions.
>

I think you answered this one for me; I was asking if it was ever valid in
any circumstance to have an empty members list, but I think you've laid out
in your response that it isn't.

And I think with that knowledge I can simplify this patch, but don't quite
recall. (On my mobile again, please excuse my apparent laziness.)

> > Signed-off-by: John Snow <jsnow@redhat.com>
> > ---
> >  scripts/qapi/schema.py | 24 +++++++++++++++---------
> >  1 file changed, 15 insertions(+), 9 deletions(-)
> >
> > diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py
> > index 164d86c4064..200bc0730d6 100644
> > --- a/scripts/qapi/schema.py
> > +++ b/scripts/qapi/schema.py
> > @@ -18,7 +18,7 @@
> >  from collections import OrderedDict
> >  import os
> >  import re
> > -from typing import List, Optional
> > +from typing import List, Optional, cast
> >
> >  from .common import (
> >      POINTER_SUFFIX,
> > @@ -447,22 +447,24 @@ def __init__(self, name, info, doc, ifcond,
> features,
> >          self.base = None
> >          self.local_members = local_members
> >          self.variants = variants
> > -        self.members = None
> > +        self.members: List[QAPISchemaObjectTypeMember] = []
>
> Can we do
>
>            self.members: List[QAPISchemaObjectTypeMember]
>
> ?
>

Possibly, but also possibly I can just initialize it to an empty list.

> > +        self._checking = False
> >
> >      def check(self, schema):
> >          # This calls another type T's .check() exactly when the C
> >          # struct emitted by gen_object() contains that T's C struct
> >          # (pointers don't count).
> > -        if self.members is not None:
> > -            # A previous .check() completed: nothing to do
> > -            return
> > -        if self._checked:
> > +        if self._checking:
> >              # Recursed: C struct contains itself
> >              raise QAPISemError(self.info,
> >                                 "object %s contains itself" % self.name)
> > +        if self._checked:
> > +            # A previous .check() completed: nothing to do
> > +            return
> >
> > +        self._checking = True
> >          super().check(schema)
> > -        assert self._checked and self.members is None
> > +        assert self._checked and not self.members
>
> If we assign only in .check(), we can't read self.members to find out
> whether it's valid.  We could perhaps mess with .__dict__() instead.
> Not sure it's worthwhile.  Dumb down the assertion?
>

If I initialize to an empty list (and don't mess with the checked member)
maybe

assert self._checked and not self.members

would be perfectly acceptable.

> >
> >          seen = OrderedDict()
> >          if self._base_name:
> > @@ -479,13 +481,17 @@ def check(self, schema):
> >          for m in self.local_members:
> >              m.check(schema)
> >              m.check_clash(self.info, seen)
> > -        members = seen.values()
> > +
> > +        # check_clash is abstract, but local_members is asserted to be
> > +        # List[QAPISchemaObjectTypeMember]. Cast to the narrower type.
> > +        members = cast(List[QAPISchemaObjectTypeMember],
> list(seen.values()))
> >
> >          if self.variants:
> >              self.variants.check(schema, seen)
> >              self.variants.check_clash(self.info, seen)
> >
> > -        self.members = members  # mark completed
> > +        self.members = members
> > +        self._checking = False  # mark completed
> >
> >      # Check that the members of this type do not cause duplicate JSON
> members,
> >      # and update seen to track the members seen so far. Report any
> errors
>
>

Re: [PATCH 12/19] qapi/schema: split "checked" field into "checking" and "checked"

Posted by Markus Armbruster 10 months, 2 weeks ago

John Snow <jsnow@redhat.com> writes:

> On Wed, Nov 22, 2023, 9:02 AM Markus Armbruster <armbru@redhat.com> wrote:
>
>> John Snow <jsnow@redhat.com> writes:
>>
>> > Differentiate between "actively in the process of checking" and
>> > "checking has completed". This allows us to clean up the types of some
>> > internal fields such as QAPISchemaObjectType's members field which
>> > currently uses "None" as a canary for determining if check has
>> > completed.
>>
>> Certain members become valid only after .check().  Two ways to code
>> that:
>>
>> 1. Assign to such members only in .check().  If you try to use them
>> before .check(), AttributeError.  Nice.  Drawback: pylint is unhappy,
>> W0201 attribute-defined-outside-init.
>>
>
> Can be overcome by declaring the field in __init__, which satisfies both
> the linter and my developer usability sense (Classes should be easy to have
> their properties enumerated by looking in one well known place.)
>
>
>> 2. Assign None in .__init__(), and the real value in .check().  If you
>> try to use them before .check(), you get None, which hopefully leads to
>> an error.  Meh, but pylint is happy.
>>
>> I picked 2. because pylint's warning made me go "when in Rome..."
>>
>
> Yep, this is perfectly cromulent dynamically typed Python. It's not the
> Roman's fault I'm packing us up to go to another empire.
>
>
>> With type hints, we can declare in .__init__(), and assign in .check().
>> Gives me the AttributeError I like, and pylint remains happy.  What do
>> you think?
>>
>
> Sounds good to me in general, I already changed this for 2/3 of my other
> uses of @property.
>
> (I'm only reluctant because I don't really like that it's a "lie", but in
> this case, without potentially significant rewrites, it's a reasonable type
> band-aid. Once we're type checked, we can refactor more confidently if we
> so desire, to make certain patterns less prominent or landmine-y.)

The general problem is "attribute value is valid only after a state
transition" (here: .member is valid only after .check()).

We want to catch uses of the attribute before it becomes valid.

We want to keep pylint and mypy happy.

Solutions:

1. Initialize in .__init__() to some invalid value.  Set it to the valid
   value in .check().

1.a. Pick the "natural" invalid value: None

   How to catch: assert attribute value is valid (here: .members is not
   None).  Easy to forget.  Better: when the use will safely choke on
   the invalid value (here: elide for uses like for m in .members),
   catch is automatic.

   Pylint: fine.

   Mypy: adding None to the set of values changes the type from T to
   Optional[T].  Because of this, mypy commonly can't prove valid uses
   are valid.  Keeping it happy requires cluttering the code with
   assertions and such.  Meh.

   Note: catching invalid uses is a run time check.  Mypy won't.

1.b. Pick an invalid value of type T (here: [])

   How to catch: same as 1.a., except automatic catch is rare.  Meh.

   Pylint: fine.

   Mypy: fine.

2. Declare in .__init__() without initializing.  Initialize to valid
   value in .check()

   How to catch: always automatic.  Good, me want!

   Pylint: fine.

   Mypy: fine.

   Note: catching invalid uses is a run time check.  Mypy won't.

3. Express the state transition in the type system

   To catch invalid uses statically with mypy, we need to use different
   types before and after the state transition.  Feels possible.  Also
   feels ludicrously overengineered.

May I have 2., please?

>> Splitting .checked feels like a separate change, though.  I can't quite
>> see why we need it.  Help me out: what problem does it solve?
>>
>
> Mechanically, I wanted to eliminate the Optional type from the members
> field, but you have conditionals in the code that use the presence or
> absence of this field as a query to determine if we had run check or not
> yet.
>
> So I did the stupidest possible thing and added a "checked" variable to
> explicitly represent it.

If 2. complicates the existing "have we .check()ed?" code too much, then
adding such a variable may indeed be useful.

>> > This simplifies the typing from a cumbersome Optional[List[T]] to merely
>> > a List[T], which is more pythonic: it is safe to iterate over an empty
>> > list with "for x in []" whereas with an Optional[List[T]] you have to
>> > rely on the more cumbersome "if L: for x in L: ..."
>>
>> Yes, but when L is None, it's *invalid*, and for i in L *should* fail
>> when L is invalid.
>>
>
> Sure, but it's so invalid that it causes static typing errors.
>
> You can't write "for x in None" in a way that makes mypy happy, None is not
> iterable.

A variable that is declared, but not initialized (2. above) also not
iterable, and it does make mypy happy, doesn't it?

> If you want to preserve the behavior of "iterating an empty collection is
> an Assertion", you need a custom iterator that throws an exception when the
> collection is empty. I can do that, if you'd like, but I think it's
> actually fine to just allow the collection to be empty and to just catch
> the error in check() with either an assertion (oops, something went wrong
> and the list is empty, this should never happen) or a QAPISemError (oops,
> you didn't specify any members, which is illegal.)
>
> I'd prefer to catch this in check and just allow the iterator to permit
> empty iterators at the callsite, knowing it'll never happen.
>
>
>> I think the actual problem is something else.  By adding the type hint
>> Optional[List[T]], the valid uses of L become type errors.  We really
>> want L to be a List[T].  Doesn't mean we have to (or want to) make uses
>> of invalid L "work".
>>
>
> I didn't think I did allow for invalid uses to work - if the list should
> semantically never be empty, I think it's fine to enforce that in schema.py
> during construction of the schema object and to assume all uses of "for x
> in L: ..." are inherently valid.
>
>
>> > RFC: are we guaranteed to have members here? can we just use "if
>> > members" without adding the new field?
>>
>> I'm afraid I don't understand these questions.
>>
>
> I think you answered this one for me; I was asking if it was ever valid in
> any circumstance to have an empty members list, but I think you've laid out
> in your response that it isn't.
>
> And I think with that knowledge I can simplify this patch, but don't quite
> recall. (On my mobile again, please excuse my apparent laziness.)

Feel excused!

[...]