qapi: new sphinx qapi domain pre-requisites

[PATCH 04/20] qapi/parser: preserve indentation in QAPIDoc sections

Posted by John Snow 1 year, 9 months ago

Prior to this patch, a section like this:

@name: lorem ipsum
   dolor sit amet
     consectetur adipiscing elit

would be parsed as:

"lorem ipsum\ndolor sit amet\n  consectetur adipiscing elit"

We want to preserve the indentation for even the first body line so that
the entire block can be parsed directly as rST. This patch would now
parse that segment as:

"lorem ipsum\n   dolor sit amet\n     consectetur adipiscing elit"

This understandably breaks qapidoc.py; so a new function is added there
to re-dedent the text. Once the new generator is merged, this function
will not be needed any longer and can be dropped.

(I verified this patch changes absolutely nothing by comparing the
md5sums of the QMP ref html pages both before and after the change, so
it's certified inert. QAPI test output has been updated to reflect the
new strategy of preserving indents for rST.)

Signed-off-by: John Snow <jsnow@redhat.com>
---
 docs/sphinx/qapidoc.py         | 36 +++++++++++++++++++++++++++++-----
 scripts/qapi/parser.py         |  8 ++++++--
 tests/qapi-schema/doc-good.out | 32 +++++++++++++++---------------
 3 files changed, 53 insertions(+), 23 deletions(-)

diff --git a/docs/sphinx/qapidoc.py b/docs/sphinx/qapidoc.py
index 1655682d4c7..2e3ffcbafb7 100644
--- a/docs/sphinx/qapidoc.py
+++ b/docs/sphinx/qapidoc.py
@@ -26,6 +26,7 @@
 
 import os
 import re
+import textwrap
 
 from docutils import nodes
 from docutils.parsers.rst import Directive, directives
@@ -51,6 +52,28 @@
 __version__ = "1.0"
 
 
+def dedent(text: str) -> str:
+    # Temporary: In service of the new QAPI domain, the QAPI doc parser
+    # now preserves indents in args/members/features text. QAPIDoc does
+    # not handle this well, so undo that change here.
+
+    # QAPIDoc is being rewritten and will be replaced soon,
+    # but this function is here in the interim as transition glue.
+
+    lines = text.splitlines(True)
+    if len(lines) > 1:
+        if re.match(r"\s+", lines[0]):
+            # First line is indented; description started on
+            # the line after the name. dedent the whole block.
+            return textwrap.dedent(text)
+        else:
+            # Descr started on same line. Dedent line 2+.
+            return lines[0] + textwrap.dedent("".join(lines[1:]))
+    else:
+        # Descr was a single line; dedent entire line.
+        return textwrap.dedent(text)
+
+
 # fmt: off
 # Function borrowed from pydash, which is under the MIT license
 def intersperse(iterable, separator):
@@ -169,7 +192,7 @@ def _nodes_for_members(self, doc, what, base=None, branches=None):
             term = self._nodes_for_one_member(section.member)
             # TODO drop fallbacks when undocumented members are outlawed
             if section.text:
-                defn = section.text
+                defn = dedent(section.text)
             else:
                 defn = [nodes.Text('Not documented')]
 
@@ -207,7 +230,7 @@ def _nodes_for_enum_values(self, doc):
                 termtext.extend(self._nodes_for_ifcond(section.member.ifcond))
             # TODO drop fallbacks when undocumented members are outlawed
             if section.text:
-                defn = section.text
+                defn = dedent(section.text)
             else:
                 defn = [nodes.Text('Not documented')]
 
@@ -242,7 +265,7 @@ def _nodes_for_features(self, doc):
         dlnode = nodes.definition_list()
         for section in doc.features.values():
             dlnode += self._make_dlitem(
-                [nodes.literal('', section.member.name)], section.text)
+                [nodes.literal('', section.member.name)], dedent(section.text))
             seen_item = True
 
         if not seen_item:
@@ -265,9 +288,12 @@ def _nodes_for_sections(self, doc):
                 continue
             snode = self._make_section(section.tag)
             if section.tag and section.tag.startswith('Example'):
-                snode += self._nodes_for_example(section.text)
+                snode += self._nodes_for_example(dedent(section.text))
             else:
-                self._parse_text_into_node(section.text, snode)
+                self._parse_text_into_node(
+                    dedent(section.text) if section.tag else section.text,
+                    snode,
+                )
             nodelist.append(snode)
         return nodelist
 
diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
index 7b13a583ac1..8cdd5334ec6 100644
--- a/scripts/qapi/parser.py
+++ b/scripts/qapi/parser.py
@@ -448,7 +448,10 @@ def get_doc_indented(self, doc: 'QAPIDoc') -> Optional[str]:
         indent = must_match(r'\s*', line).end()
         if not indent:
             return line
-        doc.append_line(line[indent:])
+
+        # Preserve the indent, it's needed for rST formatting.
+        doc.append_line(line)
+
         prev_line_blank = False
         while True:
             self.accept(False)
@@ -465,7 +468,8 @@ def get_doc_indented(self, doc: 'QAPIDoc') -> Optional[str]:
                     self,
                     "unexpected de-indent (expected at least %d spaces)" %
                     indent)
-            doc.append_line(line[indent:])
+            # Again, preserve the indent for ReST.
+            doc.append_line(line)
             prev_line_blank = True
 
     def get_doc_paragraph(self, doc: 'QAPIDoc') -> Optional[str]:
diff --git a/tests/qapi-schema/doc-good.out b/tests/qapi-schema/doc-good.out
index 716a9a41026..435f6e6d768 100644
--- a/tests/qapi-schema/doc-good.out
+++ b/tests/qapi-schema/doc-good.out
@@ -117,8 +117,8 @@ doc symbol=Base
     body=
 
     arg=base1
-description starts on a new line,
-minimally indented
+ description starts on a new line,
+ minimally indented
 doc symbol=Variant1
     body=
 A paragraph
@@ -145,8 +145,8 @@ doc symbol=Alternate
 
     arg=i
 description starts on the same line
-remainder indented the same
-@b is undocumented
+    remainder indented the same
+    @b is undocumented
     arg=b
 
     feature=alt-feat
@@ -158,11 +158,11 @@ doc symbol=cmd
     body=
 
     arg=arg1
-description starts on a new line,
-indented
+    description starts on a new line,
+    indented
     arg=arg2
 description starts on the same line
-remainder indented differently
+    remainder indented differently
     arg=arg3
 
     feature=cmd-feat1
@@ -178,16 +178,16 @@ some
     section=TODO
 frobnicate
     section=Notes
-- Lorem ipsum dolor sit amet
-- Ut enim ad minim veniam
+ - Lorem ipsum dolor sit amet
+ - Ut enim ad minim veniam
 
-Duis aute irure dolor
+ Duis aute irure dolor
     section=Example
--> in
-<- out
+ -> in
+ <- out
     section=Examples
-- *verbatim*
-- {braces}
+ - *verbatim*
+ - {braces}
     section=Since
 2.10
 doc symbol=cmd-boxed
@@ -198,9 +198,9 @@ a feature
     feature=cmd-feat2
 another feature
     section=Example
--> in
+ -> in
 
-<- out
+ <- out
 doc symbol=EVT_BOXED
     body=
 
-- 
2.44.0

Re: [PATCH 04/20] qapi/parser: preserve indentation in QAPIDoc sections

Posted by Markus Armbruster 1 year, 8 months ago

John Snow <jsnow@redhat.com> writes:

> Prior to this patch, a section like this:
>
> @name: lorem ipsum
>    dolor sit amet
>      consectetur adipiscing elit
>
> would be parsed as:
>
> "lorem ipsum\ndolor sit amet\n  consectetur adipiscing elit"
>
> We want to preserve the indentation for even the first body line so that
> the entire block can be parsed directly as rST. This patch would now
> parse that segment as:
>
> "lorem ipsum\n   dolor sit amet\n     consectetur adipiscing elit"

I'm afraid it's less than clear *why* we want to parse the entire block
directly as rST.  I have just enough insight into what you've built on
top of this series to hazard a guess.  Bear with me while I try to
explain it.

We first parse the doc comment with parser.py into an internal
representation.  The structural parts become objects, and the remainder
becomes text attributes of these objects.  Currently, parser.py
carefully adjusts indentation for these text attributes.  Why?  I'll get
to that.

For your example, parser.py creates an ArgSection object, and sets its
@text member to

    "lorem ipsum\ndolor sit amet\n  consectetur adipiscing elit"

Printing this string gives us

    lorem ipsum
    dolor sit amet
      consectetur adipiscing elit

qapidoc.py then transforms parser.py's IR into Sphinx IR.  The objects
become (more complicated) Sphinx objects, and their text attributes get
re-parsed as rST text into still more Sphinx objects.

This re-parse rejects your example with "Unexpected indentation."

Let me use a slightly different one:

    # @name: lorem ipsum
    #    dolor sit amet
    #    consectetur adipiscing elit

Results in this @text member

    lorem ipsum
    dolor sit amet
    consectetur adipiscing elit

which is re-parsed as paragraph, i.e. exactly what we want.

> This understandably breaks qapidoc.py;

Without indentation adjustment, we'd get

    lorem ipsum
       dolor sit amet
       consectetur adipiscing elit

which would be re-parsed as a definition list, I guess.  This isn't what
we want.

>                                        so a new function is added there
> to re-dedent the text.

Your patch moves the indentation adjustment to another place.  No
functional change.

You move it so you can branch off your new rendering pipeline before the
indentation adjustment, because ...

>                        Once the new generator is merged, this function
> will not be needed any longer and can be dropped.

... yours doesn't want it.

I believe it doesn't want it, because it generates rST (with a QAPI
extension) instead of Sphinx objects.  For your example, something like

    :arg name: lorem ipsum
       dolor sit amet
         consectetur adipiscing elit

For mine:

    :arg name: lorem ipsum
       dolor sit amet
       consectetur adipiscing elit

Fair?

The transition from the doc comment to (extended) rST is straightforward
for these examples.

I hope it'll be as straightforward (and thus predictable) in other
cases, too.

> (I verified this patch changes absolutely nothing by comparing the
> md5sums of the QMP ref html pages both before and after the change, so
> it's certified inert. QAPI test output has been updated to reflect the
> new strategy of preserving indents for rST.)
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  docs/sphinx/qapidoc.py         | 36 +++++++++++++++++++++++++++++-----
>  scripts/qapi/parser.py         |  8 ++++++--
>  tests/qapi-schema/doc-good.out | 32 +++++++++++++++---------------
>  3 files changed, 53 insertions(+), 23 deletions(-)
>
> diff --git a/docs/sphinx/qapidoc.py b/docs/sphinx/qapidoc.py
> index 1655682d4c7..2e3ffcbafb7 100644
> --- a/docs/sphinx/qapidoc.py
> +++ b/docs/sphinx/qapidoc.py
> @@ -26,6 +26,7 @@
>  
>  import os
>  import re
> +import textwrap
>  
>  from docutils import nodes
>  from docutils.parsers.rst import Directive, directives
> @@ -51,6 +52,28 @@
>  __version__ = "1.0"
>  
>  
> +def dedent(text: str) -> str:
> +    # Temporary: In service of the new QAPI domain, the QAPI doc parser
> +    # now preserves indents in args/members/features text. QAPIDoc does
> +    # not handle this well, so undo that change here.
> +
> +    # QAPIDoc is being rewritten and will be replaced soon,
> +    # but this function is here in the interim as transition glue.

I'm not sure we need the comment.

> +
> +    lines = text.splitlines(True)
> +    if len(lines) > 1:
> +        if re.match(r"\s+", lines[0]):
> +            # First line is indented; description started on
> +            # the line after the name. dedent the whole block.
> +            return textwrap.dedent(text)
> +        else:

pylint gripes

    docs/sphinx/qapidoc.py:65:8: R1705: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it (no-else-return)

> +            # Descr started on same line. Dedent line 2+.
> +            return lines[0] + textwrap.dedent("".join(lines[1:]))
> +    else:
> +        # Descr was a single line; dedent entire line.
> +        return textwrap.dedent(text)
> +
> +
>  # fmt: off
>  # Function borrowed from pydash, which is under the MIT license
>  def intersperse(iterable, separator):
> @@ -169,7 +192,7 @@ def _nodes_for_members(self, doc, what, base=None, branches=None):
>              term = self._nodes_for_one_member(section.member)
>              # TODO drop fallbacks when undocumented members are outlawed
>              if section.text:
> -                defn = section.text
> +                defn = dedent(section.text)
>              else:
>                  defn = [nodes.Text('Not documented')]
>  
> @@ -207,7 +230,7 @@ def _nodes_for_enum_values(self, doc):
>                  termtext.extend(self._nodes_for_ifcond(section.member.ifcond))
>              # TODO drop fallbacks when undocumented members are outlawed
>              if section.text:
> -                defn = section.text
> +                defn = dedent(section.text)
>              else:
>                  defn = [nodes.Text('Not documented')]
>  
> @@ -242,7 +265,7 @@ def _nodes_for_features(self, doc):
>          dlnode = nodes.definition_list()
>          for section in doc.features.values():
>              dlnode += self._make_dlitem(
> -                [nodes.literal('', section.member.name)], section.text)
> +                [nodes.literal('', section.member.name)], dedent(section.text))
>              seen_item = True
>  
>          if not seen_item:
> @@ -265,9 +288,12 @@ def _nodes_for_sections(self, doc):
>                  continue
>              snode = self._make_section(section.tag)
>              if section.tag and section.tag.startswith('Example'):
> -                snode += self._nodes_for_example(section.text)
> +                snode += self._nodes_for_example(dedent(section.text))
>              else:
> -                self._parse_text_into_node(section.text, snode)
> +                self._parse_text_into_node(
> +                    dedent(section.text) if section.tag else section.text,
> +                    snode,
> +                )
>              nodelist.append(snode)
>          return nodelist
>  
> diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
> index 7b13a583ac1..8cdd5334ec6 100644
> --- a/scripts/qapi/parser.py
> +++ b/scripts/qapi/parser.py
> @@ -448,7 +448,10 @@ def get_doc_indented(self, doc: 'QAPIDoc') -> Optional[str]:
>          indent = must_match(r'\s*', line).end()
>          if not indent:
>              return line
> -        doc.append_line(line[indent:])
> +
> +        # Preserve the indent, it's needed for rST formatting.

I'm not sure we need the comment.

> +        doc.append_line(line)
> +
>          prev_line_blank = False
>          while True:
>              self.accept(False)
> @@ -465,7 +468,8 @@ def get_doc_indented(self, doc: 'QAPIDoc') -> Optional[str]:
>                      self,
>                      "unexpected de-indent (expected at least %d spaces)" %
>                      indent)
> -            doc.append_line(line[indent:])
> +            # Again, preserve the indent for ReST.

Likewise.

If you want to document the fact that .get_doc_indented() preserves
indentation, a function comment or doc string would the proper place.

> +            doc.append_line(line)
>              prev_line_blank = True
>  
>      def get_doc_paragraph(self, doc: 'QAPIDoc') -> Optional[str]:

Correctness argument:

0. This patch merely moves an indentation adjustment from parser.py to
   qapidoc.py.

   Checks out: you remove indentation adjustment in parser.py, add it in
   qapidoc.py, and that's it.

1. The new indentation adjuster in qapidoc.py behaves just like the old
   one in parser.py did.

   parser.py's .get_doc_indented() is used to parse what follows certain
   sections' first line.  It's users store the text on this first line,
   if any, with leading whitespace stripped, then call
   .get_doc_indented() to eat the section's remaining lines.  All
   non-blank lines eaten must be indented at least as much as the first
   non-blank line.  .get_doc_indented() appends the lines eaten to the
   stored text with the first non-blank line's indentation stripped from
   all the non-blank lines.

   Your patch drops this stripping of indentation from non-first lines.
   This is what must now be done in qapidoc.py.

   If the section's first line has no text, the first non-blank line's
   indentation must be stripped from all non-blank lines.

   If the section's first line has text, the next non-blank line's
   indentation must be stripped from all lines but the first.

   How can qapidoc.py detect whether the section's first line has text?
   Fortunately, such a line will be unindented, i.e. start with a
   non-blank character, and all remaining lines will be blank or
   indented.

   qapidoc.py's dedent() seems to remove indentation common to all
   non-blank lines.  Except when there are multiple lines, and the first
   one is not indented, it removes common indentation from the remaining
   lines.

   Common indentation works, because all lines are indented at least as
   much as the one whose indentation we want to remove.

   The conditional doesn't quite match the "if the section's first line"
   above.  This simpler dedent() does, and also works in my testing:

def dedent(text: str) -> str:
    lines = text.splitlines(True)
    if re.match(r"\s+", lines[0]):
        # First line is indented; description started on
        # the line after the name. dedent the whole block.
        return textwrap.dedent(text)
    # Descr started on same line. Dedent line 2+.
    return lines[0] + textwrap.dedent("".join(lines[1:]))

   What do you think?

2. The new one is applied exactly when the old one was.

   The old one is applied to the sections starting with @FOO: and the
   tagged sections (Returns:, Errors:, ...).

   The new one is applied in ._nodes_for_members(),
   ._nodes_for_enum_values(), _nodes_for_features(), and
   ._nodes_for_sections().

   It is not applied to the text of untagged sections, including the
   body section.

   Good.

> diff --git a/tests/qapi-schema/doc-good.out b/tests/qapi-schema/doc-good.out
> index 716a9a41026..435f6e6d768 100644
> --- a/tests/qapi-schema/doc-good.out
> +++ b/tests/qapi-schema/doc-good.out
> @@ -117,8 +117,8 @@ doc symbol=Base
>      body=
>  
>      arg=base1
> -description starts on a new line,
> -minimally indented
> + description starts on a new line,
> + minimally indented
>  doc symbol=Variant1
>      body=
>  A paragraph
> @@ -145,8 +145,8 @@ doc symbol=Alternate
>  
>      arg=i
>  description starts on the same line
> -remainder indented the same
> -@b is undocumented
> +    remainder indented the same
> +    @b is undocumented
>      arg=b
>  
>      feature=alt-feat
> @@ -158,11 +158,11 @@ doc symbol=cmd
>      body=
>  
>      arg=arg1
> -description starts on a new line,
> -indented
> +    description starts on a new line,
> +    indented
>      arg=arg2
>  description starts on the same line
> -remainder indented differently
> +    remainder indented differently
>      arg=arg3
>  
>      feature=cmd-feat1
> @@ -178,16 +178,16 @@ some
>      section=TODO
>  frobnicate
>      section=Notes
> -- Lorem ipsum dolor sit amet
> -- Ut enim ad minim veniam
> + - Lorem ipsum dolor sit amet
> + - Ut enim ad minim veniam
>  
> -Duis aute irure dolor
> + Duis aute irure dolor
>      section=Example
> --> in
> -<- out
> + -> in
> + <- out
>      section=Examples
> -- *verbatim*
> -- {braces}
> + - *verbatim*
> + - {braces}
>      section=Since
>  2.10
>  doc symbol=cmd-boxed
> @@ -198,9 +198,9 @@ a feature
>      feature=cmd-feat2
>  another feature
>      section=Example
> --> in
> + -> in
>  
> -<- out
> + <- out
>  doc symbol=EVT_BOXED
>      body=

The indentation change is nicely visible here.

Re: [PATCH 04/20] qapi/parser: preserve indentation in QAPIDoc sections

Posted by John Snow 1 year, 8 months ago

On Wed, May 15, 2024, 7:50 AM Markus Armbruster <armbru@redhat.com> wrote:

> John Snow <jsnow@redhat.com> writes:
>
> > Prior to this patch, a section like this:
> >
> > @name: lorem ipsum
> >    dolor sit amet
> >      consectetur adipiscing elit
> >
> > would be parsed as:
> >
> > "lorem ipsum\ndolor sit amet\n  consectetur adipiscing elit"
> >
> > We want to preserve the indentation for even the first body line so that
> > the entire block can be parsed directly as rST. This patch would now
> > parse that segment as:
> >
> > "lorem ipsum\n   dolor sit amet\n     consectetur adipiscing elit"
>
> I'm afraid it's less than clear *why* we want to parse the entire block
> directly as rST.  I have just enough insight into what you've built on
> top of this series to hazard a guess.  Bear with me while I try to
> explain it.
>

My own summary: qapidoc expects a paragraph, the new generator expects a
block.


> We first parse the doc comment with parser.py into an internal
> representation.  The structural parts become objects, and the remainder
> becomes text attributes of these objects.  Currently, parser.py
> carefully adjusts indentation for these text attributes.  Why?  I'll get
> to that.
>
> For your example, parser.py creates an ArgSection object, and sets its
> @text member to
>
>     "lorem ipsum\ndolor sit amet\n  consectetur adipiscing elit"
>
> Printing this string gives us
>
>     lorem ipsum
>     dolor sit amet
>       consectetur adipiscing elit
>
> qapidoc.py then transforms parser.py's IR into Sphinx IR.  The objects
> become (more complicated) Sphinx objects, and their text attributes get
> re-parsed as rST text into still more Sphinx objects.
>
> This re-parse rejects your example with "Unexpected indentation."
>

Specifically, it'd be an unexpected *unindent*; the indent lacking on the
first *two* lines is the problem.


> Let me use a slightly different one:
>
>     # @name: lorem ipsum
>     #    dolor sit amet
>     #    consectetur adipiscing elit
>
> Results in this @text member
>
>     lorem ipsum
>     dolor sit amet
>     consectetur adipiscing elit
>
> which is re-parsed as paragraph, i.e. exactly what we want.
>

It's what we used to want, anyway.


> > This understandably breaks qapidoc.py;
>
> Without indentation adjustment, we'd get
>
>     lorem ipsum
>        dolor sit amet
>        consectetur adipiscing elit
>
> which would be re-parsed as a definition list, I guess.  This isn't what
> we want.
>
> >                                        so a new function is added there
> > to re-dedent the text.
>
> Your patch moves the indentation adjustment to another place.  No
> functional change.
>
> You move it so you can branch off your new rendering pipeline before the
> indentation adjustment, because ...
>
> >                        Once the new generator is merged, this function
> > will not be needed any longer and can be dropped.
>
> ... yours doesn't want it.
>
> I believe it doesn't want it, because it generates rST (with a QAPI
> extension) instead of Sphinx objects.  For your example, something like
>
>     :arg name: lorem ipsum
>        dolor sit amet
>          consectetur adipiscing elit
>
> For mine:
>
>     :arg name: lorem ipsum
>        dolor sit amet
>        consectetur adipiscing elit
>
> Fair?
>

Not quite;

Old parsing, new generator:

:arg type name: lorem ipsum
dolor sit amet
  consectetur apidiscing elit

This is wrong - continuations of a field list must be indented. Unlike
paragraphs, we want indents to "keep the block".

New parsing, new generator:

:arg type name: lorem ipsum
   dolor sit amet
     consectetur apidiscing elit

indent is preserved, maintaining the block-level element.

I don't have to re-add indents and any nested block elements will be
preserved correctly. i.e. you can use code examples, nested lists, etc. in
argument definitions.

The goal here was "Do not treat this as a paragraph, treat it directly as
rST and do not modify it needlessly."

It's a lot simpler than trying to manage the indent and injecting spaces
manually - and adding a temporary dedent to scheduled-for-demolition code
seemed the nicer place to add the hack.


> The transition from the doc comment to (extended) rST is straightforward
> for these examples.
>
> I hope it'll be as straightforward (and thus predictable) in other
> cases, too.
>
> > (I verified this patch changes absolutely nothing by comparing the
> > md5sums of the QMP ref html pages both before and after the change, so
> > it's certified inert. QAPI test output has been updated to reflect the
> > new strategy of preserving indents for rST.)
> >
> > Signed-off-by: John Snow <jsnow@redhat.com>
> > ---
> >  docs/sphinx/qapidoc.py         | 36 +++++++++++++++++++++++++++++-----
> >  scripts/qapi/parser.py         |  8 ++++++--
> >  tests/qapi-schema/doc-good.out | 32 +++++++++++++++---------------
> >  3 files changed, 53 insertions(+), 23 deletions(-)
> >
> > diff --git a/docs/sphinx/qapidoc.py b/docs/sphinx/qapidoc.py
> > index 1655682d4c7..2e3ffcbafb7 100644
> > --- a/docs/sphinx/qapidoc.py
> > +++ b/docs/sphinx/qapidoc.py
> > @@ -26,6 +26,7 @@
> >
> >  import os
> >  import re
> > +import textwrap
> >
> >  from docutils import nodes
> >  from docutils.parsers.rst import Directive, directives
> > @@ -51,6 +52,28 @@
> >  __version__ = "1.0"
> >
> >
> > +def dedent(text: str) -> str:
> > +    # Temporary: In service of the new QAPI domain, the QAPI doc parser
> > +    # now preserves indents in args/members/features text. QAPIDoc does
> > +    # not handle this well, so undo that change here.
> > +
> > +    # QAPIDoc is being rewritten and will be replaced soon,
> > +    # but this function is here in the interim as transition glue.
>
> I'm not sure we need the comment.
>

OK. Guess I'd rather overcomment than undercomment... easier to delete than
add :)


> > +
> > +    lines = text.splitlines(True)
> > +    if len(lines) > 1:
> > +        if re.match(r"\s+", lines[0]):
> > +            # First line is indented; description started on
> > +            # the line after the name. dedent the whole block.
> > +            return textwrap.dedent(text)
> > +        else:
>
> pylint gripes
>
>     docs/sphinx/qapidoc.py:65:8: R1705: Unnecessary "else" after "return",
> remove the "else" and de-indent the code inside it (no-else-return)
>

Interesting. What pylint version?


> > +            # Descr started on same line. Dedent line 2+.
> > +            return lines[0] + textwrap.dedent("".join(lines[1:]))
> > +    else:
> > +        # Descr was a single line; dedent entire line.
> > +        return textwrap.dedent(text)
> > +
> > +
> >  # fmt: off
> >  # Function borrowed from pydash, which is under the MIT license
> >  def intersperse(iterable, separator):
> > @@ -169,7 +192,7 @@ def _nodes_for_members(self, doc, what, base=None,
> branches=None):
> >              term = self._nodes_for_one_member(section.member)
> >              # TODO drop fallbacks when undocumented members are outlawed
> >              if section.text:
> > -                defn = section.text
> > +                defn = dedent(section.text)
> >              else:
> >                  defn = [nodes.Text('Not documented')]
> >
> > @@ -207,7 +230,7 @@ def _nodes_for_enum_values(self, doc):
> >
> termtext.extend(self._nodes_for_ifcond(section.member.ifcond))
> >              # TODO drop fallbacks when undocumented members are outlawed
> >              if section.text:
> > -                defn = section.text
> > +                defn = dedent(section.text)
> >              else:
> >                  defn = [nodes.Text('Not documented')]
> >
> > @@ -242,7 +265,7 @@ def _nodes_for_features(self, doc):
> >          dlnode = nodes.definition_list()
> >          for section in doc.features.values():
> >              dlnode += self._make_dlitem(
> > -                [nodes.literal('', section.member.name)], section.text)
> > +                [nodes.literal('', section.member.name)],
> dedent(section.text))
> >              seen_item = True
> >
> >          if not seen_item:
> > @@ -265,9 +288,12 @@ def _nodes_for_sections(self, doc):
> >                  continue
> >              snode = self._make_section(section.tag)
> >              if section.tag and section.tag.startswith('Example'):
> > -                snode += self._nodes_for_example(section.text)
> > +                snode += self._nodes_for_example(dedent(section.text))
> >              else:
> > -                self._parse_text_into_node(section.text, snode)
> > +                self._parse_text_into_node(
> > +                    dedent(section.text) if section.tag else
> section.text,
> > +                    snode,
> > +                )
> >              nodelist.append(snode)
> >          return nodelist
> >
> > diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
> > index 7b13a583ac1..8cdd5334ec6 100644
> > --- a/scripts/qapi/parser.py
> > +++ b/scripts/qapi/parser.py
> > @@ -448,7 +448,10 @@ def get_doc_indented(self, doc: 'QAPIDoc') ->
> Optional[str]:
> >          indent = must_match(r'\s*', line).end()
> >          if not indent:
> >              return line
> > -        doc.append_line(line[indent:])
> > +
> > +        # Preserve the indent, it's needed for rST formatting.
>
> I'm not sure we need the comment.
>

OK.


> > +        doc.append_line(line)
> > +
> >          prev_line_blank = False
> >          while True:
> >              self.accept(False)
> > @@ -465,7 +468,8 @@ def get_doc_indented(self, doc: 'QAPIDoc') ->
> Optional[str]:
> >                      self,
> >                      "unexpected de-indent (expected at least %d
> spaces)" %
> >                      indent)
> > -            doc.append_line(line[indent:])
> > +            # Again, preserve the indent for ReST.
>
> Likewise.
>
> If you want to document the fact that .get_doc_indented() preserves
> indentation, a function comment or doc string would the proper place.
>

Got it.


> > +            doc.append_line(line)
> >              prev_line_blank = True
> >
> >      def get_doc_paragraph(self, doc: 'QAPIDoc') -> Optional[str]:
>
> Correctness argument:
>
> 0. This patch merely moves an indentation adjustment from parser.py to
>    qapidoc.py.
>
>    Checks out: you remove indentation adjustment in parser.py, add it in
>    qapidoc.py, and that's it.
>
> 1. The new indentation adjuster in qapidoc.py behaves just like the old
>    one in parser.py did.
>
>    parser.py's .get_doc_indented() is used to parse what follows certain
>    sections' first line.  It's users store the text on this first line,
>    if any, with leading whitespace stripped, then call
>    .get_doc_indented() to eat the section's remaining lines.  All
>    non-blank lines eaten must be indented at least as much as the first
>    non-blank line.  .get_doc_indented() appends the lines eaten to the
>    stored text with the first non-blank line's indentation stripped from
>    all the non-blank lines.
>
>    Your patch drops this stripping of indentation from non-first lines.
>    This is what must now be done in qapidoc.py.
>
>    If the section's first line has no text, the first non-blank line's
>    indentation must be stripped from all non-blank lines.
>
>    If the section's first line has text, the next non-blank line's
>    indentation must be stripped from all lines but the first.
>
>    How can qapidoc.py detect whether the section's first line has text?
>    Fortunately, such a line will be unindented, i.e. start with a
>    non-blank character, and all remaining lines will be blank or
>    indented.
>
>    qapidoc.py's dedent() seems to remove indentation common to all
>    non-blank lines.  Except when there are multiple lines, and the first
>    one is not indented, it removes common indentation from the remaining
>    lines.
>
>    Common indentation works, because all lines are indented at least as
>    much as the one whose indentation we want to remove.
>
>    The conditional doesn't quite match the "if the section's first line"
>    above.  This simpler dedent() does, and also works in my testing:
>
> def dedent(text: str) -> str:
>     lines = text.splitlines(True)
>     if re.match(r"\s+", lines[0]):
>         # First line is indented; description started on
>         # the line after the name. dedent the whole block.
>         return textwrap.dedent(text)
>     # Descr started on same line. Dedent line 2+.
>     return lines[0] + textwrap.dedent("".join(lines[1:]))
>
>    What do you think?
>

I try not to on most days.

I'll check it out, though in practice the generated documents were already
identical, so... I'll try yours and verify that's still true. If so, sure!


> 2. The new one is applied exactly when the old one was.
>
>    The old one is applied to the sections starting with @FOO: and the
>    tagged sections (Returns:, Errors:, ...).
>
>    The new one is applied in ._nodes_for_members(),
>    ._nodes_for_enum_values(), _nodes_for_features(), and
>    ._nodes_for_sections().
>
>    It is not applied to the text of untagged sections, including the
>    body section.
>
>    Good.
>
> > diff --git a/tests/qapi-schema/doc-good.out
> b/tests/qapi-schema/doc-good.out
> > index 716a9a41026..435f6e6d768 100644
> > --- a/tests/qapi-schema/doc-good.out
> > +++ b/tests/qapi-schema/doc-good.out
> > @@ -117,8 +117,8 @@ doc symbol=Base
> >      body=
> >
> >      arg=base1
> > -description starts on a new line,
> > -minimally indented
> > + description starts on a new line,
> > + minimally indented
> >  doc symbol=Variant1
> >      body=
> >  A paragraph
> > @@ -145,8 +145,8 @@ doc symbol=Alternate
> >
> >      arg=i
> >  description starts on the same line
> > -remainder indented the same
> > -@b is undocumented
> > +    remainder indented the same
> > +    @b is undocumented
> >      arg=b
> >
> >      feature=alt-feat
> > @@ -158,11 +158,11 @@ doc symbol=cmd
> >      body=
> >
> >      arg=arg1
> > -description starts on a new line,
> > -indented
> > +    description starts on a new line,
> > +    indented
> >      arg=arg2
> >  description starts on the same line
> > -remainder indented differently
> > +    remainder indented differently
> >      arg=arg3
> >
> >      feature=cmd-feat1
> > @@ -178,16 +178,16 @@ some
> >      section=TODO
> >  frobnicate
> >      section=Notes
> > -- Lorem ipsum dolor sit amet
> > -- Ut enim ad minim veniam
> > + - Lorem ipsum dolor sit amet
> > + - Ut enim ad minim veniam
> >
> > -Duis aute irure dolor
> > + Duis aute irure dolor
> >      section=Example
> > --> in
> > -<- out
> > + -> in
> > + <- out
> >      section=Examples
> > -- *verbatim*
> > -- {braces}
> > + - *verbatim*
> > + - {braces}
> >      section=Since
> >  2.10
> >  doc symbol=cmd-boxed
> > @@ -198,9 +198,9 @@ a feature
> >      feature=cmd-feat2
> >  another feature
> >      section=Example
> > --> in
> > + -> in
> >
> > -<- out
> > + <- out
> >  doc symbol=EVT_BOXED
> >      body=
>
> The indentation change is nicely visible here.
>
>

Re: [PATCH 04/20] qapi/parser: preserve indentation in QAPIDoc sections

Posted by Markus Armbruster 1 year, 8 months ago

John Snow <jsnow@redhat.com> writes:

> On Wed, May 15, 2024, 7:50 AM Markus Armbruster <armbru@redhat.com> wrote:
>
>> John Snow <jsnow@redhat.com> writes:
>>
>> > Prior to this patch, a section like this:
>> >
>> > @name: lorem ipsum
>> >    dolor sit amet
>> >      consectetur adipiscing elit
>> >
>> > would be parsed as:
>> >
>> > "lorem ipsum\ndolor sit amet\n  consectetur adipiscing elit"
>> >
>> > We want to preserve the indentation for even the first body line so that
>> > the entire block can be parsed directly as rST. This patch would now
>> > parse that segment as:
>> >
>> > "lorem ipsum\n   dolor sit amet\n     consectetur adipiscing elit"
>>
>> I'm afraid it's less than clear *why* we want to parse the entire block
>> directly as rST.  I have just enough insight into what you've built on
>> top of this series to hazard a guess.  Bear with me while I try to
>> explain it.
>>
>
> My own summary: qapidoc expects a paragraph, the new generator expects a
> block.
>
>
>> We first parse the doc comment with parser.py into an internal
>> representation.  The structural parts become objects, and the remainder
>> becomes text attributes of these objects.  Currently, parser.py
>> carefully adjusts indentation for these text attributes.  Why?  I'll get
>> to that.
>>
>> For your example, parser.py creates an ArgSection object, and sets its
>> @text member to
>>
>>     "lorem ipsum\ndolor sit amet\n  consectetur adipiscing elit"
>>
>> Printing this string gives us
>>
>>     lorem ipsum
>>     dolor sit amet
>>       consectetur adipiscing elit
>>
>> qapidoc.py then transforms parser.py's IR into Sphinx IR.  The objects
>> become (more complicated) Sphinx objects, and their text attributes get
>> re-parsed as rST text into still more Sphinx objects.
>>
>> This re-parse rejects your example with "Unexpected indentation."
>>
>
> Specifically, it'd be an unexpected *unindent*; the indent lacking on the
> first *two* lines is the problem.
>
>
>> Let me use a slightly different one:
>>
>>     # @name: lorem ipsum
>>     #    dolor sit amet
>>     #    consectetur adipiscing elit
>>
>> Results in this @text member
>>
>>     lorem ipsum
>>     dolor sit amet
>>     consectetur adipiscing elit
>>
>> which is re-parsed as paragraph, i.e. exactly what we want.
>>
>
> It's what we used to want, anyway.

Yes, I'm describing the current state here.

>> > This understandably breaks qapidoc.py;
>>
>> Without indentation adjustment, we'd get
>>
>>     lorem ipsum
>>        dolor sit amet
>>        consectetur adipiscing elit
>>
>> which would be re-parsed as a definition list, I guess.  This isn't what
>> we want.
>>
>> >                                        so a new function is added there
>> > to re-dedent the text.
>>
>> Your patch moves the indentation adjustment to another place.  No
>> functional change.
>>
>> You move it so you can branch off your new rendering pipeline before the
>> indentation adjustment, because ...
>>
>> >                        Once the new generator is merged, this function
>> > will not be needed any longer and can be dropped.
>>
>> ... yours doesn't want it.
>>
>> I believe it doesn't want it, because it generates rST (with a QAPI
>> extension) instead of Sphinx objects.  For your example, something like
>>
>>     :arg name: lorem ipsum
>>        dolor sit amet
>>          consectetur adipiscing elit
>>
>> For mine:
>>
>>     :arg name: lorem ipsum
>>        dolor sit amet
>>        consectetur adipiscing elit
>>
>> Fair?
>>
>
> Not quite;
>
> Old parsing, new generator:
>
> :arg type name: lorem ipsum
> dolor sit amet
>   consectetur apidiscing elit
>
> This is wrong - continuations of a field list must be indented. Unlike
> paragraphs, we want indents to "keep the block".
>
> New parsing, new generator:
>
> :arg type name: lorem ipsum
>    dolor sit amet
>      consectetur apidiscing elit
>
> indent is preserved, maintaining the block-level element.
>
> I don't have to re-add indents and any nested block elements will be
> preserved correctly. i.e. you can use code examples, nested lists, etc. in
> argument definitions.
>
> The goal here was "Do not treat this as a paragraph, treat it directly as
> rST and do not modify it needlessly."
>
> It's a lot simpler than trying to manage the indent and injecting spaces
> manually - and adding a temporary dedent to scheduled-for-demolition code
> seemed the nicer place to add the hack.

Understand.

A bit more rationale in the commit message would be nice.  Perhaps start
with current state ("we deintent"), then describe the patch ("move the
deindent"), then rationale "to get it out of the way of a new thingy I
wrote, and intend to post soonish", then "which will replace qapidoc.py
entirely".

>> The transition from the doc comment to (extended) rST is straightforward
>> for these examples.
>>
>> I hope it'll be as straightforward (and thus predictable) in other
>> cases, too.
>>
>> > (I verified this patch changes absolutely nothing by comparing the
>> > md5sums of the QMP ref html pages both before and after the change, so
>> > it's certified inert. QAPI test output has been updated to reflect the
>> > new strategy of preserving indents for rST.)
>> >
>> > Signed-off-by: John Snow <jsnow@redhat.com>
>> > ---
>> >  docs/sphinx/qapidoc.py         | 36 +++++++++++++++++++++++++++++-----
>> >  scripts/qapi/parser.py         |  8 ++++++--
>> >  tests/qapi-schema/doc-good.out | 32 +++++++++++++++---------------
>> >  3 files changed, 53 insertions(+), 23 deletions(-)
>> >
>> > diff --git a/docs/sphinx/qapidoc.py b/docs/sphinx/qapidoc.py
>> > index 1655682d4c7..2e3ffcbafb7 100644
>> > --- a/docs/sphinx/qapidoc.py
>> > +++ b/docs/sphinx/qapidoc.py
>> > @@ -26,6 +26,7 @@
>> >
>> >  import os
>> >  import re
>> > +import textwrap
>> >
>> >  from docutils import nodes
>> >  from docutils.parsers.rst import Directive, directives
>> > @@ -51,6 +52,28 @@
>> >  __version__ = "1.0"
>> >
>> >
>> > +def dedent(text: str) -> str:
>> > +    # Temporary: In service of the new QAPI domain, the QAPI doc parser
>> > +    # now preserves indents in args/members/features text. QAPIDoc does
>> > +    # not handle this well, so undo that change here.
>> > +
>> > +    # QAPIDoc is being rewritten and will be replaced soon,
>> > +    # but this function is here in the interim as transition glue.
>>
>> I'm not sure we need the comment.
>
> OK. Guess I'd rather overcomment than undercomment... easier to delete than
> add :)

Right :)

>> > +
>> > +    lines = text.splitlines(True)
>> > +    if len(lines) > 1:
>> > +        if re.match(r"\s+", lines[0]):
>> > +            # First line is indented; description started on
>> > +            # the line after the name. dedent the whole block.
>> > +            return textwrap.dedent(text)
>> > +        else:
>>
>> pylint gripes
>>
>>     docs/sphinx/qapidoc.py:65:8: R1705: Unnecessary "else" after "return",
>> remove the "else" and de-indent the code inside it (no-else-return)
>>
>
> Interesting. What pylint version?

Fedora 39

$ pylint --version
pylint 3.0.4
astroid 3.0.3
Python 3.12.3 (main, Apr 17 2024, 00:00:00) [GCC 13.2.1 20240316 (Red Hat 13.2.1-7)]

>> > +            # Descr started on same line. Dedent line 2+.
>> > +            return lines[0] + textwrap.dedent("".join(lines[1:]))
>> > +    else:
>> > +        # Descr was a single line; dedent entire line.
>> > +        return textwrap.dedent(text)
>> > +
>> > +
>> >  # fmt: off
>> >  # Function borrowed from pydash, which is under the MIT license
>> >  def intersperse(iterable, separator):
>> > @@ -169,7 +192,7 @@ def _nodes_for_members(self, doc, what, base=None, branches=None):
>> >              term = self._nodes_for_one_member(section.member)
>> >              # TODO drop fallbacks when undocumented members are outlawed
>> >              if section.text:
>> > -                defn = section.text
>> > +                defn = dedent(section.text)
>> >              else:
>> >                  defn = [nodes.Text('Not documented')]
>> >
>> > @@ -207,7 +230,7 @@ def _nodes_for_enum_values(self, doc):
>> >                  termtext.extend(self._nodes_for_ifcond(section.member.ifcond))
>> >              # TODO drop fallbacks when undocumented members are outlawed
>> >              if section.text:
>> > -                defn = section.text
>> > +                defn = dedent(section.text)
>> >              else:
>> >                  defn = [nodes.Text('Not documented')]
>> >
>> > @@ -242,7 +265,7 @@ def _nodes_for_features(self, doc):
>> >          dlnode = nodes.definition_list()
>> >          for section in doc.features.values():
>> >              dlnode += self._make_dlitem(
>> > -                [nodes.literal('', section.member.name)], section.text)
>> > +                [nodes.literal('', section.member.name)], dedent(section.text))
>> >              seen_item = True
>> >
>> >          if not seen_item:
>> > @@ -265,9 +288,12 @@ def _nodes_for_sections(self, doc):
>> >                  continue
>> >              snode = self._make_section(section.tag)
>> >              if section.tag and section.tag.startswith('Example'):
>> > -                snode += self._nodes_for_example(section.text)
>> > +                snode += self._nodes_for_example(dedent(section.text))
>> >              else:
>> > -                self._parse_text_into_node(section.text, snode)
>> > +                self._parse_text_into_node(
>> > +                    dedent(section.text) if section.tag else section.text,
>> > +                    snode,
>> > +                )
>> >              nodelist.append(snode)
>> >          return nodelist
>> >
>> > diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
>> > index 7b13a583ac1..8cdd5334ec6 100644
>> > --- a/scripts/qapi/parser.py
>> > +++ b/scripts/qapi/parser.py
>> > @@ -448,7 +448,10 @@ def get_doc_indented(self, doc: 'QAPIDoc') -> Optional[str]:
>> >          indent = must_match(r'\s*', line).end()
>> >          if not indent:
>> >              return line
>> > -        doc.append_line(line[indent:])
>> > +
>> > +        # Preserve the indent, it's needed for rST formatting.
>>
>> I'm not sure we need the comment.
>>
>
> OK.
>
>
>> > +        doc.append_line(line)
>> > +
>> >          prev_line_blank = False
>> >          while True:
>> >              self.accept(False)
>> > @@ -465,7 +468,8 @@ def get_doc_indented(self, doc: 'QAPIDoc') -> Optional[str]:
>> >                      self,
>> >                      "unexpected de-indent (expected at least %d spaces)" %
>> >                      indent)
>> > -            doc.append_line(line[indent:])
>> > +            # Again, preserve the indent for ReST.
>>
>> Likewise.
>>
>> If you want to document the fact that .get_doc_indented() preserves
>> indentation, a function comment or doc string would the proper place.
>>
>
> Got it.
>
>
>> > +            doc.append_line(line)
>> >              prev_line_blank = True
>> >
>> >      def get_doc_paragraph(self, doc: 'QAPIDoc') -> Optional[str]:
>>
>> Correctness argument:
>>
>> 0. This patch merely moves an indentation adjustment from parser.py to
>>    qapidoc.py.
>>
>>    Checks out: you remove indentation adjustment in parser.py, add it in
>>    qapidoc.py, and that's it.
>>
>> 1. The new indentation adjuster in qapidoc.py behaves just like the old
>>    one in parser.py did.
>>
>>    parser.py's .get_doc_indented() is used to parse what follows certain
>>    sections' first line.  It's users store the text on this first line,
>>    if any, with leading whitespace stripped, then call
>>    .get_doc_indented() to eat the section's remaining lines.  All
>>    non-blank lines eaten must be indented at least as much as the first
>>    non-blank line.  .get_doc_indented() appends the lines eaten to the
>>    stored text with the first non-blank line's indentation stripped from
>>    all the non-blank lines.
>>
>>    Your patch drops this stripping of indentation from non-first lines.
>>    This is what must now be done in qapidoc.py.
>>
>>    If the section's first line has no text, the first non-blank line's
>>    indentation must be stripped from all non-blank lines.
>>
>>    If the section's first line has text, the next non-blank line's
>>    indentation must be stripped from all lines but the first.
>>
>>    How can qapidoc.py detect whether the section's first line has text?
>>    Fortunately, such a line will be unindented, i.e. start with a
>>    non-blank character, and all remaining lines will be blank or
>>    indented.
>>
>>    qapidoc.py's dedent() seems to remove indentation common to all
>>    non-blank lines.  Except when there are multiple lines, and the first
>>    one is not indented, it removes common indentation from the remaining
>>    lines.
>>
>>    Common indentation works, because all lines are indented at least as
>>    much as the one whose indentation we want to remove.
>>
>>    The conditional doesn't quite match the "if the section's first line"
>>    above.  This simpler dedent() does, and also works in my testing:
>>
>> def dedent(text: str) -> str:
>>     lines = text.splitlines(True)
>>     if re.match(r"\s+", lines[0]):
>>         # First line is indented; description started on
>>         # the line after the name. dedent the whole block.
>>         return textwrap.dedent(text)
>>     # Descr started on same line. Dedent line 2+.
>>     return lines[0] + textwrap.dedent("".join(lines[1:]))
>>
>>    What do you think?
>>
>
> I try not to on most days.

I always say "me thinking will cost you extra" ;)

> I'll check it out, though in practice the generated documents were already
> identical, so... I'll try yours and verify that's still true. If so, sure!
>
>
>> 2. The new one is applied exactly when the old one was.
>>
>>    The old one is applied to the sections starting with @FOO: and the
>>    tagged sections (Returns:, Errors:, ...).
>>
>>    The new one is applied in ._nodes_for_members(),
>>    ._nodes_for_enum_values(), _nodes_for_features(), and
>>    ._nodes_for_sections().
>>
>>    It is not applied to the text of untagged sections, including the
>>    body section.
>>
>>    Good.

Thanks!

[...]

Re: [PATCH 04/20] qapi/parser: preserve indentation in QAPIDoc sections

Posted by John Snow 1 year, 8 months ago

On Wed, May 15, 2024 at 10:18 AM Markus Armbruster <armbru@redhat.com>
wrote:

> John Snow <jsnow@redhat.com> writes:
>
> > On Wed, May 15, 2024, 7:50 AM Markus Armbruster <armbru@redhat.com>
> wrote:
> >
> >> John Snow <jsnow@redhat.com> writes:
> >>
> >> > Prior to this patch, a section like this:
> >> >
> >> > @name: lorem ipsum
> >> >    dolor sit amet
> >> >      consectetur adipiscing elit
> >> >
> >> > would be parsed as:
> >> >
> >> > "lorem ipsum\ndolor sit amet\n  consectetur adipiscing elit"
> >> >
> >> > We want to preserve the indentation for even the first body line so
> that
> >> > the entire block can be parsed directly as rST. This patch would now
> >> > parse that segment as:
> >> >
> >> > "lorem ipsum\n   dolor sit amet\n     consectetur adipiscing elit"
> >>
> >> I'm afraid it's less than clear *why* we want to parse the entire block
> >> directly as rST.  I have just enough insight into what you've built on
> >> top of this series to hazard a guess.  Bear with me while I try to
> >> explain it.
> >>
> >
> > My own summary: qapidoc expects a paragraph, the new generator expects a
> > block.
> >
> >
> >> We first parse the doc comment with parser.py into an internal
> >> representation.  The structural parts become objects, and the remainder
> >> becomes text attributes of these objects.  Currently, parser.py
> >> carefully adjusts indentation for these text attributes.  Why?  I'll get
> >> to that.
> >>
> >> For your example, parser.py creates an ArgSection object, and sets its
> >> @text member to
> >>
> >>     "lorem ipsum\ndolor sit amet\n  consectetur adipiscing elit"
> >>
> >> Printing this string gives us
> >>
> >>     lorem ipsum
> >>     dolor sit amet
> >>       consectetur adipiscing elit
> >>
> >> qapidoc.py then transforms parser.py's IR into Sphinx IR.  The objects
> >> become (more complicated) Sphinx objects, and their text attributes get
> >> re-parsed as rST text into still more Sphinx objects.
> >>
> >> This re-parse rejects your example with "Unexpected indentation."
> >>
> >
> > Specifically, it'd be an unexpected *unindent*; the indent lacking on the
> > first *two* lines is the problem.
> >
> >
> >> Let me use a slightly different one:
> >>
> >>     # @name: lorem ipsum
> >>     #    dolor sit amet
> >>     #    consectetur adipiscing elit
> >>
> >> Results in this @text member
> >>
> >>     lorem ipsum
> >>     dolor sit amet
> >>     consectetur adipiscing elit
> >>
> >> which is re-parsed as paragraph, i.e. exactly what we want.
> >>
> >
> > It's what we used to want, anyway.
>
> Yes, I'm describing the current state here.
>
> >> > This understandably breaks qapidoc.py;
> >>
> >> Without indentation adjustment, we'd get
> >>
> >>     lorem ipsum
> >>        dolor sit amet
> >>        consectetur adipiscing elit
> >>
> >> which would be re-parsed as a definition list, I guess.  This isn't what
> >> we want.
> >>
> >> >                                        so a new function is added
> there
> >> > to re-dedent the text.
> >>
> >> Your patch moves the indentation adjustment to another place.  No
> >> functional change.
> >>
> >> You move it so you can branch off your new rendering pipeline before the
> >> indentation adjustment, because ...
> >>
> >> >                        Once the new generator is merged, this function
> >> > will not be needed any longer and can be dropped.
> >>
> >> ... yours doesn't want it.
> >>
> >> I believe it doesn't want it, because it generates rST (with a QAPI
> >> extension) instead of Sphinx objects.  For your example, something like
> >>
> >>     :arg name: lorem ipsum
> >>        dolor sit amet
> >>          consectetur adipiscing elit
> >>
> >> For mine:
> >>
> >>     :arg name: lorem ipsum
> >>        dolor sit amet
> >>        consectetur adipiscing elit
> >>
> >> Fair?
> >>
> >
> > Not quite;
> >
> > Old parsing, new generator:
> >
> > :arg type name: lorem ipsum
> > dolor sit amet
> >   consectetur apidiscing elit
> >
> > This is wrong - continuations of a field list must be indented. Unlike
> > paragraphs, we want indents to "keep the block".
> >
> > New parsing, new generator:
> >
> > :arg type name: lorem ipsum
> >    dolor sit amet
> >      consectetur apidiscing elit
> >
> > indent is preserved, maintaining the block-level element.
> >
> > I don't have to re-add indents and any nested block elements will be
> > preserved correctly. i.e. you can use code examples, nested lists, etc.
> in
> > argument definitions.
> >
> > The goal here was "Do not treat this as a paragraph, treat it directly as
> > rST and do not modify it needlessly."
> >
> > It's a lot simpler than trying to manage the indent and injecting spaces
> > manually - and adding a temporary dedent to scheduled-for-demolition code
> > seemed the nicer place to add the hack.
>
> Understand.
>
> A bit more rationale in the commit message would be nice.  Perhaps start
> with current state ("we deintent"), then describe the patch ("move the
> deindent"), then rationale "to get it out of the way of a new thingy I
> wrote, and intend to post soonish", then "which will replace qapidoc.py
> entirely".
>

Updated with more info than you require. I hear that American third graders
get a free trip to Pizza Hut if they read at least 20 of my commit messages
in a school year.


>
> >> The transition from the doc comment to (extended) rST is straightforward
> >> for these examples.
> >>
> >> I hope it'll be as straightforward (and thus predictable) in other
> >> cases, too.
> >>
> >> > (I verified this patch changes absolutely nothing by comparing the
> >> > md5sums of the QMP ref html pages both before and after the change, so
> >> > it's certified inert. QAPI test output has been updated to reflect the
> >> > new strategy of preserving indents for rST.)
> >> >
> >> > Signed-off-by: John Snow <jsnow@redhat.com>
> >> > ---
> >> >  docs/sphinx/qapidoc.py         | 36
> +++++++++++++++++++++++++++++-----
> >> >  scripts/qapi/parser.py         |  8 ++++++--
> >> >  tests/qapi-schema/doc-good.out | 32 +++++++++++++++---------------
> >> >  3 files changed, 53 insertions(+), 23 deletions(-)
> >> >
> >> > diff --git a/docs/sphinx/qapidoc.py b/docs/sphinx/qapidoc.py
> >> > index 1655682d4c7..2e3ffcbafb7 100644
> >> > --- a/docs/sphinx/qapidoc.py
> >> > +++ b/docs/sphinx/qapidoc.py
> >> > @@ -26,6 +26,7 @@
> >> >
> >> >  import os
> >> >  import re
> >> > +import textwrap
> >> >
> >> >  from docutils import nodes
> >> >  from docutils.parsers.rst import Directive, directives
> >> > @@ -51,6 +52,28 @@
> >> >  __version__ = "1.0"
> >> >
> >> >
> >> > +def dedent(text: str) -> str:
> >> > +    # Temporary: In service of the new QAPI domain, the QAPI doc
> parser
> >> > +    # now preserves indents in args/members/features text. QAPIDoc
> does
> >> > +    # not handle this well, so undo that change here.
> >> > +
> >> > +    # QAPIDoc is being rewritten and will be replaced soon,
> >> > +    # but this function is here in the interim as transition glue.
> >>
> >> I'm not sure we need the comment.
> >
> > OK. Guess I'd rather overcomment than undercomment... easier to delete
> than
> > add :)
>
> Right :)
>
> >> > +
> >> > +    lines = text.splitlines(True)
> >> > +    if len(lines) > 1:
> >> > +        if re.match(r"\s+", lines[0]):
> >> > +            # First line is indented; description started on
> >> > +            # the line after the name. dedent the whole block.
> >> > +            return textwrap.dedent(text)
> >> > +        else:
> >>
> >> pylint gripes
> >>
> >>     docs/sphinx/qapidoc.py:65:8: R1705: Unnecessary "else" after
> "return",
> >> remove the "else" and de-indent the code inside it (no-else-return)
> >>
> >
> > Interesting. What pylint version?
>
> Fedora 39
>
> $ pylint --version
> pylint 3.0.4
> astroid 3.0.3
> Python 3.12.3 (main, Apr 17 2024, 00:00:00) [GCC 13.2.1 20240316 (Red Hat
> 13.2.1-7)]
>

Realized from the prior patch I was not actually running pylint on this
file, for good reason. Nonetheless, I did fix this particular gripe by
using your rewrite.

My current best attempt at reducing the noise:

*from inside the docs/sphinx directory*:

> PYTHONPATH=/home/jsnow/src/qemu/scripts/ pylint --rc-file
../../scripts/qapi/pylintrc qapidoc.py
************* Module qapidoc
qapidoc.py:45:0: C0103: Constant name "Use_SSI" doesn't conform to
UPPER_CASE naming style (invalid-name)
qapidoc.py:49:4: E0611: No name 'AutodocReporter' in module
'sphinx.ext.autodoc' (no-name-in-module)
qapidoc.py:77:10: R1708: Do not raise StopIteration in generator, use
return statement instead (stop-iteration-return)
qapidoc.py:567:11: R1735: Consider using '{"version": __version__,
"parallel_read_safe": True, "parallel_write_safe": True, ... }' instead of
a call to 'dict'. (use-dict-literal)

 Hm, actually, maybe this is tractable...


> >> > +            # Descr started on same line. Dedent line 2+.
> >> > +            return lines[0] + textwrap.dedent("".join(lines[1:]))
> >> > +    else:
> >> > +        # Descr was a single line; dedent entire line.
> >> > +        return textwrap.dedent(text)
> >> > +
> >> > +
> >> >  # fmt: off
> >> >  # Function borrowed from pydash, which is under the MIT license
> >> >  def intersperse(iterable, separator):
> >> > @@ -169,7 +192,7 @@ def _nodes_for_members(self, doc, what,
> base=None, branches=None):
> >> >              term = self._nodes_for_one_member(section.member)
> >> >              # TODO drop fallbacks when undocumented members are
> outlawed
> >> >              if section.text:
> >> > -                defn = section.text
> >> > +                defn = dedent(section.text)
> >> >              else:
> >> >                  defn = [nodes.Text('Not documented')]
> >> >
> >> > @@ -207,7 +230,7 @@ def _nodes_for_enum_values(self, doc):
> >> >
> termtext.extend(self._nodes_for_ifcond(section.member.ifcond))
> >> >              # TODO drop fallbacks when undocumented members are
> outlawed
> >> >              if section.text:
> >> > -                defn = section.text
> >> > +                defn = dedent(section.text)
> >> >              else:
> >> >                  defn = [nodes.Text('Not documented')]
> >> >
> >> > @@ -242,7 +265,7 @@ def _nodes_for_features(self, doc):
> >> >          dlnode = nodes.definition_list()
> >> >          for section in doc.features.values():
> >> >              dlnode += self._make_dlitem(
> >> > -                [nodes.literal('', section.member.name)],
> section.text)
> >> > +                [nodes.literal('', section.member.name)],
> dedent(section.text))
> >> >              seen_item = True
> >> >
> >> >          if not seen_item:
> >> > @@ -265,9 +288,12 @@ def _nodes_for_sections(self, doc):
> >> >                  continue
> >> >              snode = self._make_section(section.tag)
> >> >              if section.tag and section.tag.startswith('Example'):
> >> > -                snode += self._nodes_for_example(section.text)
> >> > +                snode +=
> self._nodes_for_example(dedent(section.text))
> >> >              else:
> >> > -                self._parse_text_into_node(section.text, snode)
> >> > +                self._parse_text_into_node(
> >> > +                    dedent(section.text) if section.tag else
> section.text,
> >> > +                    snode,
> >> > +                )
> >> >              nodelist.append(snode)
> >> >          return nodelist
> >> >
> >> > diff --git a/scripts/qapi/parser.py b/scripts/qapi/parser.py
> >> > index 7b13a583ac1..8cdd5334ec6 100644
> >> > --- a/scripts/qapi/parser.py
> >> > +++ b/scripts/qapi/parser.py
> >> > @@ -448,7 +448,10 @@ def get_doc_indented(self, doc: 'QAPIDoc') ->
> Optional[str]:
> >> >          indent = must_match(r'\s*', line).end()
> >> >          if not indent:
> >> >              return line
> >> > -        doc.append_line(line[indent:])
> >> > +
> >> > +        # Preserve the indent, it's needed for rST formatting.
> >>
> >> I'm not sure we need the comment.
> >>
> >
> > OK.
> >
> >
> >> > +        doc.append_line(line)
> >> > +
> >> >          prev_line_blank = False
> >> >          while True:
> >> >              self.accept(False)
> >> > @@ -465,7 +468,8 @@ def get_doc_indented(self, doc: 'QAPIDoc') ->
> Optional[str]:
> >> >                      self,
> >> >                      "unexpected de-indent (expected at least %d
> spaces)" %
> >> >                      indent)
> >> > -            doc.append_line(line[indent:])
> >> > +            # Again, preserve the indent for ReST.
> >>
> >> Likewise.
> >>
> >> If you want to document the fact that .get_doc_indented() preserves
> >> indentation, a function comment or doc string would the proper place.
> >>
> >
> > Got it.
> >
> >
> >> > +            doc.append_line(line)
> >> >              prev_line_blank = True
> >> >
> >> >      def get_doc_paragraph(self, doc: 'QAPIDoc') -> Optional[str]:
> >>
> >> Correctness argument:
> >>
> >> 0. This patch merely moves an indentation adjustment from parser.py to
> >>    qapidoc.py.
> >>
> >>    Checks out: you remove indentation adjustment in parser.py, add it in
> >>    qapidoc.py, and that's it.
> >>
> >> 1. The new indentation adjuster in qapidoc.py behaves just like the old
> >>    one in parser.py did.
> >>
> >>    parser.py's .get_doc_indented() is used to parse what follows certain
> >>    sections' first line.  It's users store the text on this first line,
> >>    if any, with leading whitespace stripped, then call
> >>    .get_doc_indented() to eat the section's remaining lines.  All
> >>    non-blank lines eaten must be indented at least as much as the first
> >>    non-blank line.  .get_doc_indented() appends the lines eaten to the
> >>    stored text with the first non-blank line's indentation stripped from
> >>    all the non-blank lines.
> >>
> >>    Your patch drops this stripping of indentation from non-first lines.
> >>    This is what must now be done in qapidoc.py.
> >>
> >>    If the section's first line has no text, the first non-blank line's
> >>    indentation must be stripped from all non-blank lines.
> >>
> >>    If the section's first line has text, the next non-blank line's
> >>    indentation must be stripped from all lines but the first.
> >>
> >>    How can qapidoc.py detect whether the section's first line has text?
> >>    Fortunately, such a line will be unindented, i.e. start with a
> >>    non-blank character, and all remaining lines will be blank or
> >>    indented.
> >>
> >>    qapidoc.py's dedent() seems to remove indentation common to all
> >>    non-blank lines.  Except when there are multiple lines, and the first
> >>    one is not indented, it removes common indentation from the remaining
> >>    lines.
> >>
> >>    Common indentation works, because all lines are indented at least as
> >>    much as the one whose indentation we want to remove.
> >>
> >>    The conditional doesn't quite match the "if the section's first line"
> >>    above.  This simpler dedent() does, and also works in my testing:
> >>
> >> def dedent(text: str) -> str:
> >>     lines = text.splitlines(True)
> >>     if re.match(r"\s+", lines[0]):
> >>         # First line is indented; description started on
> >>         # the line after the name. dedent the whole block.
> >>         return textwrap.dedent(text)
> >>     # Descr started on same line. Dedent line 2+.
> >>     return lines[0] + textwrap.dedent("".join(lines[1:]))
> >>
> >>    What do you think?
> >>
> >
> > I try not to on most days.
>
> I always say "me thinking will cost you extra" ;)
>
> > I'll check it out, though in practice the generated documents were
> already
> > identical, so... I'll try yours and verify that's still true. If so,
> sure!
>

Yup, it checks out and the md5sums are still the same. Good enough for me.


> >
> >
> >> 2. The new one is applied exactly when the old one was.
> >>
> >>    The old one is applied to the sections starting with @FOO: and the
> >>    tagged sections (Returns:, Errors:, ...).
> >>
> >>    The new one is applied in ._nodes_for_members(),
> >>    ._nodes_for_enum_values(), _nodes_for_features(), and
> >>    ._nodes_for_sections().
> >>
> >>    It is not applied to the text of untagged sections, including the
> >>    body section.
> >>
> >>    Good.
>
> Thanks!
>
> [...]
>
>