[v1] json: More fixes, error reporting improvements, cleanups

[Qemu-devel] [PATCH 5/6] json: Eliminate lexer state IN_ERROR

Posted by Markus Armbruster 7 years, 5 months ago

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-lexer.c      | 9 +++++----
 qobject/json-parser-int.h | 8 ++++----
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 39c7ce7adc..2a5561c917 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -100,8 +100,7 @@
  */
 
 enum json_lexer_state {
-    IN_ERROR = 0,               /* must really be 0, see json_lexer[] */
-    IN_RECOVERY,
+    IN_RECOVERY = 1,
     IN_DQ_STRING_ESCAPE,
     IN_DQ_STRING,
     IN_SQ_STRING_ESCAPE,
@@ -121,6 +120,8 @@ enum json_lexer_state {
     IN_START_INTERP,            /* must be IN_START + 1 */
 };
 
+QEMU_BUILD_BUG_ON(JSON_ERROR != 0);
+QEMU_BUILD_BUG_ON(IN_RECOVERY != JSON_ERROR + 1);
 QEMU_BUILD_BUG_ON((int)JSON_MIN <= (int)IN_START_INTERP);
 QEMU_BUILD_BUG_ON(JSON_MAX >= 0x80);
 QEMU_BUILD_BUG_ON(IN_START_INTERP != IN_START + 1);
@@ -176,7 +177,7 @@ static const uint8_t json_lexer[][256] =  {
     /* Zero */
     [IN_ZERO] = {
         TERMINAL(JSON_INTEGER),
-        ['0' ... '9'] = IN_ERROR,
+        ['0' ... '9'] = JSON_ERROR,
         ['.'] = IN_MANTISSA,
     },
 
@@ -328,7 +329,7 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
         case IN_START:
             new_state = lexer->start_state;
             break;
-        case IN_ERROR:
+        case JSON_ERROR:
             json_message_process_token(lexer, lexer->token, JSON_ERROR,
                                        lexer->x, lexer->y);
             new_state = IN_RECOVERY;
diff --git a/qobject/json-parser-int.h b/qobject/json-parser-int.h
index abeec63af5..57cb8e79d3 100644
--- a/qobject/json-parser-int.h
+++ b/qobject/json-parser-int.h
@@ -16,10 +16,11 @@
 
 #include "qapi/qmp/json-parser.h"
 
-
 typedef enum json_token_type {
-    JSON_MIN = 100,
-    JSON_LCURLY = JSON_MIN,
+    JSON_ERROR = 0,             /* must be zero, see json_lexer[] */
+    /* Gap for lexer states */
+    JSON_LCURLY = 100,
+    JSON_MIN = JSON_LCURLY,
     JSON_RCURLY,
     JSON_LSQUARE,
     JSON_RSQUARE,
@@ -31,7 +32,6 @@ typedef enum json_token_type {
     JSON_STRING,
     JSON_INTERP,
     JSON_SKIP,
-    JSON_ERROR,
     JSON_END_OF_INPUT,
     JSON_MAX = JSON_END_OF_INPUT
 } JSONTokenType;
-- 
2.17.1

Re: [Qemu-devel] [PATCH 5/6] json: Eliminate lexer state IN_ERROR

Posted by Eric Blake 7 years, 5 months ago

On 08/27/2018 02:00 AM, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-lexer.c      | 9 +++++----
>   qobject/json-parser-int.h | 8 ++++----
>   2 files changed, 9 insertions(+), 8 deletions(-)
> 
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH 5/6] json: Eliminate lexer state IN_ERROR

Posted by Eric Blake 7 years, 5 months ago

On 08/27/2018 02:00 AM, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   qobject/json-lexer.c      | 9 +++++----
>   qobject/json-parser-int.h | 8 ++++----
>   2 files changed, 9 insertions(+), 8 deletions(-)

> -
>   typedef enum json_token_type {
> -    JSON_MIN = 100,
> -    JSON_LCURLY = JSON_MIN,
> +    JSON_ERROR = 0,             /* must be zero, see json_lexer[] */
> +    /* Gap for lexer states */
> +    JSON_LCURLY = 100,
> +    JSON_MIN = JSON_LCURLY,

In an earlier version of this type of cleanup, you swapped the IN_ and 
JSON_ values and eliminated the gap, to make the overall table more 
compact (no storage wasted on any of the states in the gap between the two).

https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg01178.html

Is it still worth trying to minimize the gap between the two sequences, 
even if you now no longer swap them in order?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH 5/6] json: Eliminate lexer state IN_ERROR

Posted by Markus Armbruster 7 years, 5 months ago

Eric Blake <eblake@redhat.com> writes:

> On 08/27/2018 02:00 AM, Markus Armbruster wrote:
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   qobject/json-lexer.c      | 9 +++++----
>>   qobject/json-parser-int.h | 8 ++++----
>>   2 files changed, 9 insertions(+), 8 deletions(-)
>
>> -
>>   typedef enum json_token_type {
>> -    JSON_MIN = 100,
>> -    JSON_LCURLY = JSON_MIN,
>> +    JSON_ERROR = 0,             /* must be zero, see json_lexer[] */
>> +    /* Gap for lexer states */
>> +    JSON_LCURLY = 100,
>> +    JSON_MIN = JSON_LCURLY,
>
> In an earlier version of this type of cleanup, you swapped the IN_ and
> JSON_ values and eliminated the gap, to make the overall table more
> compact (no storage wasted on any of the states in the gap between the
> two).
>
> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg01178.html
>
> Is it still worth trying to minimize the gap between the two
> sequences, even if you now no longer swap them in order?

You caught me :)

Eliminating the gap actually enlarges the table.  I first got confused,
then measured the size change backwards to confirm my confused ideas.
When I looked at the patch again, I realized my mistake, and silently
dropped this part of the change.

Re: [Qemu-devel] [PATCH 5/6] json: Eliminate lexer state IN_ERROR

Posted by Eric Blake 7 years, 5 months ago

On 08/27/2018 11:40 PM, Markus Armbruster wrote:

>>>    typedef enum json_token_type {
>>> -    JSON_MIN = 100,
>>> -    JSON_LCURLY = JSON_MIN,
>>> +    JSON_ERROR = 0,             /* must be zero, see json_lexer[] */
>>> +    /* Gap for lexer states */
>>> +    JSON_LCURLY = 100,
>>> +    JSON_MIN = JSON_LCURLY,
>>
>> In an earlier version of this type of cleanup, you swapped the IN_ and
>> JSON_ values and eliminated the gap, to make the overall table more
>> compact (no storage wasted on any of the states in the gap between the
>> two).
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg01178.html
>>
>> Is it still worth trying to minimize the gap between the two
>> sequences, even if you now no longer swap them in order?
> 
> You caught me :)
> 
> Eliminating the gap actually enlarges the table.

Rather, switching the order enlarges the table.

>  I first got confused,
> then measured the size change backwards to confirm my confused ideas.
> When I looked at the patch again, I realized my mistake, and silently
> dropped this part of the change.

The size of the table is determined by the fact that we must initialize 
entry 0 (whether we spell it IN_ERROR or JSON_ERROR), then pay attention 
to the largest value assigned.  Re-reading json_lexer[], you are only 
initializing IN_* states, and not JSON_* states; swapping JSON_* to come 
first enlarged the table because you now have a bunch of additional rows 
in the table that are all 0-initialized to JSON_ERROR transitions.

So at the end of the day, leaving IN_* to be first, and putting JSON_* 
second, makes sense.

The question remains, then, if a fixed-size gap (by making JSON_MIN be 
exactly 100) is any smarter than a contiguous layout (by making JSON_MIN 
be IN_START_INTERP + 1).  I can't see any strong reason for preferring 
one form over the other, so keeping the gap doesn't hurt.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH 5/6] json: Eliminate lexer state IN_ERROR

Posted by Eric Blake 7 years, 5 months ago

On 08/28/2018 10:01 AM, Eric Blake wrote:
> The question remains, then, if a fixed-size gap (by making JSON_MIN be 
> exactly 100) is any smarter than a contiguous layout (by making JSON_MIN 
> be IN_START_INTERP + 1).  I can't see any strong reason for preferring 
> one form over the other, so keeping the gap doesn't hurt.

And having said that, this patch series also introduced a second gap for 
LOOKAHEAD defined at 0x80, along with assertions that it doesn't collide 
with either IN_* or JSON_*. It may be just as easy to make JSON_MIN 
start at LOOKAHEAD+1 than to assert that all of JSON_* fits in the space 
between 100 and 0x80 - 1.  Although in the long run, I seriously doubt 
we'll be adding many new enum values to either the lexer or the parser.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH 5/6] json: Eliminate lexer state IN_ERROR

Posted by Markus Armbruster 7 years, 5 months ago

Eric Blake <eblake@redhat.com> writes:

> On 08/28/2018 10:01 AM, Eric Blake wrote:
>> The question remains, then, if a fixed-size gap (by making JSON_MIN
>> be exactly 100) is any smarter than a contiguous layout (by making
>> JSON_MIN be IN_START_INTERP + 1).  I can't see any strong reason for
>> preferring one form over the other, so keeping the gap doesn't hurt.
>
> And having said that, this patch series also introduced a second gap
> for LOOKAHEAD defined at 0x80, along with assertions that it doesn't
> collide with either IN_* or JSON_*. It may be just as easy to make
> JSON_MIN start at LOOKAHEAD+1 than to assert that all of JSON_* fits
> in the space between 100 and 0x80 - 1.  Although in the long run, I
> seriously doubt we'll be adding many new enum values to either the
> lexer or the parser.

LOOKAHEAD is semantically independent from the lexer state.  Evidence:
we use both IN_START and IN_START | LOOKAHEAD in json_lexer[].

Re: [Qemu-devel] [PATCH 5/6] json: Eliminate lexer state IN_ERROR

Posted by Markus Armbruster 7 years, 5 months ago

Eric Blake <eblake@redhat.com> writes:

> On 08/27/2018 11:40 PM, Markus Armbruster wrote:
>
>>>>    typedef enum json_token_type {
>>>> -    JSON_MIN = 100,
>>>> -    JSON_LCURLY = JSON_MIN,
>>>> +    JSON_ERROR = 0,             /* must be zero, see json_lexer[] */
>>>> +    /* Gap for lexer states */
>>>> +    JSON_LCURLY = 100,
>>>> +    JSON_MIN = JSON_LCURLY,
>>>
>>> In an earlier version of this type of cleanup, you swapped the IN_ and
>>> JSON_ values and eliminated the gap, to make the overall table more
>>> compact (no storage wasted on any of the states in the gap between the
>>> two).
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg01178.html
>>>
>>> Is it still worth trying to minimize the gap between the two
>>> sequences, even if you now no longer swap them in order?
>>
>> You caught me :)
>>
>> Eliminating the gap actually enlarges the table.
>
> Rather, switching the order enlarges the table.
>
>>  I first got confused,
>> then measured the size change backwards to confirm my confused ideas.
>> When I looked at the patch again, I realized my mistake, and silently
>> dropped this part of the change.
>
> The size of the table is determined by the fact that we must
> initialize entry 0 (whether we spell it IN_ERROR or JSON_ERROR), then
> pay attention to the largest value assigned.  Re-reading json_lexer[],
> you are only initializing IN_* states, and not JSON_* states;

Correct.

The JSON_* states other than JSON_ERROR all go to the start state
regardless of lookahead and without consuming it.  We implement that
state transition in code instead of putting it into the table:

        case JSON_STRING:
            json_message_process_token(lexer, lexer->token, new_state,
                                       lexer->x, lexer->y);
            /* fall through */
        case JSON_SKIP:
            g_string_truncate(lexer->token, 0);
            /* fall through */
        case IN_START:
-->         new_state = lexer->start_state;
            break;

JSON_ERROR goes to IN_RECOVERY instead:

        case JSON_ERROR:
            json_message_process_token(lexer, lexer->token, JSON_ERROR,
                                       lexer->x, lexer->y);
--->        new_state = IN_RECOVERY;
            /* fall through */
        case IN_RECOVERY:

>                                                               swapping
> JSON_* to come first enlarged the table because you now have a bunch
> of additional rows in the table that are all 0-initialized to
> JSON_ERROR transitions.

Yes.  These rows are never used.

> So at the end of the day, leaving IN_* to be first, and putting JSON_*
> second, makes sense.
>
> The question remains, then, if a fixed-size gap (by making JSON_MIN be
> exactly 100) is any smarter than a contiguous layout (by making
> JSON_MIN be IN_START_INTERP + 1).  I can't see any strong reason for
> preferring one form over the other, so keeping the gap doesn't hurt.

The gap lets us hide the IN_* in json-lexer.c.  Not sure it's worth the
trouble.  We could move the IN_* to json-parser-int.h for simplicity.
Not sure that's worth the trouble, either :)