Eric Blake <eblake@redhat.com> writes:
> On 08/27/2018 02:00 AM, Markus Armbruster wrote:
>> The lexer fails to end a valid token when the lookahead character is
>> beyond '\x7F'. For instance, input
>>
>> true\xC2\xA2
>>
>> produces the tokens
>>
>> JSON_ERROR true\xC2
>> JSON_ERROR \xA2
>>
>> The first token should be
>>
>> JSON_KEYWORD true
>>
>> instead.
>
> As long as we still get a JSON_ERROR in the end.

We do: one for \xC2, and one for \xA2. PATCH 4 will lose the second one.
>> The culprit is
>>
>> #define TERMINAL(state) [0 ... 0x7F] = (state)
>>
>> It leaves [0x80..0xFF] zero, i.e. IN_ERROR. Has always been broken.
>
> I wonder if that was done because it was assuming that valid input is
> only ASCII, and that any byte larger than 0x7f is invalid except
> within the context of a string.

Plausible thinko.
> But whatever the reason for the
> original bug, your fix makes sense.
>
>> Fix it to initialize the complete array.
>
> Worth testsuite coverage?

Since lookahead bytes > 0x7F are always a parse error, all the bug can
do is swallow a TERMINAL() token right before a parse error. The
TERMINAL() tokens are JSON_INTEGER, JSON_FLOAT, JSON_KEYWORD, JSON_SKIP,
JSON_INTERP. Fairly harmless. In particular, JSON objects get through
even when followed by a byte > 0x7F.

Of course, test coverage wouldn't hurt regardless.

>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>> qobject/json-lexer.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>
> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!