This series rewrites json-parser as a push parser, i.e. a state machine.
While push parsers are inherently more complex than recursive descent,
the grammar for JSON is simple enough that the parser remains readable.
There is therefore no need to use e.g. QEMU coroutines.
Unlike the suggestion in commit 62815d85aed ("json: Redesign the callback
to consume JSON values", 2018-08-24), I kept the json-streamer concept.
It helps in handling input limits, it performs error recovery, and it
converts the token-at-a-time push interface to callbacks. All of these
are more easily done in a separate layer, which keeps the parser clean.
However, there is no need anymore for it to store partial JSON objects
in tokenized form, because the current state is stored in the push
parser's stack.
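
To illustrate the point about the stack, here is a minimal sketch of a
push parser for a toy grammar (nested lists of integers). It is not the
code in qobject/json-parser.c; every name in it (ToyParser,
toy_parser_feed(), the TOK_* and ST_* constants) is hypothetical. The
caller pushes one token at a time, and everything the parser needs to
remember between calls lives in its explicit stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for a toy grammar: nested lists of integers. */
typedef enum { TOK_LSQUARE, TOK_RSQUARE, TOK_COMMA, TOK_INT } ToyTokenType;

typedef enum { TOY_MORE, TOY_DONE, TOY_ERROR } ToyResult;

/* What the innermost open list expects next. */
typedef enum {
    ST_FIRST,   /* after '[': a value or ']' */
    ST_VALUE,   /* after ',': a value */
    ST_SEP,     /* after a value: ',' or ']' */
} ToyState;

#define TOY_MAX_DEPTH 16

typedef struct {
    ToyState stack[TOY_MAX_DEPTH];  /* one entry per open list */
    size_t depth;                   /* number of open lists */
} ToyParser;

static void toy_parser_init(ToyParser *p)
{
    p->depth = 0;
}

/* A complete value was just parsed: finish, or update the enclosing list. */
static ToyResult toy_value_done(ToyParser *p)
{
    if (p->depth == 0) {
        return TOY_DONE;
    }
    p->stack[p->depth - 1] = ST_SEP;
    return TOY_MORE;
}

/* Push one token into the parser; no recursion, no stored token queue. */
static ToyResult toy_parser_feed(ToyParser *p, ToyTokenType tok)
{
    ToyState *top = p->depth ? &p->stack[p->depth - 1] : NULL;

    if (!top || *top == ST_FIRST || *top == ST_VALUE) {
        /* A value may start here. */
        switch (tok) {
        case TOK_INT:
            return toy_value_done(p);
        case TOK_LSQUARE:
            if (p->depth == TOY_MAX_DEPTH) {
                return TOY_ERROR;       /* nesting limit reached */
            }
            p->stack[p->depth++] = ST_FIRST;
            return TOY_MORE;
        case TOK_RSQUARE:
            if (top && *top == ST_FIRST) {
                p->depth--;             /* empty list "[]" */
                return toy_value_done(p);
            }
            return TOY_ERROR;
        default:
            return TOY_ERROR;
        }
    }

    /* ST_SEP: only ',' or ']' may follow a value inside a list. */
    switch (tok) {
    case TOK_COMMA:
        *top = ST_VALUE;
        return TOY_MORE;
    case TOK_RSQUARE:
        p->depth--;
        return toy_value_done(p);
    default:
        return TOY_ERROR;
    }
}
```

In this sketch, feeding `[` and then `,` returns TOY_ERROR on the second
token, before any closing bracket has been seen.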
Another benefit is that QEMU can report the first parsing error
immediately, without waiting for braces and brackets to be balanced or
for a lexing error. Error recovery then proceeds as before (i.e., the
next parse still starts after balanced braces and brackets or a lexing
error).
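
The recovery rule described above can be sketched as follows. This is
only an illustration under assumed names (SketchStreamer,
sketch_streamer_token(), sketch_streamer_parse_failed() are all
hypothetical), not the code in qobject/json-streamer.c. The idea is that
the error is reported as soon as the parser fails, and the remaining
tokens of the broken value are dropped until nesting is balanced again
or the lexer reports an error.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical token kinds; only nesting and lexer errors matter here. */
typedef enum {
    SK_LCURLY, SK_RCURLY, SK_LSQUARE, SK_RSQUARE, SK_OTHER, SK_LEX_ERROR
} SketchToken;

typedef enum { ACT_PARSE, ACT_SKIP } SketchAction;

typedef struct {
    unsigned brace_count;    /* open '{' not yet closed */
    unsigned bracket_count;  /* open '[' not yet closed */
    bool skipping;           /* an error was already reported */
} SketchStreamer;

/* Called when the parser rejects a token; the error is reported at once. */
static void sketch_streamer_parse_failed(SketchStreamer *s)
{
    if (s->brace_count || s->bracket_count) {
        s->skipping = true;  /* drop the rest of the broken value */
    }
}

/* Account for one token and decide whether the parser should see it. */
static SketchAction sketch_streamer_token(SketchStreamer *s, SketchToken tok)
{
    SketchAction act = s->skipping ? ACT_SKIP : ACT_PARSE;

    switch (tok) {
    case SK_LCURLY:
        s->brace_count++;
        break;
    case SK_RCURLY:
        if (s->brace_count) {
            s->brace_count--;
        }
        break;
    case SK_LSQUARE:
        s->bracket_count++;
        break;
    case SK_RSQUARE:
        if (s->bracket_count) {
            s->bracket_count--;
        }
        break;
    case SK_LEX_ERROR:
        /* A lexing error always resynchronizes the stream. */
        s->brace_count = 0;
        s->bracket_count = 0;
        s->skipping = false;
        return ACT_SKIP;
    default:
        break;
    }

    if (s->brace_count == 0 && s->bracket_count == 0) {
        s->skipping = false;  /* the next token starts a fresh parse */
    }
    return act;
}
```

For an input such as `{ x [ ] } {` where the parser fails at `x`, the
sketch skips `[ ] }` and hands the final `{` to the parser as the start
of a new value.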
On top of the benefits intrinsic in the push architecture, it so happens
that it's really easy to add a location to JSON parsing errors now, so
do that as well.
The diffstat is unfavorable, but most of the added lines are new
comments explaining the grammar and the state machines.
Paolo
v2->v3:
- accept interpolation for the key of a dictionary
v1->v2:
- remove part of the patch to pass around the lookahead token,
it was hard to review and added little value
- separate patch to reuse the JSONParser
- separate patch to make brace/bracket count unsigned
- add comment with the structure of the stack
- add big comment with the grammar
- split long lines
- remove QObject **value argument to pop_entry()
- add assertions about the type of the top-of-stack
- change error to "key is not a string in object"
- split out json_parser_reset() already in the first patch
- rename json_parser_parse_token() to parse_token()
- do not use single quotes in commit messages
- move initialization of JSONToken close to usage
Paolo Bonzini (7):
  json-parser: constify JSONToken
  json-parser: replace with a push parser
  json-streamer: reuse parser
  json-streamer: make brace/bracket count unsigned
  json-streamer: remove token queue
  json-streamer: do not heap-allocate JSONToken
  json-parser: add location to JSON parsing errors

 include/qobject/json-parser.h |  16 +-
 qobject/json-parser-int.h     |  13 +-
 qobject/json-lexer.c          |  11 +-
 qobject/json-parser.c         | 580 +++++++++++++++++++---------------
 qobject/json-streamer.c       | 120 +++----
 5 files changed, 415 insertions(+), 325 deletions(-)
--
2.54.0