This rewrites the json-parser to use a push parser aka state machine.
While push parsers are inherently more complex than recursive descent,
the grammar for JSON is simple enough that the parser remains readable.
There is therefore no need to use e.g. QEMU coroutines.
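For illustration only, and not the code from this series: a minimal sketch of what "push parser aka state machine" means for a JSON-like grammar. The caller hands over one token at a time, and the parser keeps its position in an explicit stack of states instead of the C call stack. All names here (Token, State, PushParser, push_token, push_all) are hypothetical, and the token set is deliberately reduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced token set; a real JSON lexer has more kinds. */
typedef enum {
    TOK_LBRACE, TOK_RBRACE, TOK_LSQUARE, TOK_RSQUARE,
    TOK_COLON, TOK_COMMA, TOK_STRING, TOK_SCALAR
} Token;

/* One state per open container: what may legally come next. */
typedef enum {
    ST_ARRAY_FIRST,  /* after '[': value or ']' */
    ST_ARRAY_VALUE,  /* after ',' in array: value */
    ST_ARRAY_NEXT,   /* after an element: ',' or ']' */
    ST_OBJ_FIRST,    /* after '{': key or '}' */
    ST_OBJ_KEY,      /* after ',' in object: key */
    ST_OBJ_COLON,    /* after a key: ':' */
    ST_OBJ_VALUE,    /* after ':': value */
    ST_OBJ_NEXT      /* after a member: ',' or '}' */
} State;

typedef enum { PUSH_MORE, PUSH_DONE, PUSH_ERROR } PushResult;

#define MAX_DEPTH 64

typedef struct {
    State stack[MAX_DEPTH];
    size_t depth;               /* number of open containers */
} PushParser;

/* A value starts here: scalars complete at once, containers push a state. */
static PushResult start_value(PushParser *p, Token tok)
{
    switch (tok) {
    case TOK_STRING:
    case TOK_SCALAR:
        return p->depth ? PUSH_MORE : PUSH_DONE;
    case TOK_LSQUARE:
    case TOK_LBRACE:
        if (p->depth == MAX_DEPTH) {
            return PUSH_ERROR;  /* nesting limit */
        }
        p->stack[p->depth++] =
            tok == TOK_LSQUARE ? ST_ARRAY_FIRST : ST_OBJ_FIRST;
        return PUSH_MORE;
    default:
        return PUSH_ERROR;      /* "expecting value" */
    }
}

static PushResult end_container(PushParser *p)
{
    assert(p->depth > 0);
    p->depth--;
    return p->depth ? PUSH_MORE : PUSH_DONE;
}

/* Consume exactly one token; an error is known as soon as it is pushed. */
PushResult push_token(PushParser *p, Token tok)
{
    State *top;

    if (p->depth == 0) {
        return start_value(p, tok);
    }
    top = &p->stack[p->depth - 1];
    switch (*top) {
    case ST_ARRAY_FIRST:
        if (tok == TOK_RSQUARE) return end_container(p);
        *top = ST_ARRAY_NEXT;
        return start_value(p, tok);
    case ST_ARRAY_VALUE:
        *top = ST_ARRAY_NEXT;
        return start_value(p, tok);
    case ST_ARRAY_NEXT:
        if (tok == TOK_RSQUARE) return end_container(p);
        if (tok == TOK_COMMA) { *top = ST_ARRAY_VALUE; return PUSH_MORE; }
        return PUSH_ERROR;
    case ST_OBJ_FIRST:
        if (tok == TOK_RBRACE) return end_container(p);
        if (tok == TOK_STRING) { *top = ST_OBJ_COLON; return PUSH_MORE; }
        return PUSH_ERROR;
    case ST_OBJ_KEY:
        if (tok == TOK_STRING) { *top = ST_OBJ_COLON; return PUSH_MORE; }
        return PUSH_ERROR;
    case ST_OBJ_COLON:
        if (tok == TOK_COLON) { *top = ST_OBJ_VALUE; return PUSH_MORE; }
        return PUSH_ERROR;      /* "expecting ':'" */
    case ST_OBJ_VALUE:
        *top = ST_OBJ_NEXT;
        return start_value(p, tok);
    case ST_OBJ_NEXT:
        if (tok == TOK_RBRACE) return end_container(p);
        if (tok == TOK_COMMA) { *top = ST_OBJ_KEY; return PUSH_MORE; }
        return PUSH_ERROR;
    }
    return PUSH_ERROR;
}

/* Convenience helper: push tokens until the parser stops asking for more. */
PushResult push_all(PushParser *p, const Token *toks, size_t n)
{
    PushResult r = PUSH_MORE;

    for (size_t i = 0; i < n && r == PUSH_MORE; i++) {
        r = push_token(p, toks[i]);
    }
    return r;
}
```

With a zero-initialized PushParser, pushing the tokens of [{"a"] fails on the fourth token, because the state on top of the stack at that point only accepts ':'.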
Unlike the suggestion in commit 62815d85aed ("json: Redesign the callback
to consume JSON values", 2018-08-24), I kept the json-streamer concept.
It helps in handling input limits, it performs error recovery, and it
converts the token-at-a-time push interface to callbacks---all things
that are more easily done in a separate layer to keep the parser clean.
However, there is no need anymore for it to store partial JSON objects
in tokenized form.
Another benefit is that QEMU can report the first parsing error
immediately, without waiting for delimiters to be balanced.
On top of the benefits intrinsic in the push architecture, it so happens
that it's really easy to add a location to JSON parsing errors now, so
do that as well.
Paolo
Paolo Bonzini (5):
json-parser: pass around lookahead token, constify
json-parser: replace with a push parser
json-streamer: remove token queue
json-streamer: do not heap-allocate JSONToken
json-parser: add location to JSON parsing errors
include/qobject/json-parser.h | 12 +-
qobject/json-parser-int.h | 13 +-
qobject/json-lexer.c | 11 +-
qobject/json-parser.c | 493 ++++++++++++++++------------------
qobject/json-streamer.c | 107 ++++----
5 files changed, 310 insertions(+), 326 deletions(-)
--
2.52.0
Paolo Bonzini <pbonzini@redhat.com> writes:
> This rewrites the json-parser to use a push parser aka state machine.
> While push parsers are inherently more complex than recursive descent,
> the grammar for JSON is simple enough that the parser remains readable.
> There is therefore no need to use e.g. QEMU coroutines.
>
> Unlike the suggestion in commit 62815d85aed ("json: Redesign the callback
> to consume JSON values", 2018-08-24), I kept the json-streamer concept.
> It helps in handling input limits, it performs error recovery, and it
> converts the token-at-a-time push interface to callbacks---all things
> that are more easily done in a separate layer to keep the parser clean.
> However, there is no need anymore for it to store partial JSON objects
> in tokenized form.
>
> Another benefit is that QEMU can report the first parsing error
> immediately, without waiting for delimiters to be balanced.
Sounds promising! Let's see...
Before the series:
$ socat "READLINE,prompt=QMP> " UNIX-CONNECT:$HOME/work/images/test-qmp
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 10}, "package": "v10.2.0-567-gfb6b66de43-dirty"}, "capabilities": ["oob"]}}
QMP> [{"a"]
Parse error not diagnosed right away, but ...
QMP> }
{"error": {"class": "GenericError", "desc": "JSON parse error, missing : in object pair"}}
... only when the streamer decides the expression is complete.
After the series:
QMP> [{"a"]
{"error": {"class": "GenericError", "desc": "JSON parse error at line 1, column 6, expecting ':'"}}
Cool! However, if I do it again, things fall apart:
QMP> [{"a"
QMP> }
QMP> }
QMP> }
QMP> ]
QMP> ]
{"error": {"class": "GenericError", "desc": "JSON parse error at line 7, column 1, expecting value"}}
Parse error recovery not quite right?
> On top of the benefits intrinsic in the push architecture, it so happens
> that it's really easy to add a location to JSON parsing errors now, so
> do that as well.
>
> Paolo
Il ven 30 gen 2026, 14:00 Markus Armbruster <armbru@redhat.com> ha scritto:
> > Another benefit is that QEMU can report the first parsing error
> > immediately, without waiting for delimiters to be balanced.
>
> Sounds promising! Let's see...
>
> Before the series:
>
> $ socat "READLINE,prompt=QMP> " UNIX-CONNECT:$HOME/work/images/test-qmp
> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 2, "major": 10}, "package": "v10.2.0-567-gfb6b66de43-dirty"}, "capabilities": ["oob"]}}
> QMP> [{"a"]
>
> Parse error not diagnosed right away, but ...
>
> QMP> }
> {"error": {"class": "GenericError", "desc": "JSON parse error, missing : in object pair"}}
>
> .... only when the streamer decides the expression is complete.
>
> After the series:
>
> QMP> [{"a"]
> {"error": {"class": "GenericError", "desc": "JSON parse error at line 1, column 6, expecting ':'"}}
>
> Cool! However, if I do it again, things fall apart:
>
> QMP> [{"a"
> QMP> }
> QMP> }
> QMP> }
> QMP> ]
> QMP> ]
> {"error": {"class": "GenericError", "desc": "JSON parse error at line 7, column 1, expecting value"}}
>
> Parse error recovery not quite right?
>
Well, if you read the above very carefully :) the error is *reported*
immediately, but recovery still waits for delimiters to be balanced.
In testing, when I got an error I just typed a long enough variant of
"]}]}]}", and that was enough to recover, just like in the old parser. Still,
the difference in error reporting matters, because it gives feedback that
is immediately useful rather than possibly delayed forever.
The policy is easy to change, either in v2 or in subsequent work, because
recovery is layered on top of json-parser and its code is nothing more than
"if you believe it's a good time to recover, reset the parser".
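To make the layering concrete, here is a sketch of one possible policy, not the actual json-streamer code. The layer above the parser counts braces and brackets, reports the first error right away, then discards tokens until the delimiters look balanced (or over-closed) and resets the parser. Streamer, streamer_feed, good_time_to_recover and stub_parser_push are hypothetical names; TOK_BAD is a stand-in for "a token the parser rejects".

```c
#include <stdbool.h>

typedef enum {
    TOK_LBRACE, TOK_RBRACE, TOK_LSQUARE, TOK_RSQUARE,
    TOK_OTHER,  /* any token that is not a delimiter */
    TOK_BAD     /* stand-in: the stub parser rejects this token */
} Token;

typedef struct {
    int brace_count;
    int bracket_count;
    bool skipping;        /* error already reported, waiting to recover */
    int errors_reported;  /* counters kept only to observe the behavior */
    int parser_resets;
} Streamer;

/* Stand-in for pushing a token into the real parser; false means error. */
static bool stub_parser_push(Token tok)
{
    return tok != TOK_BAD;
}

/* The policy: this is the only place that decides when to recover. */
static bool good_time_to_recover(const Streamer *s)
{
    return s->brace_count < 0 || s->bracket_count < 0 ||
           (s->brace_count == 0 && s->bracket_count == 0);
}

void streamer_feed(Streamer *s, Token tok)
{
    switch (tok) {
    case TOK_LBRACE:  s->brace_count++;   break;
    case TOK_RBRACE:  s->brace_count--;   break;
    case TOK_LSQUARE: s->bracket_count++; break;
    case TOK_RSQUARE: s->bracket_count--; break;
    default:          break;
    }

    /* Report the first error immediately, then stop feeding the parser. */
    if (!s->skipping && !stub_parser_push(tok)) {
        s->errors_reported++;
        s->skipping = true;
    }

    /* Recovery is separate from reporting: reset once the policy agrees. */
    if (s->skipping && good_time_to_recover(s)) {
        s->brace_count = 0;
        s->bracket_count = 0;
        s->skipping = false;
        s->parser_resets++;   /* a real layer would reset the parser here */
    }
}
```

Changing the policy means changing good_time_to_recover() only; the parser itself is not involved.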
Paolo
> > On top of the benefits intrinsic in the push architecture, it so happens
> > that it's really easy to add a location to JSON parsing errors now, so
> > do that as well.
> >
> > Paolo
>
>
Ping.
Paolo
Il mer 7 gen 2026, 09:48 Paolo Bonzini <pbonzini@redhat.com> ha scritto:
> This rewrites the json-parser to use a push parser aka state machine.
> While push parsers are inherently more complex than recursive descent,
> the grammar for JSON is simple enough that the parser remains readable.
> There is therefore no need to use e.g. QEMU coroutines.
>
> Unlike the suggestion in commit 62815d85aed ("json: Redesign the callback
> to consume JSON values", 2018-08-24), I kept the json-streamer concept.
> It helps in handling input limits, it performs error recovery, and it
> converts the token-at-a-time push interface to callbacks---all things
> that are more easily done in a separate layer to keep the parser clean.
> However, there is no need anymore for it to store partial JSON objects
> in tokenized form.
>
> Another benefit is that QEMU can report the first parsing error
> immediately, without waiting for delimiters to be balanced.
>
> On top of the benefits intrinsic in the push architecture, it so happens
> that it's really easy to add a location to JSON parsing errors now, so
> do that as well.
>
> Paolo
>
>
> Paolo Bonzini (5):
> json-parser: pass around lookahead token, constify
> json-parser: replace with a push parser
> json-streamer: remove token queue
> json-streamer: do not heap-allocate JSONToken
> json-parser: add location to JSON parsing errors
>
> include/qobject/json-parser.h | 12 +-
> qobject/json-parser-int.h | 13 +-
> qobject/json-lexer.c | 11 +-
> qobject/json-parser.c | 493 ++++++++++++++++------------------
> qobject/json-streamer.c | 107 ++++----
> 5 files changed, 310 insertions(+), 326 deletions(-)
>
> --
> 2.52.0
>