From: Markus Armbruster
To: qemu-devel@nongnu.org
Cc: marcandre.lureau@redhat.com, mdroth@linux.vnet.ibm.com
Date: Mon, 27 Aug 2018 09:00:19 +0200
Message-Id: <20180827070021.11931-5-armbru@redhat.com>
In-Reply-To: <20180827070021.11931-1-armbru@redhat.com>
References: <20180827070021.11931-1-armbru@redhat.com>
Subject: [Qemu-devel] [PATCH 4/6] json: Nicer recovery from lexical errors

When the lexer chokes on an input character, it consumes the
character, emits a JSON error token, and enters its start state.  This
can lead to suboptimal error recovery.  For instance, input

    0123 ,

produces the tokens

    JSON_ERROR    01
    JSON_INTEGER  23
    JSON_COMMA    ,

Make the lexer skip characters after a lexical error until a
structural character ('[', ']', '{', '}', ':', ','), an ASCII control
character, or '\xFE', or '\xFF'.

Note that we must not skip ASCII control characters, '\xFE', '\xFF',
because those are documented to force the JSON parser into known-good
state, by docs/interop/qmp-spec.txt.

The lexer now produces

    JSON_ERROR    01
    JSON_COMMA    ,

Update qmp-test for the nicer error recovery: QMP now reports just one
error for input %p instead of two.  Also drop the newline after %p; it
was needed to tease out the second error.

Signed-off-by: Markus Armbruster
Reviewed-by: Eric Blake
---
 qobject/json-lexer.c | 43 +++++++++++++++++++++++++++++--------------
 tests/qmp-test.c     |  6 +-----
 2 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 28582e17d9..39c7ce7adc 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -101,6 +101,7 @@
 
 enum json_lexer_state {
     IN_ERROR = 0,               /* must really be 0, see json_lexer[] */
+    IN_RECOVERY,
     IN_DQ_STRING_ESCAPE,
     IN_DQ_STRING,
     IN_SQ_STRING_ESCAPE,
@@ -130,6 +131,28 @@ QEMU_BUILD_BUG_ON(IN_START_INTERP != IN_START + 1);
 static const uint8_t json_lexer[][256] = {
     /* Relies on default initialization to IN_ERROR! */
 
+    /* error recovery */
+    [IN_RECOVERY] = {
+        /*
+         * Skip characters until a structural character, an ASCII
+         * control character other than '\t', or impossible UTF-8
+         * bytes '\xFE', '\xFF'.  Structural characters and line
+         * endings are promising resynchronization points.  Clients
+         * may use the others to force the JSON parser into known-good
+         * state; see docs/interop/qmp-spec.txt.
+         */
+        [0 ... 0x1F] = IN_START | LOOKAHEAD,
+        [0x20 ... 0xFD] = IN_RECOVERY,
+        [0xFE ... 0xFF] = IN_START | LOOKAHEAD,
+        ['\t'] = IN_RECOVERY,
+        ['['] = IN_START | LOOKAHEAD,
+        [']'] = IN_START | LOOKAHEAD,
+        ['{'] = IN_START | LOOKAHEAD,
+        ['}'] = IN_START | LOOKAHEAD,
+        [':'] = IN_START | LOOKAHEAD,
+        [','] = IN_START | LOOKAHEAD,
+    },
+
     /* double quote string */
     [IN_DQ_STRING_ESCAPE] = {
         [0x20 ... 0xFD] = IN_DQ_STRING,
@@ -301,26 +324,18 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
             /* fall through */
         case JSON_SKIP:
             g_string_truncate(lexer->token, 0);
+            /* fall through */
+        case IN_START:
             new_state = lexer->start_state;
             break;
         case IN_ERROR:
-            /* XXX: To avoid having previous bad input leaving the parser in an
-             * unresponsive state where we consume unpredictable amounts of
-             * subsequent "good" input, percolate this error state up to the
-             * parser by emitting a JSON_ERROR token, then reset lexer state.
-             *
-             * Also note that this handling is required for reliable channel
-             * negotiation between QMP and the guest agent, since chr(0xFF)
-             * is placed at the beginning of certain events to ensure proper
-             * delivery when the channel is in an unknown state. chr(0xFF) is
-             * never a valid ASCII/UTF-8 sequence, so this should reliably
-             * induce an error/flush state.
-             */
             json_message_process_token(lexer, lexer->token, JSON_ERROR,
                                        lexer->x, lexer->y);
+            new_state = IN_RECOVERY;
+            /* fall through */
+        case IN_RECOVERY:
             g_string_truncate(lexer->token, 0);
-            lexer->state = lexer->start_state;
-            return;
+            break;
         default:
             break;
         }
diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index 4ae2245484..04ad7648d2 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -93,11 +93,7 @@ static void test_malformed(QTestState *qts)
     g_assert(recovered(qts));
 
     /* lexical error: interpolation */
-    qtest_qmp_send_raw(qts, "%%p\n");
-    /* two errors, one for "%", one for "p" */
-    resp = qtest_qmp_receive(qts);
-    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
-    qobject_unref(resp);
+    qtest_qmp_send_raw(qts, "%%p");
     resp = qtest_qmp_receive(qts);
     g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
     qobject_unref(resp);
-- 
2.17.1
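
Illustration only, not part of the patch: the IN_RECOVERY row above can
be read as a predicate on bytes.  The standalone sketch below restates
that row in plain C so the skipping rule can be tried outside QEMU.
The names ends_recovery() and skip_to_resync() are hypothetical and do
not exist in qobject/json-lexer.c; the real lexer is table-driven and
re-lexes the stopping byte from its start state via LOOKAHEAD, which
this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not in QEMU): does byte c end error recovery?
 * Mirrors the IN_RECOVERY row: control characters other than '\t',
 * the bytes 0xFE and 0xFF, and JSON structural characters stop the
 * skipping; everything else is skipped.
 */
static bool ends_recovery(unsigned char c)
{
    if (c == '\t') {
        return false;           /* tab is skipped like ordinary text */
    }
    if (c < 0x20 || c >= 0xFE) {
        return true;            /* control characters, '\xFE', '\xFF' */
    }
    /* c is never 0 here, so strchr() cannot match the terminator */
    return strchr("[]{}:,", c) != NULL;
}

/*
 * Hypothetical helper: offset of the first byte at or after start
 * that ends recovery, or len if the rest of the buffer is skipped.
 */
static size_t skip_to_resync(const char *buf, size_t len, size_t start)
{
    size_t i = start;

    assert(start <= len);
    while (i < len && !ends_recovery((unsigned char)buf[i])) {
        i++;
    }
    return i;
}
```

For the commit message's input "0123 ,", the error is raised on the
'1' at offset 1, so recovery starts at offset 2; skip_to_resync()
returns 5, the offset of the comma, which matches the JSON_ERROR,
JSON_COMMA token sequence described above.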