From: Markus Armbruster
To: qemu-devel@nongnu.org
Cc: marcandre.lureau@redhat.com, mdroth@linux.vnet.ibm.com
Date: Fri, 17 Aug 2018 17:05:22 +0200
Message-Id: <20180817150559.16243-24-armbru@redhat.com>
In-Reply-To: <20180817150559.16243-1-armbru@redhat.com>
References: <20180817150559.16243-1-armbru@redhat.com>
Subject: [Qemu-devel] [PATCH v2 23/60] json: Leave rejecting invalid UTF-8 to parser

Both the lexer and the parser (attempt to) validate UTF-8 in JSON
strings.

The lexer rejects bytes that can't occur in valid UTF-8: \xC0..\xC1,
\xF5..\xFF.  This rejects some, but not all invalid UTF-8.  It also
rejects ASCII control characters \x00..\x1F, in accordance with RFC
7159 (see recent commit "json: Reject unescaped control characters").

When the lexer rejects, it ends the token right after the first bad
byte.  Good when the bad byte is a newline.  Not so good when it's
something like an overlong sequence in the middle of a string.
For instance, input

    {"abc\xC0\xAFijk": 1}\n

produces the tokens

    JSON_LCURLY   {
    JSON_ERROR    "abc\xC0
    JSON_ERROR    \xAF
    JSON_KEYWORD  ijk
    JSON_ERROR    ": 1}\n

The parser then reports four errors

    Invalid JSON syntax
    Invalid JSON syntax
    JSON parse error, invalid keyword 'ijk'
    Invalid JSON syntax

before it recovers at the newline.

The commit before previous made the parser reject invalid UTF-8
sequences.  Since then, anything the lexer rejects, the parser would
reject as well.  Thus, the lexer's rejecting is unnecessary for
correctness, and harmful for error reporting.

However, we want to keep rejecting ASCII control characters in the
lexer, because that produces the behavior we want for unclosed
strings.

We also need to keep rejecting \xFF in the lexer, because we
documented that as a way to reset the JSON parser
(docs/interop/qmp-spec.txt section 2.6 QGA Synchronization), which
means we can't change how we recover from this error now.  I wish we
hadn't done that.

I think we should treat \xFE the same as \xFF.

Change the lexer to accept \xC0..\xC1 and \xF5..\xFD.  It now rejects
only \x00..\x1F and \xFE..\xFF.  Error reporting for invalid UTF-8 in
strings is much improved, except for \xFE and \xFF.  For the example
above, the lexer now produces

    JSON_LCURLY   {
    JSON_STRING   "abc\xC0\xAFijk"
    JSON_COLON    :
    JSON_INTEGER  1
    JSON_RCURLY

and the parser reports just

    JSON parse error, invalid UTF-8 sequence in string

Signed-off-by: Markus Armbruster
Reviewed-by: Eric Blake
---
 qobject/json-lexer.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 109a7d8bb8..ca1e0e2c03 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -177,8 +177,7 @@ static const uint8_t json_lexer[][256] = {
         ['u'] = IN_DQ_UCODE0,
     },
     [IN_DQ_STRING] = {
-        [0x20 ... 0xBF] = IN_DQ_STRING,
-        [0xC2 ... 0xF4] = IN_DQ_STRING,
+        [0x20 ... 0xFD] = IN_DQ_STRING,
         ['\\'] = IN_DQ_STRING_ESCAPE,
         ['"'] = JSON_STRING,
     },
@@ -217,8 +216,7 @@ static const uint8_t json_lexer[][256] = {
         ['u'] = IN_SQ_UCODE0,
     },
     [IN_SQ_STRING] = {
-        [0x20 ... 0xBF] = IN_SQ_STRING,
-        [0xC2 ... 0xF4] = IN_SQ_STRING,
+        [0x20 ... 0xFD] = IN_SQ_STRING,
         ['\\'] = IN_SQ_STRING_ESCAPE,
         ['\''] = JSON_STRING,
     },
-- 
2.17.1
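
Illustration only, not part of the patch.  The effect of the table
change on the example from the commit message can be sketched with a
simplified standalone model.  The names string_byte_ok_before(),
string_byte_ok_after(), first_token_len() and
check_commit_message_example() are hypothetical and do not exist in
qobject/json-lexer.c.  The model ignores backslash escapes and
single-quoted strings, and only mimics "end the token right after the
first bad byte".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: bytes the string states accepted before this patch. */
static bool string_byte_ok_before(uint8_t b)
{
    return (b >= 0x20 && b <= 0xBF) || (b >= 0xC2 && b <= 0xF4);
}

/* Hypothetical: bytes the string states accept after this patch. */
static bool string_byte_ok_after(uint8_t b)
{
    return b >= 0x20 && b <= 0xFD;
}

/*
 * Simplified model of lexing one double-quoted string (no escapes).
 * Returns the number of bytes in the first token.  The token ends at
 * the closing quote, or right after the first byte that ok() rejects.
 * *is_error is set when the token is not a complete string.
 */
static size_t first_token_len(const uint8_t *s, size_t n,
                              bool (*ok)(uint8_t), bool *is_error)
{
    size_t i;

    *is_error = true;
    if (n == 0 || s[0] != '"') {
        return 0;
    }
    for (i = 1; i < n; i++) {
        if (s[i] == '"') {
            *is_error = false;      /* complete string token */
            return i + 1;
        }
        if (!ok(s[i])) {
            return i + 1;           /* error token ends after bad byte */
        }
    }
    return n;                       /* ran out of input: unclosed string */
}

/* The string from the commit message: "abc\xC0\xAFijk" */
static void check_commit_message_example(void)
{
    static const uint8_t in[] = {
        '"', 'a', 'b', 'c', 0xC0, 0xAF, 'i', 'j', 'k', '"'
    };
    bool err;

    /* Before: the token is cut after \xC0, i.e. "abc\xC0 (5 bytes). */
    assert(first_token_len(in, sizeof(in), string_byte_ok_before, &err) == 5);
    assert(err);

    /* After: the whole string (10 bytes) is one token for the parser. */
    assert(first_token_len(in, sizeof(in), string_byte_ok_after, &err) == 10);
    assert(!err);
}
```

In this model, \xFE, \xFF and \x00..\x1F still end the token early
under the new rule, matching the ranges the commit message says the
lexer keeps rejecting.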