From: Markus Armbruster <armbru@redhat.com>
To: qemu-devel@nongnu.org
Cc: marcandre.lureau@redhat.com, mdroth@linux.vnet.ibm.com
Date: Thu, 23 Aug 2018 18:39:56 +0200
Message-Id: <20180823164025.12553-30-armbru@redhat.com>
In-Reply-To: <20180823164025.12553-1-armbru@redhat.com>
References: <20180823164025.12553-1-armbru@redhat.com>
Subject: [Qemu-devel] [PATCH v3 29/58] json: Fix \uXXXX for surrogate pairs

The JSON parser treats each half of a surrogate pair as an unpaired
surrogate.  Fix it to recognize surrogate pairs.

Signed-off-by: Markus Armbruster
Reviewed-by: Eric Blake
---
 qobject/json-parser.c | 60 ++++++++++++++++++++++++++++---------------
 tests/check-qjson.c   |  3 +--
 2 files changed, 40 insertions(+), 23 deletions(-)

diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index e49da192fe..73e6ad7458 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -64,16 +64,27 @@ static void GCC_FMT_ATTR(3, 4) parse_error(JSONParserContext *ctxt,
     error_setg(&ctxt->err, "JSON parse error, %s", message);
 }
 
-static int hex2decimal(char ch)
+static int cvt4hex(const char *s)
 {
-    if (ch >= '0' && ch <= '9') {
-        return (ch - '0');
-    } else if (ch >= 'a' && ch <= 'f') {
-        return 10 + (ch - 'a');
-    } else if (ch >= 'A' && ch <= 'F') {
-        return 10 + (ch - 'A');
+    int cp, i;
+
+    cp = 0;
+    for (i = 0; i < 4; i++) {
+        if (!qemu_isxdigit(s[i])) {
+            return -1;
+        }
+        cp <<= 4;
+        if (s[i] >= '0' && s[i] <= '9') {
+            cp |= s[i] - '0';
+        } else if (s[i] >= 'a' && s[i] <= 'f') {
+            cp |= 10 + s[i] - 'a';
+        } else if (s[i] >= 'A' && s[i] <= 'F') {
+            cp |= 10 + s[i] - 'A';
+        } else {
+            return -1;
+        }
     }
-    abort();
+    return cp;
 }
 
 /**
@@ -115,7 +126,8 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
     const char *ptr = token->str;
     QString *str;
     char quote;
-    int cp, i;
+    const char *beg;
+    int cp, trailing;
     char *end;
     ssize_t len;
     char utf8_buf[5];
@@ -127,7 +139,7 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
     while (*ptr != quote) {
         assert(*ptr);
         if (*ptr == '\\') {
-            ptr++;
+            beg = ptr++;
             switch (*ptr++) {
             case '"':
                 qstring_append_chr(str, '"');
@@ -157,22 +169,28 @@ static QString *parse_string(JSONParserContext *ctxt, JSONToken *token)
                 qstring_append_chr(str, '\t');
                 break;
             case 'u':
-                cp = 0;
-                for (i = 0; i < 4; i++) {
-                    if (!qemu_isxdigit(*ptr)) {
-                        parse_error(ctxt, token,
-                                    "invalid hex escape sequence in string");
-                        goto out;
+                cp = cvt4hex(ptr);
+                ptr += 4;
+
+                /* handle surrogate pairs */
+                if (cp >= 0xD800 && cp <= 0xDBFF
+                    && ptr[0] == '\\' && ptr[1] == 'u') {
+                    /* leading surrogate followed by \u */
+                    cp = 0x10000 + ((cp & 0x3FF) << 10);
+                    trailing = cvt4hex(ptr + 2);
+                    if (trailing >= 0xDC00 && trailing <= 0xDFFF) {
+                        /* followed by trailing surrogate */
+                        cp |= trailing & 0x3FF;
+                        ptr += 6;
+                    } else {
+                        cp = -1; /* invalid */
                     }
-                    cp <<= 4;
-                    cp |= hex2decimal(*ptr);
-                    ptr++;
                 }
 
                 if (mod_utf8_encode(utf8_buf, sizeof(utf8_buf), cp) < 0) {
                     parse_error(ctxt, token,
-                                "\\u%.4s is not a valid Unicode character",
-                                ptr - 3);
+                                "%.*s is not a valid Unicode character",
+                                (int)(ptr - beg), beg);
                     goto out;
                 }
                 qstring_append(str, utf8_buf);
diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index 4abb5847ad..343f8af36a 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -63,8 +63,7 @@ static void escaped_string(void)
         { "double byte utf-8 \\u00A2", "double byte utf-8 \xc2\xa2" },
         { "triple byte utf-8 \\u20AC", "triple byte utf-8 \xe2\x82\xac" },
         { "quadruple byte utf-8 \\uD834\\uDD1E", /* U+1D11E */
-          /* bug: want \xF0\x9D\x84\x9E */
-          NULL },
+          "quadruple byte utf-8 \xF0\x9D\x84\x9E" },
         { "\\", NULL },
         { "\\z", NULL },
         { "\\ux", NULL },
-- 
2.17.1
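
Not part of the patch: the surrogate-pair arithmetic in the 'u' case can
be checked in isolation.  The following is a minimal standalone sketch,
not QEMU code.  The function name combine_surrogates() is hypothetical
and exists only for illustration.  It uses the same ranges and masks as
the patch (0xD800..0xDBFF leading, 0xDC00..0xDFFF trailing, low ten bits
of each half) and returns -1 where the patch sets cp = -1.

```c
/*
 * Illustrative only: combine_surrogates() is a hypothetical helper,
 * not a function in qobject/json-parser.c.
 *
 * Combine a UTF-16 leading and trailing surrogate into one code point.
 * Returns the code point (0x10000..0x10FFFF), or -1 if either half is
 * outside its surrogate range.
 */
static int combine_surrogates(int leading, int trailing)
{
    if (leading < 0xD800 || leading > 0xDBFF) {
        return -1;              /* not a leading surrogate */
    }
    if (trailing < 0xDC00 || trailing > 0xDFFF) {
        return -1;              /* not a trailing surrogate */
    }
    /* high ten bits from the leading half, low ten from the trailing */
    return 0x10000 + ((leading & 0x3FF) << 10) + (trailing & 0x3FF);
}
```

For the test case in tests/check-qjson.c, combine_surrogates(0xD834,
0xDD1E) yields 0x1D11E: 0xD834 & 0x3FF is 0x034, shifted left by ten is
0xD000, plus 0x10000 is 0x1D000, plus 0xDD1E & 0x3FF (0x11E) is 0x1D11E.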