[Qemu-devel] [PATCH 17/56] json: Reject unescaped control characters

Markus Armbruster posted 56 patches 7 years, 2 months ago
There is a newer version of this series
[Qemu-devel] [PATCH 17/56] json: Reject unescaped control characters
Posted by Markus Armbruster 7 years, 2 months ago
Fix the lexer to reject unescaped control characters in JSON strings,
in accordance with RFC 7159.

Bonus: we now recover more nicely from unclosed strings.  E.g.

    {"one: 1}\n{"two": 2}

now recovers cleanly after the newline, where before the lexer
remained confused until the next unpaired double quote or lexical
error.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 qobject/json-lexer.c | 4 ++--
 tests/check-qjson.c  | 6 +-----
 tests/qmp-test.c     | 4 ++--
 3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index 7c0875d225..e85e9a78ff 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -115,7 +115,7 @@ static const uint8_t json_lexer[][256] =  {
         ['u'] = IN_DQ_UCODE0,
     },
     [IN_DQ_STRING] = {
-        [1 ... 0xBF] = IN_DQ_STRING,
+        [0x20 ... 0xBF] = IN_DQ_STRING,
         [0xC2 ... 0xF4] = IN_DQ_STRING,
         ['\\'] = IN_DQ_STRING_ESCAPE,
         ['"'] = JSON_STRING,
@@ -155,7 +155,7 @@ static const uint8_t json_lexer[][256] =  {
         ['u'] = IN_SQ_UCODE0,
     },
     [IN_SQ_STRING] = {
-        [1 ... 0xBF] = IN_SQ_STRING,
+        [0x20 ... 0xBF] = IN_SQ_STRING,
         [0xC2 ... 0xF4] = IN_SQ_STRING,
         ['\\'] = IN_SQ_STRING_ESCAPE,
         ['\''] = JSON_STRING,
diff --git a/tests/check-qjson.c b/tests/check-qjson.c
index fda2b014a3..7d8ce5c68d 100644
--- a/tests/check-qjson.c
+++ b/tests/check-qjson.c
@@ -171,11 +171,7 @@ static void utf8_string(void)
             "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F"
             "\x10\x11\x12\x13\x14\x15\x16\x17"
             "\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F",
-            /* bug: not corrected (valid UTF-8, but invalid JSON) */
-            "\x01\x02\x03\x04\x05\x06\x07"
-            "\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F"
-            "\x10\x11\x12\x13\x14\x15\x16\x17"
-            "\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F",
+            NULL,
             "\\u0001\\u0002\\u0003\\u0004\\u0005\\u0006\\u0007"
             "\\b\\t\\n\\u000B\\f\\r\\u000E\\u000F"
             "\\u0010\\u0011\\u0012\\u0013\\u0014\\u0015\\u0016\\u0017"
diff --git a/tests/qmp-test.c b/tests/qmp-test.c
index 5117a1ab25..b77987b644 100644
--- a/tests/qmp-test.c
+++ b/tests/qmp-test.c
@@ -86,9 +86,9 @@ static void test_malformed(QTestState *qts)
     g_assert(recovered(qts));
 
     /* lexical error: control character in string */
-    qtest_qmp_send_raw(qts, "{'execute': 'nonexistent', 'id':'\n'}");
+    qtest_qmp_send_raw(qts, "{'execute': 'nonexistent', 'id':'\n");
     resp = qtest_qmp_receive(qts);
-    g_assert_cmpstr(get_error_class(resp), ==, "CommandNotFound"); /* BUG */
+    g_assert_cmpstr(get_error_class(resp), ==, "GenericError");
     qobject_unref(resp);
     g_assert(recovered(qts));
 
-- 
2.17.1


Re: [Qemu-devel] [PATCH 17/56] json: Reject unescaped control characters
Posted by Eric Blake 7 years, 2 months ago
On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> Fix the lexer to reject unescaped control characters in JSON strings,
> in accordance with RFC 7159.

Question - can this break existing QMP clients that were relying on this 
extension working?

Libvirt used to use libyajl, now it uses libjansson. So I'll check both 
of those libraries:

yajl: https://github.com/lloyd/yajl/blob/master/src/yajl_encode.c#L32

             default:
                 if ((unsigned char) str[end] < 32) {
                     CharToHex(str[end], hexBuf + 4);
escaped = hexBuf;

jansson: https://github.com/akheron/jansson/blob/master/src/dump.c#L101

             /* mandatory escape or control char */
if(codepoint == '\\' || codepoint == '"' || codepoint < 0x20)

Okay, both libraries appear to always send control characters encoded, 
and thus were not relying on this accidental QMP extension.

Are we worried about other clients?

> 
> Bonus: we now recover more nicely from unclosed strings.  E.g.
> 
>      {"one: 1}\n{"two": 2}
> 
> now recovers cleanly after the newline, where before the lexer
> remained confused until the next unpaired double quote or lexical
> error.

On that grounds alone, I could live with this patch, even if we end up 
having to revert it later if some client was actually depending on 
sending raw control characters as part of a string.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH 17/56] json: Reject unescaped control characters
Posted by Markus Armbruster 7 years, 2 months ago
Eric Blake <eblake@redhat.com> writes:

> On 08/08/2018 07:02 AM, Markus Armbruster wrote:
>> Fix the lexer to reject unescaped control characters in JSON strings,
>> in accordance with RFC 7159.
>
> Question - can this break existing QMP clients that were relying on
> this extension working?

In theory, yes.

The "extension" is undocumented.  That makes it a bug.

I'm not aware of clients relying on it.

> Libvirt used to use libyajl, now it uses libjansson. So I'll check
> both of those libraries:
>
> yajl: https://github.com/lloyd/yajl/blob/master/src/yajl_encode.c#L32
>
>             default:
>                 if ((unsigned char) str[end] < 32) {
>                     CharToHex(str[end], hexBuf + 4);
> escaped = hexBuf;
>
> jansson: https://github.com/akheron/jansson/blob/master/src/dump.c#L101
>
>             /* mandatory escape or control char */
> if(codepoint == '\\' || codepoint == '"' || codepoint < 0x20)
>
> Okay, both libraries appear to always send control characters encoded,
> and thus were not relying on this accidental QMP extension.
>
> Are we worried about other clients?

Breakage seems unlikely to me.

>> Bonus: we now recover more nicely from unclosed strings.  E.g.
>>
>>      {"one: 1}\n{"two": 2}
>>
>> now recovers cleanly after the newline, where before the lexer
>> remained confused until the next unpaired double quote or lexical
>> error.
>
> On that grounds alone, I could live with this patch, even if we end up
> having to revert it later if some client was actually depending on
> sending raw control characters as part of a string.

Having to revert the patch to stay bug-compatible wouldn't be exactly
terrible.

> Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!