schemas: Add vim modeline

[PATCH] schemas: Add vim modeline

Posted by Andrea Bolognani 5 years, 6 months ago

The various schemas included in QEMU use a JSON-based format which
is, however, strictly speaking not valid JSON.

As a consequence, when vim tries to apply syntax highlight rules
for JSON (as guessed from the file name), the result is an unreadable
mess which mostly consist of red markers pointing out supposed errors
in, well, pretty much everything.

Using Python syntax highlighting produces much better results, and
in fact these files already start with specially-formatted comments
that instruct Emacs to process them as if they were Python files.

This commit adds the equivalent special comments for vim.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
---
 docs/interop/firmware.json                | 1 +
 docs/interop/vhost-user.json              | 1 +
 qapi/authz.json                           | 1 +
 qapi/block-core.json                      | 1 +
 qapi/block.json                           | 1 +
 qapi/char.json                            | 1 +
 qapi/common.json                          | 1 +
 qapi/control.json                         | 1 +
 qapi/crypto.json                          | 1 +
 qapi/dump.json                            | 1 +
 qapi/error.json                           | 1 +
 qapi/introspect.json                      | 1 +
 qapi/job.json                             | 1 +
 qapi/machine-target.json                  | 1 +
 qapi/machine.json                         | 1 +
 qapi/migration.json                       | 1 +
 qapi/misc-target.json                     | 1 +
 qapi/misc.json                            | 1 +
 qapi/net.json                             | 1 +
 qapi/qapi-schema.json                     | 1 +
 qapi/qdev.json                            | 1 +
 qapi/qom.json                             | 1 +
 qapi/rdma.json                            | 1 +
 qapi/rocker.json                          | 1 +
 qapi/run-state.json                       | 1 +
 qapi/sockets.json                         | 1 +
 qapi/tpm.json                             | 1 +
 qapi/transaction.json                     | 1 +
 qapi/ui.json                              | 1 +
 qga/qapi-schema.json                      | 1 +
 storage-daemon/qapi/qapi-schema.json      | 1 +
 tests/qapi-schema/doc-good.json           | 2 ++
 tests/qapi-schema/include/sub-module.json | 1 +
 tests/qapi-schema/qapi-schema-test.json   | 1 +
 tests/qapi-schema/sub-sub-module.json     | 1 +
 35 files changed, 36 insertions(+)

diff --git a/docs/interop/firmware.json b/docs/interop/firmware.json
index 240f565397..989f10b626 100644
--- a/docs/interop/firmware.json
+++ b/docs/interop/firmware.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # Copyright (C) 2018 Red Hat, Inc.
 #
diff --git a/docs/interop/vhost-user.json b/docs/interop/vhost-user.json
index ef8ac5941f..feb5fe58ca 100644
--- a/docs/interop/vhost-user.json
+++ b/docs/interop/vhost-user.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # Copyright (C) 2018 Red Hat, Inc.
 #
diff --git a/qapi/authz.json b/qapi/authz.json
index 1c836a3abd..f3e9745426 100644
--- a/qapi/authz.json
+++ b/qapi/authz.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # QAPI authz definitions
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 943df1926a..5f72b50149 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 
 ##
 # == Block core (VM unrelated)
diff --git a/qapi/block.json b/qapi/block.json
index 2ddbfa8306..c54a393cf3 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 
 ##
 # = Block devices
diff --git a/qapi/char.json b/qapi/char.json
index daceb20f84..8aeedf96b2 100644
--- a/qapi/char.json
+++ b/qapi/char.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 
 ##
diff --git a/qapi/common.json b/qapi/common.json
index 7b9cbcd97b..716712d4b3 100644
--- a/qapi/common.json
+++ b/qapi/common.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 
 ##
 # = Common data types
diff --git a/qapi/control.json b/qapi/control.json
index 6b816bb61f..de51e9916c 100644
--- a/qapi/control.json
+++ b/qapi/control.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 
 ##
diff --git a/qapi/crypto.json b/qapi/crypto.json
index b2a4cff683..c41e869e31 100644
--- a/qapi/crypto.json
+++ b/qapi/crypto.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 
 ##
diff --git a/qapi/dump.json b/qapi/dump.json
index a1eed7b15c..f7c4267e3f 100644
--- a/qapi/dump.json
+++ b/qapi/dump.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # This work is licensed under the terms of the GNU GPL, version 2 or later.
 # See the COPYING file in the top-level directory.
diff --git a/qapi/error.json b/qapi/error.json
index 3fad08f506..94a6502de9 100644
--- a/qapi/error.json
+++ b/qapi/error.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 
 ##
 # = QMP errors
diff --git a/qapi/introspect.json b/qapi/introspect.json
index b1aabd4cfd..944bb87a20 100644
--- a/qapi/introspect.json
+++ b/qapi/introspect.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # Copyright (C) 2015 Red Hat, Inc.
 #
diff --git a/qapi/job.json b/qapi/job.json
index 5e658281f5..e9fed7aeb5 100644
--- a/qapi/job.json
+++ b/qapi/job.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 
 ##
 # == Background jobs
diff --git a/qapi/machine-target.json b/qapi/machine-target.json
index f2c82949d8..698850cc78 100644
--- a/qapi/machine-target.json
+++ b/qapi/machine-target.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # This work is licensed under the terms of the GNU GPL, version 2 or later.
 # See the COPYING file in the top-level directory.
diff --git a/qapi/machine.json b/qapi/machine.json
index ff7b5032e3..95558af98d 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # This work is licensed under the terms of the GNU GPL, version 2 or later.
 # See the COPYING file in the top-level directory.
diff --git a/qapi/migration.json b/qapi/migration.json
index eca2981d0a..a855f3c763 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 
 ##
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index dee3b45930..1e561fa97b 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 
 ##
diff --git a/qapi/misc.json b/qapi/misc.json
index 99b90ac80b..9c5c227dfd 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 
 ##
diff --git a/qapi/net.json b/qapi/net.json
index cebb1b52e3..f63c282140 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 
 ##
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index 43b0ba0dea..f03ff91ceb 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 ##
 # = Introduction
 #
diff --git a/qapi/qdev.json b/qapi/qdev.json
index f4ed9735c4..13254529bf 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # This work is licensed under the terms of the GNU GPL, version 2 or later.
 # See the COPYING file in the top-level directory.
diff --git a/qapi/qom.json b/qapi/qom.json
index 8abe998962..0b0b92944b 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 # This work is licensed under the terms of the GNU GPL, version 2 or later.
 # See the COPYING file in the top-level directory.
diff --git a/qapi/rdma.json b/qapi/rdma.json
index b58105b1b6..a1d2175a8b 100644
--- a/qapi/rdma.json
+++ b/qapi/rdma.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 
 ##
diff --git a/qapi/rocker.json b/qapi/rocker.json
index 52597db491..b48e49a89b 100644
--- a/qapi/rocker.json
+++ b/qapi/rocker.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 
 ##
 # = Rocker switch device
diff --git a/qapi/run-state.json b/qapi/run-state.json
index 2e22907740..7cc9f96a5b 100644
--- a/qapi/run-state.json
+++ b/qapi/run-state.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 
 ##
diff --git a/qapi/sockets.json b/qapi/sockets.json
index ea933ed4b2..2d558ebee7 100644
--- a/qapi/sockets.json
+++ b/qapi/sockets.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 
 ##
 # = Socket data types
diff --git a/qapi/tpm.json b/qapi/tpm.json
index dc1f081739..6a10c9ed8d 100644
--- a/qapi/tpm.json
+++ b/qapi/tpm.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 
 ##
diff --git a/qapi/transaction.json b/qapi/transaction.json
index b6c11158f0..15ddebdbc3 100644
--- a/qapi/transaction.json
+++ b/qapi/transaction.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 
 ##
diff --git a/qapi/ui.json b/qapi/ui.json
index e16e98a060..dacceeaf63 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 #
 
 ##
diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index 4be9aad48e..359ab52ad1 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -1,4 +1,5 @@
 # *-*- Mode: Python -*-*
+# vim: filetype=python
 
 ##
 #
diff --git a/storage-daemon/qapi/qapi-schema.json b/storage-daemon/qapi/qapi-schema.json
index 14f4f8fe61..6100d1f0c9 100644
--- a/storage-daemon/qapi/qapi-schema.json
+++ b/storage-daemon/qapi/qapi-schema.json
@@ -1,4 +1,5 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
 
 # Note that modules are shared with the QEMU main schema under the assumption
 # that the storage daemon schema is a subset of the main schema. For the shared
diff --git a/tests/qapi-schema/doc-good.json b/tests/qapi-schema/doc-good.json
index ddd89d1233..9da72a1f55 100644
--- a/tests/qapi-schema/doc-good.json
+++ b/tests/qapi-schema/doc-good.json
@@ -1,4 +1,6 @@
 # -*- Mode: Python -*-
+# vim: filetype=python
+#
 # Positive QAPI doc comment tests
 
 { 'pragma': { 'doc-required': true } }
diff --git a/tests/qapi-schema/include/sub-module.json b/tests/qapi-schema/include/sub-module.json
index afdb267228..b9f7b9bb56 100644
--- a/tests/qapi-schema/include/sub-module.json
+++ b/tests/qapi-schema/include/sub-module.json
@@ -1,4 +1,5 @@
 # *-*- Mode: Python -*-*
+# vim: filetype=python
 
 # Sub-module of ../qapi-schema-test.json
 
diff --git a/tests/qapi-schema/qapi-schema-test.json b/tests/qapi-schema/qapi-schema-test.json
index 6b1f05afa7..3a9f2cbb33 100644
--- a/tests/qapi-schema/qapi-schema-test.json
+++ b/tests/qapi-schema/qapi-schema-test.json
@@ -1,4 +1,5 @@
 # *-*- Mode: Python -*-*
+# vim: filetype=python
 
 # This file is a stress test of supported qapi constructs that must
 # parse and compile correctly.
diff --git a/tests/qapi-schema/sub-sub-module.json b/tests/qapi-schema/sub-sub-module.json
index 524ef9b83f..94f36ec0b1 100644
--- a/tests/qapi-schema/sub-sub-module.json
+++ b/tests/qapi-schema/sub-sub-module.json
@@ -1,4 +1,5 @@
 # *-*- Mode: Python -*-*
+# vim: filetype=python
 
 # Sub-module of sub-module include/sub-module.json of qapi-schema-test.json
 
-- 
2.25.4

Re: [PATCH] schemas: Add vim modeline

Posted by Markus Armbruster 5 years, 6 months ago

Andrea Bolognani <abologna@redhat.com> writes:

> The various schemas included in QEMU use a JSON-based format which
> is, however, strictly speaking not valid JSON.
>
> As a consequence, when vim tries to apply syntax highlight rules
> for JSON (as guessed from the file name), the result is an unreadable
> mess which mostly consist of red markers pointing out supposed errors
> in, well, pretty much everything.
>
> Using Python syntax highlighting produces much better results, and
> in fact these files already start with specially-formatted comments
> that instruct Emacs to process them as if they were Python files.
>
> This commit adds the equivalent special comments for vim.
>
> Signed-off-by: Andrea Bolognani <abologna@redhat.com>

Naming QAPI schema files .json even though their contents isn't was a
mistake.  Correcting it would be a pain.  If we correct it, then the
sooner the better.

Renaming them to .py gives decent editor support out of the box.  Their
contents isn't quite Python, though: true vs. True, false vs. False.  Do
we care?  Only a few dozen occurences; they could be adjusted.

Renaming them to .qapi would perhaps be less confusing, for the price of
"out of the box".

Thoughts?

Re: [PATCH] schemas: Add vim modeline

Posted by Daniel P. Berrangé 5 years, 6 months ago

On Thu, Jul 30, 2020 at 11:07:26AM +0200, Markus Armbruster wrote:
> Andrea Bolognani <abologna@redhat.com> writes:
> 
> > The various schemas included in QEMU use a JSON-based format which
> > is, however, strictly speaking not valid JSON.
> >
> > As a consequence, when vim tries to apply syntax highlight rules
> > for JSON (as guessed from the file name), the result is an unreadable
> > mess which mostly consist of red markers pointing out supposed errors
> > in, well, pretty much everything.
> >
> > Using Python syntax highlighting produces much better results, and
> > in fact these files already start with specially-formatted comments
> > that instruct Emacs to process them as if they were Python files.
> >
> > This commit adds the equivalent special comments for vim.
> >
> > Signed-off-by: Andrea Bolognani <abologna@redhat.com>

Given that we already have emacs mode-lines, I see no reason to
not also have vim mode-lines, so regardless of the deeper discussion
I think this is patch is fine to merge in the short term

  Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

> Naming QAPI schema files .json even though their contents isn't was a
> mistake.  Correcting it would be a pain.  If we correct it, then the
> sooner the better.
> 
> Renaming them to .py gives decent editor support out of the box.  Their
> contents isn't quite Python, though: true vs. True, false vs. False.  Do
> we care?  Only a few dozen occurences; they could be adjusted.
> 
> Renaming them to .qapi would perhaps be less confusing, for the price of
> "out of the box".

IMHO, the critical rule is that if you a pick a particular file extension
associated with an existing language, you absolutely MUST BE compliant
with that language.

We fail at compliance with both JSON and Python because we're actually
using our own DSL (domain specific language).

IOW if we're going to stick with our current file format, then we should
be naming them .qapi. We can still use an editor mode line if we want to
claim we're approximately equiv to another language, but we can't be
surprised if editors get upset.

The bigger question is whether having our own DSL is justified ?

I'm *really* sceptical that it is.

We can't use JSON because it lacks comments. So we invented our own
psuedo-JSON parser that supported comments, and used ' instead of "
for some reason. We also needed to be able to parse a sequence of
multiple JSON documents in one file. We should have just picked a 
different language because JSON clearly didn't do what we eneeded.

You suggest naming them .py. If we do that, we must declare that they
are literally Python code and modify them so that we can load the 
files straight into the python intepretor as code, and not parse 
them as data. I feel unhappy about treating data as code though.

While JSON doesn't do what we need, its second-cousin YAML is a more
flexible format. Taking one example

---
##
# @ImageInfoSpecificQCow2:
#
# @compat: compatibility level
#
# ...snip...
#
# Since: 1.7
##
struct: ImageInfoSpecificQCow2
data:
  compat: str
  "*data-file": str
  "*data-file-raw": bool
  "*lazy-refcounts": bool
  "*corrupt": bool
  refcount-bits: int
  "*encrypt": ImageInfoSpecificQCow2Encryption
  "*bitmaps":
    - Qcow2BitmapInfo
  compression-type: Qcow2CompressionType

Then we could use a regular off the shelf YAML parser in python.

The uglyiness with quotes is due to the use of "*". Slightly less ugly
if we simply declare that quotes are always used, even where they're
not strictly required.

struct: ImageInfoSpecificQCow2
data:
  "compat": "str"
  "*data-file": "str"
  "*data-file-raw": "bool"
  "*lazy-refcounts": "bool"
  "*corrupt": "bool"
  "refcount-bits": "int"
  "*encrypt": "ImageInfoSpecificQCow2Encryption"
  "*bitmaps":
    - "Qcow2BitmapInfo"
  "compression-type": "Qcow2CompressionType"

With the use of "---" to denote the start of document, we have no trouble 
parsing our files which would actually be a concatenation of multiple 
documents. The python YAML library provides the easy yaml.load_all()
method.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH] schemas: Add vim modeline

Posted by Markus Armbruster 5 years, 6 months ago

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Thu, Jul 30, 2020 at 11:07:26AM +0200, Markus Armbruster wrote:
>> Andrea Bolognani <abologna@redhat.com> writes:
>> 
>> > The various schemas included in QEMU use a JSON-based format which
>> > is, however, strictly speaking not valid JSON.
>> >
>> > As a consequence, when vim tries to apply syntax highlight rules
>> > for JSON (as guessed from the file name), the result is an unreadable
>> > mess which mostly consist of red markers pointing out supposed errors
>> > in, well, pretty much everything.
>> >
>> > Using Python syntax highlighting produces much better results, and
>> > in fact these files already start with specially-formatted comments
>> > that instruct Emacs to process them as if they were Python files.
>> >
>> > This commit adds the equivalent special comments for vim.
>> >
>> > Signed-off-by: Andrea Bolognani <abologna@redhat.com>
>
> Given that we already have emacs mode-lines, I see no reason to
> not also have vim mode-lines, so regardless of the deeper discussion
> I think this is patch is fine to merge in the short term
>
>   Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
>
>
>> Naming QAPI schema files .json even though their contents isn't was a
>> mistake.  Correcting it would be a pain.  If we correct it, then the
>> sooner the better.
>> 
>> Renaming them to .py gives decent editor support out of the box.  Their
>> contents isn't quite Python, though: true vs. True, false vs. False.  Do
>> we care?  Only a few dozen occurences; they could be adjusted.
>> 
>> Renaming them to .qapi would perhaps be less confusing, for the price of
>> "out of the box".
>
> IMHO, the critical rule is that if you a pick a particular file extension
> associated with an existing language, you absolutely MUST BE compliant
> with that language.

Can't argue with that :)

> We fail at compliance with both JSON and Python because we're actually
> using our own DSL (domain specific language).
>
> IOW if we're going to stick with our current file format, then we should
> be naming them .qapi. We can still use an editor mode line if we want to
> claim we're approximately equiv to another language, but we can't be
> surprised if editors get upset.
>
>
> The bigger question is whether having our own DSL is justified ?
>
> I'm *really* sceptical that it is.

To be precise: our own DSL *syntax*.  Semantics is a separate question
to be skeptical about.

> We can't use JSON because it lacks comments. So we invented our own
> psuedo-JSON parser that supported comments, and used ' instead of "
> for some reason. We also needed to be able to parse a sequence of
> multiple JSON documents in one file. We should have just picked a 
> different language because JSON clearly didn't do what we eneeded.

JSON is a exceptionally poor choice for a DSL, or even a configuration
language.

Correcting our mistake involves a flag day and a re-learn.  We need to
weigh costs against benefits.

The QAPI schema language has two layers:

* JSON, with a lexical and a syntactical sub-layer (both in parser.py)

* QAPI, with a context-free and a context-dependend sub-layer (in
  expr.py and schema.py, respectively)

Replacing the JSON layer is possible as long as the replacement is
sufficiently expressive (not a tall order).

> You suggest naming them .py. If we do that, we must declare that they
> are literally Python code

Agree.

>                               modify them so that we can load the 
> files straight into the python intepretor as code, and not parse 
> them as data. I feel unhappy about treating data as code though.

Stress on *can* load.  Doesn't mean we should.

Ancient prior art: Lisp programs routinely use s-expressions as
configuration file syntax.  They don't load them as code, they read them
as data.

With Python, it's ast.parse(), I think.

> While JSON doesn't do what we need, its second-cousin YAML is a more
> flexible format. Taking one example
>
> ---
> ##
> # @ImageInfoSpecificQCow2:
> #
> # @compat: compatibility level
> #
> # ...snip...
> #
> # Since: 1.7
> ##
> struct: ImageInfoSpecificQCow2
> data:
>   compat: str
>   "*data-file": str
>   "*data-file-raw": bool
>   "*lazy-refcounts": bool
>   "*corrupt": bool
>   refcount-bits: int
>   "*encrypt": ImageInfoSpecificQCow2Encryption
>   "*bitmaps":
>     - Qcow2BitmapInfo
>   compression-type: Qcow2CompressionType
>
>
> Then we could use a regular off the shelf YAML parser in python.
>
> The uglyiness with quotes is due to the use of "*". Slightly less ugly
> if we simply declare that quotes are always used, even where they're
> not strictly required.

StrictYAML insists on quotes.

I hate having to quote identifiers.  There's a reason we don't write

    'int'
    'main'('int', 'argc', 'char' *'argv'[])
    {
        'printf'("hello world\n");
        return 0;
    }

> struct: ImageInfoSpecificQCow2
> data:
>   "compat": "str"
>   "*data-file": "str"
>   "*data-file-raw": "bool"
>   "*lazy-refcounts": "bool"
>   "*corrupt": "bool"
>   "refcount-bits": "int"
>   "*encrypt": "ImageInfoSpecificQCow2Encryption"
>   "*bitmaps":
>     - "Qcow2BitmapInfo"
>   "compression-type": "Qcow2CompressionType"
>
> With the use of "---" to denote the start of document, we have no trouble 
> parsing our files which would actually be a concatenation of multiple 
> documents. The python YAML library provides the easy yaml.load_all()
> method.

Required reading on YAML:
https://www.arp242.net/yaml-config.html

Some of the criticism there doesn't matter for our use case.

Re: [PATCH] schemas: Add vim modeline

Posted by Eric Blake 5 years, 6 months ago

On 7/30/20 6:51 AM, Markus Armbruster wrote:

>> Given that we already have emacs mode-lines, I see no reason to
>> not also have vim mode-lines, so regardless of the deeper discussion
>> I think this is patch is fine to merge in the short term
>>
>>    Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

Agreed on that front.

>>
>>
>>> Naming QAPI schema files .json even though their contents isn't was a
>>> mistake.  Correcting it would be a pain.  If we correct it, then the
>>> sooner the better.
>>>
>>> Renaming them to .py gives decent editor support out of the box.  Their
>>> contents isn't quite Python, though: true vs. True, false vs. False.  Do
>>> we care?  Only a few dozen occurences; they could be adjusted.
>>>
>>> Renaming them to .qapi would perhaps be less confusing, for the price of
>>> "out of the box".

I've tried that patch in the past, but it went nowhere at the time.
https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg03173.html

Rebasing it to the present would be a complete rewrite, but I'm still 
willing to do it if we think it will go somewhere this time.


>>
>> IMHO, the critical rule is that if you a pick a particular file extension
>> associated with an existing language, you absolutely MUST BE compliant
>> with that language.
> 
> Can't argue with that :)

Also in violent agreement.

> 
>> We fail at compliance with both JSON and Python because we're actually
>> using our own DSL (domain specific language).
>>
>> IOW if we're going to stick with our current file format, then we should
>> be naming them .qapi. We can still use an editor mode line if we want to
>> claim we're approximately equiv to another language, but we can't be
>> surprised if editors get upset.
>>
>>
>> The bigger question is whether having our own DSL is justified ?
>>
>> I'm *really* sceptical that it is.
> 
> To be precise: our own DSL *syntax*.  Semantics is a separate question
> to be skeptical about.
> 
>> We can't use JSON because it lacks comments. So we invented our own
>> psuedo-JSON parser that supported comments, and used ' instead of "
>> for some reason. We also needed to be able to parse a sequence of
>> multiple JSON documents in one file. We should have just picked a
>> different language because JSON clearly didn't do what we eneeded.
> 
> JSON is a exceptionally poor choice for a DSL, or even a configuration
> language.
> 
> Correcting our mistake involves a flag day and a re-learn.  We need to
> weigh costs against benefits.

How much of that can be automated with tooling?  Yes, a re-learn is 
needed, but if a tool can convert between the two syntaces with minimal 
pain, the re-learn (in both directions: rebasing patches written 
pre-change for merge after the change lands upstream, and backporting 
patches post-change to trees that are pre-change) is not as painful.

> 
> The QAPI schema language has two layers:
> 
> * JSON, with a lexical and a syntactical sub-layer (both in parser.py)
> 
> * QAPI, with a context-free and a context-dependend sub-layer (in
>    expr.py and schema.py, respectively)
> 
> Replacing the JSON layer is possible as long as the replacement is
> sufficiently expressive (not a tall order).

I'm open to the idea, if we want to attempt it, and agree with the 
assessment that it is not a tall order.  In fact, if we were to go with 
the JSON5 language instead of JSON, we are already that much closer to 
having sufficiently expressive (although JSON5 does not seem to be as 
popular in terms of pre-written libraries).

> 
>> You suggest naming them .py. If we do that, we must declare that they
>> are literally Python code
> 
> Agree.

I'm not necessarily a fan of .py for QAPI files; it makes me think of 
executable python code, even if we chose it merely for something that 
python could parse natively instead of rolling our own parser.

> 
>>                                modify them so that we can load the
>> files straight into the python intepretor as code, and not parse
>> them as data. I feel unhappy about treating data as code though.
> 
> Stress on *can* load.  Doesn't mean we should.
> 
> Ancient prior art: Lisp programs routinely use s-expressions as
> configuration file syntax.  They don't load them as code, they read them
> as data.
> 
> With Python, it's ast.parse(), I think.
> 
>> While JSON doesn't do what we need, its second-cousin YAML is a more
>> flexible format. Taking one example
>>

> StrictYAML insists on quotes.
> 
> I hate having to quote identifiers.  There's a reason we don't write
> 
>      'int'
>      'main'('int', 'argc', 'char' *'argv'[])
>      {
>          'printf'("hello world\n");
>          return 0;
>      }
> 

JSON5 would also let us get rid of some quotes, if that is considered a 
desirable goal of the representation (although I'm not sure that quote 
avoidance should be driving our decision, so much as automated conversion).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH] schemas: Add vim modeline

Posted by John Snow 5 years, 6 months ago

On 7/30/20 11:11 AM, Eric Blake wrote:
> 
> Agreed on that front.

Thirded:

Reviewed-by: John Snow <jsnow@redhat.com>

Re: [PATCH] schemas: Add vim modeline

Posted by John Snow 5 years, 6 months ago

On 7/30/20 11:11 AM, Eric Blake wrote:
> JSON5 would also let us get rid of some quotes, if that is considered a 
> desirable goal of the representation (although I'm not sure that quote 
> avoidance should be driving our decision, so much as automated conversion).

There's no JSON5 parser built in to Python.

OK, there's no built-in YAML parser either, but I am not keen on 
fighting the built-in json module.

Re: [PATCH] schemas: Add vim modeline

Posted by Kevin Wolf 5 years, 6 months ago

Am 30.07.2020 um 17:11 hat Eric Blake geschrieben:
> > The QAPI schema language has two layers:
> > 
> > * JSON, with a lexical and a syntactical sub-layer (both in parser.py)
> > 
> > * QAPI, with a context-free and a context-dependend sub-layer (in
> >    expr.py and schema.py, respectively)
> > 
> > Replacing the JSON layer is possible as long as the replacement is
> > sufficiently expressive (not a tall order).
> 
> I'm open to the idea, if we want to attempt it, and agree with the
> assessment that it is not a tall order.

I'm not so sure about that. I mean, it certainly sounds doable if need
be, but getting better syntax highlighting by default in some editors
feels like a pretty weak reason to switch out the complete schema
language.

At first I was going to say "but if you don't have anything else to do
with your time...", but it's actually not only your time, but the time
of everyone who has development branches or downstream repositories and
will suffer rather nasty merge conflicts. So this will likely end up
having a non-negligible cost.

So is there more to it or are we really considering doing this just
because editors can tell more easily what to do with a different syntax?

Kevin

Re: [PATCH] schemas: Add vim modeline

Posted by Daniel P. Berrangé 5 years, 6 months ago

On Fri, Jul 31, 2020 at 09:15:13AM +0200, Kevin Wolf wrote:
> Am 30.07.2020 um 17:11 hat Eric Blake geschrieben:
> > > The QAPI schema language has two layers:
> > > 
> > > * JSON, with a lexical and a syntactical sub-layer (both in parser.py)
> > > 
> > > * QAPI, with a context-free and a context-dependend sub-layer (in
> > >    expr.py and schema.py, respectively)
> > > 
> > > Replacing the JSON layer is possible as long as the replacement is
> > > sufficiently expressive (not a tall order).
> > 
> > I'm open to the idea, if we want to attempt it, and agree with the
> > assessment that it is not a tall order.
> 
> I'm not so sure about that. I mean, it certainly sounds doable if need
> be, but getting better syntax highlighting by default in some editors
> feels like a pretty weak reason to switch out the complete schema
> language.

The main compelling reason to me is that we get to throw away our
custom written parsing code and use a standard python library.
Actually following a standardized document format is a benefit to
contributors, over something that looks like a standard format but
is actually slightly different as it is our own parser.

Automatic correct syntax highlighting is a happy side effect.
More interesting is if the editor automatically does the right
thing with indentation/formatting when writing schemas.

I wont miss the "not allowed a trailing comma after the last field"
part of JSON as I always get it wrong.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH] schemas: Add vim modeline

Posted by Markus Armbruster 5 years, 6 months ago

Kevin Wolf <kwolf@redhat.com> writes:

> Am 30.07.2020 um 17:11 hat Eric Blake geschrieben:
>> > JSON is a exceptionally poor choice for a DSL, or even a configuration
>> > language.
>> > 
>> > Correcting our mistake involves a flag day and a re-learn.  We need to
>> > weigh costs against benefits.
>> > 
>> > The QAPI schema language has two layers:
>> > 
>> > * JSON, with a lexical and a syntactical sub-layer (both in parser.py)

An incompatible dialect of JSON with a doc comment language, actually.

The need to keep doc generation working could complicate replacing the
lower layer.

>> > 
>> > * QAPI, with a context-free and a context-dependend sub-layer (in
>> >    expr.py and schema.py, respectively)
>> > 
>> > Replacing the JSON layer is possible as long as the replacement is
>> > sufficiently expressive (not a tall order).
>> 
>> I'm open to the idea, if we want to attempt it, and agree with the
>> assessment that it is not a tall order.

Careful, "not a tall order" is meant to apply to the "sufficiently
expressive" requirement for a replacemnt syntax.

On actually replacing the lower layer, I wrote "we need to weigh costs
against benefits."

> I'm not so sure about that. I mean, it certainly sounds doable if need
> be, but getting better syntax highlighting by default in some editors
> feels like a pretty weak reason to switch out the complete schema
> language.
>
> At first I was going to say "but if you don't have anything else to do
> with your time...", but it's actually not only your time, but the time
> of everyone who has development branches or downstream repositories and
> will suffer rather nasty merge conflicts. So this will likely end up
> having a non-negligible cost.

Yup.

> So is there more to it or are we really considering doing this just
> because editors can tell more easily what to do with a different syntax?

If memory serves, the following arguments have been raised:

1. A chance to improve ergonomics for developers

   Pain points include

   - Confusion

     It claims to be JSON, but it's not.

   - Need to learn another syntax

     Sunk cost for old hands, but it's a valid point all the same.

   - Poor tool support

     JSON tools don't work.  Python tools do, but you may have to work
     around the issue of true, false.

   - Excessive quoting

   - Verbosity

     When all you have is KEY: VALUE, defining things with multiple
     properties becomes verbose like

         'status': { 'type': 'DirtyBitmapStatus',
                     'features': [ 'deprecated' ] }

     We need syntactic sugar to keep vebosity in check for the most
     common cases.  More complexity.

   - No trailing comma in arrays and objects

   - No way to split long strings for legibility

   - The doc comment language is poorly specified

   - Parse error reporting could be better (JSON part) / could hardly be
     worse (doc comment part)

2. Not having to maintain our own code for the lower layer

   I consider this argument quite weak.  parser.py has some 400 SLOC.
   Writing and rewriting it is sunk cost.  Keeping it working has been
   cheap.  Keeping the glue for some off-the-shelf parser working isn't
   free, either.  No big savings to be had here, sorry.

   Almost half of parser.c is about doc comments, and it's the hairier
   part by far.  Peter has patches to drag the doc comment language
   closer to rST.  I don't remember whether they shrink parser.py.

3. Make the schema more easily consumable by other programs

   Use of a "standard" syntax instead of our funky dialect of JSON means
   other programs can use an off-the-shelf parser instead of using or
   reimplementing parser.py.

   Valid point for programs that parse the lower layer, and no more, say
   for basic syntax highlighting.

   Pretty much irrelevant for programs that need to go beyond the lower
   layer.  Parsing the lower layer is the easy part.  The code dealing
   with the upper layer is much larger (expr.py and schema.py), and it
   actually changes as we add features to the schema language.
   Duplicating it would be a Bad Idea.  Reuse the existing frontend
   instead.

Re: [PATCH] schemas: Add vim modeline

Posted by Kevin Wolf 5 years, 6 months ago

Am 31.07.2020 um 11:01 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 30.07.2020 um 17:11 hat Eric Blake geschrieben:
> >> > JSON is a exceptionally poor choice for a DSL, or even a configuration
> >> > language.
> >> > 
> >> > Correcting our mistake involves a flag day and a re-learn.  We need to
> >> > weigh costs against benefits.
> >> > 
> >> > The QAPI schema language has two layers:
> >> > 
> >> > * JSON, with a lexical and a syntactical sub-layer (both in parser.py)
> 
> An incompatible dialect of JSON with a doc comment language, actually.
> 
> The need to keep doc generation working could complicate replacing the
> lower layer.

Good point, we would have to keep the comment parser either way to be
used on top of the YAML (or whatever) parser.

Whatever parser we use would have to actually make comments available
rather than immediately filtering them out. This might exist for most
languages, but it will probably not be the most commonly used parser
either (or at least it will not allow using a simple interface like
json.loads() in Python).

> >> > 
> >> > * QAPI, with a context-free and a context-dependend sub-layer (in
> >> >    expr.py and schema.py, respectively)
> >> > 
> >> > Replacing the JSON layer is possible as long as the replacement is
> >> > sufficiently expressive (not a tall order).
> >> 
> >> I'm open to the idea, if we want to attempt it, and agree with the
> >> assessment that it is not a tall order.
> 
> Careful, "not a tall order" is meant to apply to the "sufficiently
> expressive" requirement for a replacemnt syntax.
> 
> On actually replacing the lower layer, I wrote "we need to weigh costs
> against benefits."
> 
> > I'm not so sure about that. I mean, it certainly sounds doable if need
> > be, but getting better syntax highlighting by default in some editors
> > feels like a pretty weak reason to switch out the complete schema
> > language.
> >
> > At first I was going to say "but if you don't have anything else to do
> > with your time...", but it's actually not only your time, but the time
> > of everyone who has development branches or downstream repositories and
> > will suffer rather nasty merge conflicts. So this will likely end up
> > having a non-negligible cost.
> 
> Yup.
> 
> > So is there more to it or are we really considering doing this just
> > because editors can tell more easily what to do with a different syntax?
> 
> If memory serves, the following arguments have been raised:
> 
> 1. A chance to improve ergonomics for developers
> 
>    Pain points include
> 
>    - Confusion
> 
>      It claims to be JSON, but it's not.
> 
>    - Need to learn another syntax
> 
>      Sunk cost for old hands, but it's a valid point all the same.

We use a similar (same?) form of "almost JSON" for QMP which will still
exist. So we'd be moving from having to learn one (non-standard)
language to two languages (one still non-standard and another one that
is hopefully more standard).

>    - Poor tool support
> 
>      JSON tools don't work.  Python tools do, but you may have to work
>      around the issue of true, false.

This is mostly the editor question this patch was about, right? Or are
people trying to use more sophisticated tools on it?

>    - Excessive quoting
> 
>    - Verbosity
> 
>      When all you have is KEY: VALUE, defining things with multiple
>      properties becomes verbose like
> 
>          'status': { 'type': 'DirtyBitmapStatus',
>                      'features': [ 'deprecated' ] }
> 
>      We need syntactic sugar to keep vebosity in check for the most
>      common cases.  More complexity.

I don't think this is something any of the suggested languages would
address.

>    - No trailing comma in arrays and objects
> 
>    - No way to split long strings for legibility
> 
>    - The doc comment language is poorly specified
> 
>    - Parse error reporting could be better (JSON part) / could hardly be
>      worse (doc comment part)

Has anyone looked into what error messages are like for the suggested
alternatives? "error reporting could be better" is something that is
true for a lot of software.

The doc comment part is not going to change unless we get rid of
comments and actually make documentation part of the objects themselves.
This might not be very readable.

Or I should rather say, making the doc comment part change is possible,
but would require the same changes as with our current lower layer
language and parser.

> 2. Not having to maintain our own code for the lower layer
> 
>    I consider this argument quite weak.  parser.py has some 400 SLOC.
>    Writing and rewriting it is sunk cost.  Keeping it working has been
>    cheap.  Keeping the glue for some off-the-shelf parser working isn't
>    free, either.  No big savings to be had here, sorry.
> 
>    Almost half of parser.c is about doc comments, and it's the hairier
>    part by far.  Peter has patches to drag the doc comment language
>    closer to rST.  I don't remember whether they shrink parser.py.
> 
> 3. Make the schema more easily consumable by other programs
> 
>    Use of a "standard" syntax instead of our funky dialect of JSON means
>    other programs can use an off-the-shelf parser instead of using or
>    reimplementing parser.py.
> 
>    Valid point for programs that parse the lower layer, and no more, say
>    for basic syntax highlighting.
> 
>    Pretty much irrelevant for programs that need to go beyond the lower
>    layer.  Parsing the lower layer is the easy part.  The code dealing
>    with the upper layer is much larger (expr.py and schema.py), and it
>    actually changes as we add features to the schema language.
>    Duplicating it would be a Bad Idea.  Reuse the existing frontend
>    instead.

Do other programs that go beyond syntax highlighting actually get to
parse our QAPI schema definitions? Or would they rather deal with the
return value of query-qmp-schema?

Neither the QAPI schema nor a YAML file with the same structure are a
standard approach to describe JSON documents. So even if we replace JSON
in the lower layer, the whole thing (and as you say, the upper layer is
the more interesting part) still stays non-standard in a way and more
advanced tools can't be used with it.

And of course, even if we did use something more standard like JSON
Schema or whatever exists for YAML, we would still have to massively
extend it because the QAPI schema contains much more information than
just what would be needed to validate some input. We control all aspects
of generated C code with it.

Kevin

Re: [PATCH] schemas: Add vim modeline

Posted by Markus Armbruster 5 years, 6 months ago

Kevin Wolf <kwolf@redhat.com> writes:

> Am 31.07.2020 um 11:01 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > Am 30.07.2020 um 17:11 hat Eric Blake geschrieben:
>> >> > JSON is a exceptionally poor choice for a DSL, or even a configuration
>> >> > language.
>> >> > 
>> >> > Correcting our mistake involves a flag day and a re-learn.  We need to
>> >> > weigh costs against benefits.
>> >> > 
>> >> > The QAPI schema language has two layers:
>> >> > 
>> >> > * JSON, with a lexical and a syntactical sub-layer (both in parser.py)
>> 
>> An incompatible dialect of JSON with a doc comment language, actually.
>> 
>> The need to keep doc generation working could complicate replacing the
>> lower layer.
>
> Good point, we would have to keep the comment parser either way to be
> used on top of the YAML (or whatever) parser.
>
> Whatever parser we use would have to actually make comments available
> rather than immediately filtering them out. This might exist for most
> languages, but it will probably not be the most commonly used parser
> either (or at least it will not allow using a simple interface like
> json.loads() in Python).
>
>> >> > 
>> >> > * QAPI, with a context-free and a context-dependend sub-layer (in
>> >> >    expr.py and schema.py, respectively)
>> >> > 
>> >> > Replacing the JSON layer is possible as long as the replacement is
>> >> > sufficiently expressive (not a tall order).
>> >> 
>> >> I'm open to the idea, if we want to attempt it, and agree with the
>> >> assessment that it is not a tall order.
>> 
>> Careful, "not a tall order" is meant to apply to the "sufficiently
>> expressive" requirement for a replacemnt syntax.
>> 
>> On actually replacing the lower layer, I wrote "we need to weigh costs
>> against benefits."
>> 
>> > I'm not so sure about that. I mean, it certainly sounds doable if need
>> > be, but getting better syntax highlighting by default in some editors
>> > feels like a pretty weak reason to switch out the complete schema
>> > language.
>> >
>> > At first I was going to say "but if you don't have anything else to do
>> > with your time...", but it's actually not only your time, but the time
>> > of everyone who has development branches or downstream repositories and
>> > will suffer rather nasty merge conflicts. So this will likely end up
>> > having a non-negligible cost.
>> 
>> Yup.
>> 
>> > So is there more to it or are we really considering doing this just
>> > because editors can tell more easily what to do with a different syntax?
>> 
>> If memory serves, the following arguments have been raised:
>> 
>> 1. A chance to improve ergonomics for developers
>> 
>>    Pain points include
>> 
>>    - Confusion
>> 
>>      It claims to be JSON, but it's not.
>> 
>>    - Need to learn another syntax
>> 
>>      Sunk cost for old hands, but it's a valid point all the same.
>
> We use a similar (same?) form of "almost JSON" for QMP which will still
> exist. So we'd be moving from having to learn one (non-standard)
> language to two languages (one still non-standard and another one that
> is hopefully more standard).

QMP is JSON (no almost).  It accepts single-quoted strings as an
extension (but does not produce them).  This is permitted by the RFC.
We can get rid of the extension if it irks us.

There's also the QMP-generating language in the template string of
qobject_from_jsonf_nofail() & friends.  Helps keeping C code readable.
This template language is definitely not JSON (not even almost).

>>    - Poor tool support
>> 
>>      JSON tools don't work.  Python tools do, but you may have to work
>>      around the issue of true, false.
>
> This is mostly the editor question this patch was about, right? Or are
> people trying to use more sophisticated tools on it?

I occasionally paste schema bits into Python (working around the
true/false issue), for quick ad hoc hackery, where hooking into the real
frontend would be overkill.

Anything more advanced than that would be a bad idea, in my opinion.
Use the real frontend.

>>    - Excessive quoting
>> 
>>    - Verbosity
>> 
>>      When all you have is KEY: VALUE, defining things with multiple
>>      properties becomes verbose like
>> 
>>          'status': { 'type': 'DirtyBitmapStatus',
>>                      'features': [ 'deprecated' ] }
>> 
>>      We need syntactic sugar to keep vebosity in check for the most
>>      common cases.  More complexity.
>
> I don't think this is something any of the suggested languages would
> address.

Correct.

>>    - No trailing comma in arrays and objects
>> 
>>    - No way to split long strings for legibility
>> 
>>    - The doc comment language is poorly specified
>> 
>>    - Parse error reporting could be better (JSON part) / could hardly be
>>      worse (doc comment part)
>
> Has anyone looked into what error messages are like for the suggested
> alternatives? "error reporting could be better" is something that is
> true for a lot of software.
>
> The doc comment part is not going to change unless we get rid of
> comments and actually make documentation part of the objects themselves.
> This might not be very readable.

With decent string syntax, the doc comment blocks could be turned into
strings.  But then we'd parse the strings instead, so no real change.

> Or I should rather say, making the doc comment part change is possible,
> but would require the same changes as with our current lower layer
> language and parser.
>
>> 2. Not having to maintain our own code for the lower layer
>> 
>>    I consider this argument quite weak.  parser.py has some 400 SLOC.
>>    Writing and rewriting it is sunk cost.  Keeping it working has been
>>    cheap.  Keeping the glue for some off-the-shelf parser working isn't
>>    free, either.  No big savings to be had here, sorry.
>> 
>>    Almost half of parser.c is about doc comments, and it's the hairier
>>    part by far.  Peter has patches to drag the doc comment language
>>    closer to rST.  I don't remember whether they shrink parser.py.
>> 
>> 3. Make the schema more easily consumable by other programs
>> 
>>    Use of a "standard" syntax instead of our funky dialect of JSON means
>>    other programs can use an off-the-shelf parser instead of using or
>>    reimplementing parser.py.
>> 
>>    Valid point for programs that parse the lower layer, and no more, say
>>    for basic syntax highlighting.
>> 
>>    Pretty much irrelevant for programs that need to go beyond the lower
>>    layer.  Parsing the lower layer is the easy part.  The code dealing
>>    with the upper layer is much larger (expr.py and schema.py), and it
>>    actually changes as we add features to the schema language.
>>    Duplicating it would be a Bad Idea.  Reuse the existing frontend
>>    instead.
>
> Do other programs that go beyond syntax highlighting actually get to
> parse our QAPI schema definitions? Or would they rather deal with the
> return value of query-qmp-schema?

query-qmp-schema is for introspecting QMP.  It tells you what *this*
build of QEMU's QMP can do.  The schema tells you what QMP can do in
*any* build of this version of QEMU, and more.

To introspect QMP, process output of query-qmp-schema.

To work with the QAPI schema, interface with the frontend from
scripts/qapi/.

> Neither the QAPI schema nor a YAML file with the same structure are a
> standard approach to describe JSON documents. So even if we replace JSON
> in the lower layer, the whole thing (and as you say, the upper layer is
> the more interesting part) still stays non-standard in a way and more
> advanced tools can't be used with it.
>
> And of course, even if we did use something more standard like JSON
> Schema or whatever exists for YAML, we would still have to massively
> extend it because the QAPI schema contains much more information than
> just what would be needed to validate some input. We control all aspects
> of generated C code with it.

Yup.  "IDL for QMP" is just one aspect of QAPI.

Re: [PATCH] schemas: Add vim modeline

Posted by Daniel P. Berrangé 5 years, 6 months ago

On Thu, Jul 30, 2020 at 01:51:10PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> >                               modify them so that we can load the 
> > files straight into the python intepretor as code, and not parse 
> > them as data. I feel unhappy about treating data as code though.
> 
> Stress on *can* load.  Doesn't mean we should.
> 
> Ancient prior art: Lisp programs routinely use s-expressions as
> configuration file syntax.  They don't load them as code, they read them
> as data.
> 
> With Python, it's ast.parse(), I think.

Yes, that could work


> > struct: ImageInfoSpecificQCow2
> > data:
> >   compat: str
> >   "*data-file": str
> >   "*data-file-raw": bool
> >   "*lazy-refcounts": bool
> >   "*corrupt": bool
> >   refcount-bits: int
> >   "*encrypt": ImageInfoSpecificQCow2Encryption
> >   "*bitmaps":
> >     - Qcow2BitmapInfo
> >   compression-type: Qcow2CompressionType
> >
> >
> > Then we could use a regular off the shelf YAML parser in python.
> >
> > The uglyiness with quotes is due to the use of "*". Slightly less ugly
> > if we simply declare that quotes are always used, even where they're
> > not strictly required.
> 
> StrictYAML insists on quotes.

I wouldn't suggest StrictYAML, just normal YAML is what pretty much
everyone uses.

If we came up with a different way to mark a field as optional
instead of using the magic "*" then we wouldn't need to quote
anything

> I hate having to quote identifiers.  There's a reason we don't write
> 
>     'int'
>     'main'('int', 'argc', 'char' *'argv'[])
>     {
>         'printf'("hello world\n");
>         return 0;
>     }
> 
> > struct: ImageInfoSpecificQCow2
> > data:
> >   "compat": "str"
> >   "*data-file": "str"
> >   "*data-file-raw": "bool"
> >   "*lazy-refcounts": "bool"
> >   "*corrupt": "bool"
> >   "refcount-bits": "int"
> >   "*encrypt": "ImageInfoSpecificQCow2Encryption"
> >   "*bitmaps":
> >     - "Qcow2BitmapInfo"
> >   "compression-type": "Qcow2CompressionType"
> >
> > With the use of "---" to denote the start of document, we have no trouble 
> > parsing our files which would actually be a concatenation of multiple 
> > documents. The python YAML library provides the easy yaml.load_all()
> > method.
> 
> Required reading on YAML:
> https://www.arp242.net/yaml-config.html

I don't think this is especially helpful to our evaluation. You can write
such blog posts about pretty much any thing if you want to pick holes in a
proposal. Certainly there's plenty of awful stuff you can write about
JSON, and Python.

> Some of the criticism there doesn't matter for our use case.

Yeah, what matters is whether it can do the job we need in a way that is
better than what we have today, and whether there are any further options
to consider that might be viable alternatives.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH] schemas: Add vim modeline

Posted by John Snow 5 years, 6 months ago

On 7/30/20 9:24 AM, Daniel P. Berrangé wrote:
> On Thu, Jul 30, 2020 at 01:51:10PM +0200, Markus Armbruster wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>
>>>                                modify them so that we can load the
>>> files straight into the python intepretor as code, and not parse
>>> them as data. I feel unhappy about treating data as code though.
>>
>> Stress on *can* load.  Doesn't mean we should.
>>
>> Ancient prior art: Lisp programs routinely use s-expressions as
>> configuration file syntax.  They don't load them as code, they read them
>> as data.
>>
>> With Python, it's ast.parse(), I think.
> 
> Yes, that could work
> 

I use a similar trick for parsing "Fuzzy JSON" inside of qmp-shell.

It's cute, and I'm not really proud of it.

> 
>>> struct: ImageInfoSpecificQCow2
>>> data:
>>>    compat: str
>>>    "*data-file": str
>>>    "*data-file-raw": bool
>>>    "*lazy-refcounts": bool
>>>    "*corrupt": bool
>>>    refcount-bits: int
>>>    "*encrypt": ImageInfoSpecificQCow2Encryption
>>>    "*bitmaps":
>>>      - Qcow2BitmapInfo
>>>    compression-type: Qcow2CompressionType
>>>
>>>
>>> Then we could use a regular off the shelf YAML parser in python.
>>>

I have a prototype where I started this, but I use "---" as a document 
separator to allow us multiple definitions per file so that the nesting 
remains pleasant.

(YAML does not allow you to duplicate field names.)

>>> The uglyiness with quotes is due to the use of "*". Slightly less ugly
>>> if we simply declare that quotes are always used, even where they're
>>> not strictly required.
>>
>> StrictYAML insists on quotes.
> 
> I wouldn't suggest StrictYAML, just normal YAML is what pretty much
> everyone uses.
>  > If we came up with a different way to mark a field as optional
> instead of using the magic "*" then we wouldn't need to quote
> anything
> 

I have a YAML prototype branch where I use `?field` to indicate optional 
syntax. It works just fine, at the expense of being slightly new to people.

I tested with normal YAML, but I was thinking about adopting strict YAML 
because Markus wanted some assurance we wouldn't get lost in the weeds 
using complex feature of YAML.

(Or, shoot ourselves entirely by accident.)

My prototype doesn't use anything that Strict YAML prohibits, so I 
thought it was a good idea.

IF -- IF IF IF IF IF we decide that actually we need the crazy 
horsepower of standard YAML, or that strict YAML is too buggy -- we 
could always just replace it. No real big deal.

>> I hate having to quote identifiers.  There's a reason we don't write
>>
>>      'int'
>>      'main'('int', 'argc', 'char' *'argv'[])
>>      {
>>          'printf'("hello world\n");
>>          return 0;
>>      }
>>

Fair enough ... but there's no special meaning to quoting or not quoting 
the RHS in YAML, so maybe it's best to avoid pretending like there's a 
structural semantic between an identifier and a string there.

(Since they're both just strings, and the semantic difference is picked 
up inside the QAPI generator post-parse.)

>>> struct: ImageInfoSpecificQCow2
>>> data:
>>>    "compat": "str"
>>>    "*data-file": "str"
>>>    "*data-file-raw": "bool"
>>>    "*lazy-refcounts": "bool"
>>>    "*corrupt": "bool"
>>>    "refcount-bits": "int"
>>>    "*encrypt": "ImageInfoSpecificQCow2Encryption"
>>>    "*bitmaps":
>>>      - "Qcow2BitmapInfo"
>>>    "compression-type": "Qcow2CompressionType"
>>>
>>> With the use of "---" to denote the start of document, we have no trouble
>>> parsing our files which would actually be a concatenation of multiple
>>> documents. The python YAML library provides the easy yaml.load_all()
>>> method.
>>

Nevermind the earlier comment, then.

>> Required reading on YAML:
>> https://www.arp242.net/yaml-config.html
> 
> I don't think this is especially helpful to our evaluation. You can write
> such blog posts about pretty much any thing if you want to pick holes in a
> proposal. Certainly there's plenty of awful stuff you can write about
> JSON, and Python.
> 
>> Some of the criticism there doesn't matter for our use case.
> 
> Yeah, what matters is whether it can do the job we need in a way that is
> better than what we have today, and whether there are any further options
> to consider that might be viable alternatives.
> 
> Regards,
> Daniel
> 

I guess I'll dust off the work I have already to show the class.

--js

Re: [PATCH] schemas: Add vim modeline

Posted by Markus Armbruster 5 years, 6 months ago

John Snow <jsnow@redhat.com> writes:

> On 7/30/20 9:24 AM, Daniel P. Berrangé wrote:
>> On Thu, Jul 30, 2020 at 01:51:10PM +0200, Markus Armbruster wrote:
>>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>>
>>>>                                modify them so that we can load the
>>>> files straight into the python intepretor as code, and not parse
>>>> them as data. I feel unhappy about treating data as code though.
>>>
>>> Stress on *can* load.  Doesn't mean we should.
>>>
>>> Ancient prior art: Lisp programs routinely use s-expressions as
>>> configuration file syntax.  They don't load them as code, they read them
>>> as data.
>>>
>>> With Python, it's ast.parse(), I think.
>>
>> Yes, that could work
>>
>
> I use a similar trick for parsing "Fuzzy JSON" inside of qmp-shell.
>
> It's cute, and I'm not really proud of it.
>
>>
>>>> struct: ImageInfoSpecificQCow2
>>>> data:
>>>>    compat: str
>>>>    "*data-file": str
>>>>    "*data-file-raw": bool
>>>>    "*lazy-refcounts": bool
>>>>    "*corrupt": bool
>>>>    refcount-bits: int
>>>>    "*encrypt": ImageInfoSpecificQCow2Encryption
>>>>    "*bitmaps":
>>>>      - Qcow2BitmapInfo
>>>>    compression-type: Qcow2CompressionType
>>>>
>>>>
>>>> Then we could use a regular off the shelf YAML parser in python.
>>>>
>
> I have a prototype where I started this, but I use "---" as a document
> separator to allow us multiple definitions per file so that the
> nesting remains pleasant.
>
> (YAML does not allow you to duplicate field names.)
>
>>>> The uglyiness with quotes is due to the use of "*". Slightly less ugly
>>>> if we simply declare that quotes are always used, even where they're
>>>> not strictly required.
>>>
>>> StrictYAML insists on quotes.
>>
>> I wouldn't suggest StrictYAML, just normal YAML is what pretty much
>> everyone uses.
>>  > If we came up with a different way to mark a field as optional
>> instead of using the magic "*" then we wouldn't need to quote
>> anything
>>
>
> I have a YAML prototype branch where I use `?field` to indicate
> optional syntax. It works just fine, at the expense of being slightly
> new to people.
>
> I tested with normal YAML, but I was thinking about adopting strict
> YAML because Markus wanted some assurance we wouldn't get lost in the
> weeds using complex feature of YAML.
>
> (Or, shoot ourselves entirely by accident.)
>
> My prototype doesn't use anything that Strict YAML prohibits, so I
> thought it was a good idea.
>
> IF -- IF IF IF IF IF we decide that actually we need the crazy
> horsepower of standard YAML, or that strict YAML is too buggy -- we
> could always just replace it. No real big deal.
>
>>> I hate having to quote identifiers.  There's a reason we don't write
>>>
>>>      'int'
>>>      'main'('int', 'argc', 'char' *'argv'[])
>>>      {
>>>          'printf'("hello world\n");
>>>          return 0;
>>>      }
>>>
>
> Fair enough ... but there's no special meaning to quoting or not
> quoting the RHS in YAML, so maybe it's best to avoid pretending like
> there's a structural semantic between an identifier and a string
> there.
>
> (Since they're both just strings, and the semantic difference is
> picked up inside the QAPI generator post-parse.)

You wish...

An unquoted right hand side is a string, unless it can be interpreted as
something else.  For instance, when something else is one of the
eleven(?) ways to say false, you have a variation of YAML's Norway
problem:

https://hitchdev.com/strictyaml/why/implicit-typing-removed/

>>>> struct: ImageInfoSpecificQCow2
>>>> data:
>>>>    "compat": "str"
>>>>    "*data-file": "str"
>>>>    "*data-file-raw": "bool"
>>>>    "*lazy-refcounts": "bool"
>>>>    "*corrupt": "bool"
>>>>    "refcount-bits": "int"
>>>>    "*encrypt": "ImageInfoSpecificQCow2Encryption"
>>>>    "*bitmaps":
>>>>      - "Qcow2BitmapInfo"
>>>>    "compression-type": "Qcow2CompressionType"
>>>>
>>>> With the use of "---" to denote the start of document, we have no trouble
>>>> parsing our files which would actually be a concatenation of multiple
>>>> documents. The python YAML library provides the easy yaml.load_all()
>>>> method.
>>>
>
> Nevermind the earlier comment, then.
>
>>> Required reading on YAML:
>>> https://www.arp242.net/yaml-config.html
>>
>> I don't think this is especially helpful to our evaluation. You can write
>> such blog posts about pretty much any thing if you want to pick holes in a
>> proposal. Certainly there's plenty of awful stuff you can write about
>> JSON, and Python.
>>
>>> Some of the criticism there doesn't matter for our use case.
>>
>> Yeah, what matters is whether it can do the job we need in a way that is
>> better than what we have today, and whether there are any further options
>> to consider that might be viable alternatives.

The sheer complexity of YAML puts me off.  The spec exceeds 20k words.

> I guess I'll dust off the work I have already to show the class.

Prototype code should beat guesswork.

Re: [PATCH] schemas: Add vim modeline

Posted by Daniel P. Berrangé 5 years, 6 months ago

On Fri, Jul 31, 2020 at 11:21:35AM +0200, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
> > On 7/30/20 9:24 AM, Daniel P. Berrangé wrote:
> >>> Some of the criticism there doesn't matter for our use case.
> >>
> >> Yeah, what matters is whether it can do the job we need in a way that is
> >> better than what we have today, and whether there are any further options
> >> to consider that might be viable alternatives.
> 
> The sheer complexity of YAML puts me off.  The spec exceeds 20k words.

Just because complexity exists doesn't mean it has any negative impact
on us as users. A large spec is a concern if writing a parser, but we'd
be using an off the shelf parser so not a concern. Or if you're trying
to do obscure stuff that pushes the boundaries of what YAML can do in
terms of data representation, but QAPI schema needs are really trivial
so won't ever hit anything complex unless we actively went looking for
it. In all the times I've used YAML I can't say "complexity" has been
a word that ever came to mind, quite the opposite, it felt simple.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH] schemas: Add vim modeline

Posted by Markus Armbruster 5 years, 6 months ago

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Thu, Jul 30, 2020 at 01:51:10PM +0200, Markus Armbruster wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> >                               modify them so that we can load the 
>> > files straight into the python intepretor as code, and not parse 
>> > them as data. I feel unhappy about treating data as code though.
>> 
>> Stress on *can* load.  Doesn't mean we should.
>> 
>> Ancient prior art: Lisp programs routinely use s-expressions as
>> configuration file syntax.  They don't load them as code, they read them
>> as data.
>> 
>> With Python, it's ast.parse(), I think.
>
> Yes, that could work
>
>
>> > struct: ImageInfoSpecificQCow2
>> > data:
>> >   compat: str
>> >   "*data-file": str
>> >   "*data-file-raw": bool
>> >   "*lazy-refcounts": bool
>> >   "*corrupt": bool
>> >   refcount-bits: int
>> >   "*encrypt": ImageInfoSpecificQCow2Encryption
>> >   "*bitmaps":
>> >     - Qcow2BitmapInfo
>> >   compression-type: Qcow2CompressionType
>> >
>> >
>> > Then we could use a regular off the shelf YAML parser in python.
>> >
>> > The uglyiness with quotes is due to the use of "*". Slightly less ugly
>> > if we simply declare that quotes are always used, even where they're
>> > not strictly required.
>> 
>> StrictYAML insists on quotes.
>
> I wouldn't suggest StrictYAML, just normal YAML is what pretty much
> everyone uses.
>
> If we came up with a different way to mark a field as optional
> instead of using the magic "*" then we wouldn't need to quote
> anything

Nope.

Stupid test script:

    $ cat test-yaml.py
    #!/usr/bin/python3

    import sys, yaml

    data = yaml.load(sys.stdin, Loader=yaml.CLoader)
    print(data)

Example taken from block.json:

    $ ./test-yaml.py <<EOF
    enum: FloppyDriveType
    data:
    - 144
    - 288
    - 120
    - none
    - auto
    EOF
    {'enum': 'FloppyDriveType', 'data': [144, 288, 120, 'none', 'auto']}

The upper layer will choke on this:

    qapi/block.yaml: In enum 'FloppyDriveType':
    qapi/block.yaml:61: 'data' member requires a string name

You could propose to provide "wouldn't need to quote anything" by
silently converting numbers back to strings.  I got two issues with
that.

1. What's the point in switching to an off-the-shelf parser to replace
less than 400 SLOC if I then have to write non-trivial glue code to make
the syntax less surprising?

2. Let me trot out the tired Norway problem again.  Say we QAPIfy -k so
that its argument is an enum, for nice introspection.  Something like

    $ ./test-yaml.py <<EOF
    enum: Keymap
    data:
    - ar
    - cz
    - de
    - de-ch
    - en-gb
    - en-us
    - no
    - pt
    - pt-br
    EOF
    {'enum': 'Keymap', 'data': ['ar', 'cz', 'de', 'de-ch', 'en-gb', 'en-us', False, 'pt', 'pt-br']}

To which of the eleven ways to say False in YAML should we convert?

YAML and the schema language are fundamentally at odds here: they fight
over types.  YAML lets the value determine the type regardless of
context.  But in the schema, the context determines the type.

>> I hate having to quote identifiers.  There's a reason we don't write
>> 
>>     'int'
>>     'main'('int', 'argc', 'char' *'argv'[])
>>     {
>>         'printf'("hello world\n");
>>         return 0;
>>     }
>> 
>> > struct: ImageInfoSpecificQCow2
>> > data:
>> >   "compat": "str"
>> >   "*data-file": "str"
>> >   "*data-file-raw": "bool"
>> >   "*lazy-refcounts": "bool"
>> >   "*corrupt": "bool"
>> >   "refcount-bits": "int"
>> >   "*encrypt": "ImageInfoSpecificQCow2Encryption"
>> >   "*bitmaps":
>> >     - "Qcow2BitmapInfo"
>> >   "compression-type": "Qcow2CompressionType"
>> >
>> > With the use of "---" to denote the start of document, we have no trouble 
>> > parsing our files which would actually be a concatenation of multiple 
>> > documents. The python YAML library provides the easy yaml.load_all()
>> > method.
>> 
>> Required reading on YAML:
>> https://www.arp242.net/yaml-config.html
>
> I don't think this is especially helpful to our evaluation. You can write
> such blog posts about pretty much any thing if you want to pick holes in a
> proposal. Certainly there's plenty of awful stuff you can write about
> JSON, and Python.

Picking holes in a proposal is precisely what we need to do before we
act on it and rebase our QAPI schema DSL.  That's expensive and painful,
so we better don't screw it up *again*.

Here's my superficial five minute assessment of the essay's main points
for our use case:

* Insecure by default

  Valid criticism of existing YAML tools, but hardly relevant for us,
  because 1. "don't do that then", and 2. the QAPI schema is trusted
  input.

* Can be hard to edit, especially for large files

  Valid when you use YAML to describe data.  We would use it as an IDL,
  though.  If parts of the interface description get too deeply nested
  or too long for comfort, we better provide means to rearrange it in
  more pleasant ways.  However, if our new base language got
  uncomfortable earlier than the old one, and the existing means to
  rearrange prove to weak (they are pretty weak), then we'd create
  additional work.

  I'm cautiously optimistic that YAML would do okay here.

* It’s pretty complex

  If you go "we'll use only a simple subset", then I go "define the
  subset, and tell me how to enforce the subset.  I have no interest in
  teaching contributors one by one which of the nine ways to write a
  multiline string to avoid, or which of the eleven ways to say False to
  use.

* Surprising behavior

  See "fight over types" above.

* It’s not portable

  Probably irrelevant, because feeding the schema to another YAML parser
  is so unlikely to be useful.

>> Some of the criticism there doesn't matter for our use case.
>
> Yeah, what matters is whether it can do the job we need in a way that is
> better than what we have today, and whether there are any further options
> to consider that might be viable alternatives.

Would it improve things enough to be worth the switching pain?

Re: [PATCH] schemas: Add vim modeline

Posted by Daniel P. Berrangé 5 years, 6 months ago

On Fri, Jul 31, 2020 at 02:55:34PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:

> >> Some of the criticism there doesn't matter for our use case.
> >
> > Yeah, what matters is whether it can do the job we need in a way that is
> > better than what we have today, and whether there are any further options
> > to consider that might be viable alternatives.
> 
> Would it improve things enough to be worth the switching pain?

The short answer is that I don't think that question matters. We should
do the conversion regardless, as our JSON-but-not file format has no
compelling reason to exist as a thing when there's a variety of standard
file formats that could do the job. I'd explicitly ignore the sunk costs
and minor amount of work to convert to a new format.

The long answer is that as a general philosophy I'm in favour of agressively
eliminating anything that is custom to a project and isn't offering an
compelling benefit over a functionally equivalent, commonly used / standard
solution.

Any time a project re-invents the wheel, that is one more piece of custom
knowledge a contributor has to learn. Each one may seem insignificant on
its own, but cummulatively they result in death by a 1000 cuts. This makes
a project increasingly less attractive to contribute to over the long term.

Measuring the long term benefit of the change is generally quite difficult,
because while you can see what impact a change will have today on current
code, it is hard to usefully evaluate future benefits as you're trying to
imagine the impact on things that don't even exist.

Overall my POV is not to think too hard about measuring improvements, and
discard any concern about sunk costs. Instead have a general presumption
in favour of eliminating any examples of wheel re-invention in a project.
Even if regular contributors don't want to spend time on such work, this
kind of thing is pretty amenable to new contributors looking for tasks to
start their involvement.

The QAPI JSON-but-not file format is a case where I think we should just
adopt a standard file format no matter what. A conversion will have some
short term work, but this is really simple data to deal with and the code
involved is nicely self contained. Again I'm not saying QAPI maintainers
must do it, just put the idea out there as a piece of work that would
be welcomed if someone is interested in working ont.

Another example would be elimination of anything in QEMU code that is
duplicating functionality in GLib, even if there zero functional
difference between the two impls.

Another example would be adopting a standard code style and using a
tool like clang-format to enforce this for entire of existing code
base and future contributions and throwing away our checkpatch.pl
which nearly everyone is scared of touching as it is Perl code.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH] schemas: Add vim modeline

Posted by John Snow 5 years, 6 months ago

On 7/31/20 11:07 AM, Daniel P. Berrangé wrote:
> On Fri, Jul 31, 2020 at 02:55:34PM +0200, Markus Armbruster wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
>>>> Some of the criticism there doesn't matter for our use case.
>>>
>>> Yeah, what matters is whether it can do the job we need in a way that is
>>> better than what we have today, and whether there are any further options
>>> to consider that might be viable alternatives.
>>
>> Would it improve things enough to be worth the switching pain?
> 
> The short answer is that I don't think that question matters. We should
> do the conversion regardless, as our JSON-but-not file format has no
> compelling reason to exist as a thing when there's a variety of standard
> file formats that could do the job. I'd explicitly ignore the sunk costs
> and minor amount of work to convert to a new format.
> 

This is not a language we spend a lot of time editing, too. Is it really 
such a big cost to re-learn a tiny language like this?

That concern seems overplayed, in my view.

> The long answer is that as a general philosophy I'm in favour of agressively
> eliminating anything that is custom to a project and isn't offering an
> compelling benefit over a functionally equivalent, commonly used / standard
> solution.
> 

I agree as violently as I know how. The purpose of this is not for us, 
it's for the ecosystem.

I saw the critique that we still use JSON-ish for the runtime QMP 
protocol, and moving the QAPI IDL to a standard wouldn't remove all 
instances of a custom format from our tree.

This is actually a great argument for doing the other thing: moving QMP 
onto a standard protocol. The point is not to complicate matters for 
QEMU developers, but to simplify matters for developers from other 
projects who are integrating QEMU.

> Any time a project re-invents the wheel, that is one more piece of custom
> knowledge a contributor has to learn. Each one may seem insignificant on
> its own, but cummulatively they result in death by a 1000 cuts. This makes
> a project increasingly less attractive to contribute to over the long term.
> 
> Measuring the long term benefit of the change is generally quite difficult,
> because while you can see what impact a change will have today on current
> code, it is hard to usefully evaluate future benefits as you're trying to
> imagine the impact on things that don't even exist.
> 
> Overall my POV is not to think too hard about measuring improvements, and
> discard any concern about sunk costs. Instead have a general presumption
> in favour of eliminating any examples of wheel re-invention in a project.
> Even if regular contributors don't want to spend time on such work, this
> kind of thing is pretty amenable to new contributors looking for tasks to
> start their involvement.
> 
> The QAPI JSON-but-not file format is a case where I think we should just
> adopt a standard file format no matter what. A conversion will have some
> short term work, but this is really simple data to deal with and the code
> involved is nicely self contained. Again I'm not saying QAPI maintainers
> must do it, just put the idea out there as a piece of work that would
> be welcomed if someone is interested in working ont.
> 

I have a lot of code trying to pylint and mypy-ify the QAPI parser, and 
a prototype YAML parser. The work that remains to be done is hooking up 
my new YAML parser into the code generator.

Assuming I am able to continue working on this, an RFC could be soon.

> Another example would be elimination of anything in QEMU code that is
> duplicating functionality in GLib, even if there zero functional
> difference between the two impls.
> 
> Another example would be adopting a standard code style and using a
> tool like clang-format to enforce this for entire of existing code
> base and future contributions and throwing away our checkpatch.pl
> which nearly everyone is scared of touching as it is Perl code.
> 
> Regards,
> Daniel
>

Re: [PATCH] schemas: Add vim modeline

Posted by Daniel P. Berrangé 5 years, 6 months ago

On Fri, Jul 31, 2020 at 11:26:28AM -0400, John Snow wrote:
> > The long answer is that as a general philosophy I'm in favour of agressively
> > eliminating anything that is custom to a project and isn't offering an
> > compelling benefit over a functionally equivalent, commonly used / standard
> > solution.
> > 
> 
> I agree as violently as I know how. The purpose of this is not for us, it's
> for the ecosystem.
> 
> I saw the critique that we still use JSON-ish for the runtime QMP protocol,
> and moving the QAPI IDL to a standard wouldn't remove all instances of a
> custom format from our tree.

I'd consider the runtime protocol separately. In terms of what's on the
wire, we use genuine JSON format. The runtime problem is simply that JSON
standard is useless when it comes to integers, leaving behaviour undefined
in the standard if you exceed 53 bits of precision. So there's no way to
reliably parse unsigned 64-bit integers. Given that QEMU needs to pass
uint64 values, JSON was simply the wrong choice of format for QMP.

There's a 3rd aspect which is our source code that deals with JSON, where
we defined some JSON extensions to make it easier for C code to construct
JSON documents for sending over the wire. Back when we did this, it was a
reasonably good idea as no obvious alternative existed for this problem.
Today, I would just suggest using GLib's  GVariant feature, which solves
the same problem for GLib's DBus APIs.

It is a shame we didn't just use DBus back in the day as that's a well
specified, simple protocol that would have done everything we needed,
including the ability to actually represent integers reliably. We
would be able to trivially talk to QEMU from any programming language,
and use common DBus code-generator tools instead of writing code
generators ourselves.

I wish that libvirt had picked DBus all that time ago too, instead of
creating our own RPC based on XDR, which is 95% identical to what
DBus provides but with a massive maint burden for ourselves. Back then
DBus didn't seem like it was good enough as it didn't offer TLS or
SASL support and that looked like such a big deal. With hindsight the
right answer was to add TLS + SASL to DBus, and not invent our own
protocol. We would have lacked TLS/SASL support in libvirt for 6
to 12 months or so, but then would have had 10 years benefitting
from the DBus ecosystem. Life would have been much easier for mgmt
tools using libvirt too, as they could have used native DBus APIs
instead of having to use the C libvirt.so client. This is one of my
biggest regrets with libvirt's architecture, and even ater 10 years
it is still probably worth fixing this mistake and adopting DBus.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH] schemas: Add vim modeline

Posted by Markus Armbruster 5 years, 6 months ago

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Fri, Jul 31, 2020 at 11:26:28AM -0400, John Snow wrote:
>> > The long answer is that as a general philosophy I'm in favour of agressively
>> > eliminating anything that is custom to a project and isn't offering an
>> > compelling benefit over a functionally equivalent, commonly used / standard
>> > solution.
>> > 
>> 
>> I agree as violently as I know how. The purpose of this is not for us, it's
>> for the ecosystem.
>> 
>> I saw the critique that we still use JSON-ish for the runtime QMP protocol,
>> and moving the QAPI IDL to a standard wouldn't remove all instances of a
>> custom format from our tree.
>
> I'd consider the runtime protocol separately. In terms of what's on the
> wire, we use genuine JSON format.

Yes.

Fine print: QMP accepts (but does not emit) minor extensions[*].  RFC
8592 actually permits this: "A JSON parser MAY accept non-JSON forms or
extensions."  This is due to lazy implementation, not due to an actual
need.  We could deprecate and remove.

>                                   The Runtime problem is simply that JSON
> standard is useless when it comes to integers, leaving behaviour undefined
> in the standard if you exceed 53 bits of precision. So there's no way to
> reliably parse unsigned 64-bit integers. Given that QEMU needs to pass
> uint64 values, JSON was simply the wrong choice of format for QMP.

To be precise: "An implementation may set limits on the range and
precision of numbers."  Not quite undefined behavior.  Useless all the
same.  There are more interoperability pitfalls due to JSON's
notoriously useless minimal requirements.  Range of numbers is the most
relevant one, because so many implementations set useless numeric
limits, and there is no good way to work around the issue.

> There's a 3rd aspect which is our source code that deals with JSON, where
> we defined some JSON extensions to make it easier for C code to construct
> JSON documents for sending over the wire.

Example: qmp_error_response() uses the interpolation extension to safely
insert values into a JSON syntax tree.

    rsp = qdict_from_jsonf_nofail("{ 'error': { 'class': %s, 'desc': %s } }",
                                  QapiErrorClass_str(error_get_class(err)),
                                  error_get_pretty(err));

Two extensions, actually:

1. Interpolation: tokens starting with % get replaced by the
   corresponding variable argument converted to a QObject.

2. Single-quoted strings to avoid leaning toothpick syndrome.

Without interpolation, we'd have to construct the tree by hand, like
this:

    error = qdict_new();
    qdict_put_str(error, "class", QapiErrorClass_str(error_get_class(err)),
    qdict_put_str(error, "desc", error_get_pretty(err));
    rsp = qdict_new();
    qdict_put(rsp, error);

Much harder to read even for such a simple example.

>                                           Back when we did this, it was a
> reasonably good idea as no obvious alternative existed for this problem.
> Today, I would just suggest using GLib's  GVariant feature, which solves
> the same problem for GLib's DBus APIs.
>
> It is a shame we didn't just use DBus back in the day as that's a well
> specified, simple protocol that would have done everything we needed,
> including the ability to actually represent integers reliably. We
> would be able to trivially talk to QEMU from any programming language,
> and use common DBus code-generator tools instead of writing code
> generators ourselves.

Water under the bridge.

[...]

[*] Single-quoted strings and \' in strings.

Re: [PATCH] schemas: Add vim modeline

Posted by Paolo Bonzini 5 years, 6 months ago

On 31/07/20 17:44, Daniel P. Berrangé wrote:
> I'd consider the runtime protocol separately. In terms of what's on the
> wire, we use genuine JSON format. The runtime problem is simply that JSON
> standard is useless when it comes to integers, leaving behaviour undefined
> in the standard if you exceed 53 bits of precision. So there's no way to
> reliably parse unsigned 64-bit integers. Given that QEMU needs to pass
> uint64 values, JSON was simply the wrong choice of format for QMP.

JSON's 53-bit precision was not part of RFC4627, which was the JSON
specification in 2010.  They say hindsight is 20/20, but referring to
RFC7159 which would come 4 years later is rewriting history, not hindsight.

> There's a 3rd aspect which is our source code that deals with JSON, where
> we defined some JSON extensions to make it easier for C code to construct
> JSON documents for sending over the wire. Back when we did this, it was a
> reasonably good idea as no obvious alternative existed for this problem.
> Today, I would just suggest using GLib's  GVariant feature, which solves
> the same problem for GLib's DBus APIs.

Many years ago actually I tried replacing QObject with GVariant.  I'm
pretty sure the code for that experiment is lost but it took me just a
couple days so it could be redone.  The only issue was that QObjects are
mutable so some instances of QString had to be replaced with GString.

(A small part of it was merged as commit
9bada8971173345ceb37ed1a47b00a01a4dd48cf for unrelated reasons).

> It is a shame we didn't just use DBus back in the day as that's a well
> specified, simple protocol that would have done everything we needed,
> including the ability to actually represent integers reliably. We
> would be able to trivially talk to QEMU from any programming language,
> and use common DBus code-generator tools instead of writing code
> generators ourselves.

Not really, DBus doesn't provide the extensibility that we get from
optional arguments in commands and optional fields in structs.  Again,
we may discuss the QMP protocol itself, but JSON *was chosen for a reason*.

Paolo

Re: [PATCH] schemas: Add vim modeline

Posted by Markus Armbruster 5 years, 6 months ago

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 31/07/20 17:44, Daniel P. Berrangé wrote:
>> I'd consider the runtime protocol separately. In terms of what's on the
>> wire, we use genuine JSON format. The runtime problem is simply that JSON
>> standard is useless when it comes to integers, leaving behaviour undefined
>> in the standard if you exceed 53 bits of precision. So there's no way to
>> reliably parse unsigned 64-bit integers. Given that QEMU needs to pass
>> uint64 values, JSON was simply the wrong choice of format for QMP.
>
> JSON's 53-bit precision was not part of RFC4627, which was the JSON
> specification in 2010.  They say hindsight is 20/20, but referring to
> RFC7159 which would come 4 years later is rewriting history, not hindsight.

RFC 4627's original sin:

    4.  Parsers
    [...]
    An implementation may set limits on the size of texts that it
    accepts.  An implementation may set limits on the maximum depth of
    nesting.  An implementation may set limits on the range of numbers.
    An implementation may set limits on the length and character contents
    of strings.

Giving license to set limits is not a sin, absence of minimal limits is.

Became an actual problem only when many implementations followed the
early JavaScript implementation(s), which had inherited JavaScript's
quick-and-dirty approach to numerical towers: double should be enough
for anybody.

>> There's a 3rd aspect which is our source code that deals with JSON, where
>> we defined some JSON extensions to make it easier for C code to construct
>> JSON documents for sending over the wire. Back when we did this, it was a
>> reasonably good idea as no obvious alternative existed for this problem.
>> Today, I would just suggest using GLib's  GVariant feature, which solves
>> the same problem for GLib's DBus APIs.
>
> Many years ago actually I tried replacing QObject with GVariant.  I'm
> pretty sure the code for that experiment is lost but it took me just a
> couple days so it could be redone.  The only issue was that QObjects are
> mutable so some instances of QString had to be replaced with GString.
>
> (A small part of it was merged as commit
> 9bada8971173345ceb37ed1a47b00a01a4dd48cf for unrelated reasons).
>
>> It is a shame we didn't just use DBus back in the day as that's a well
>> specified, simple protocol that would have done everything we needed,
>> including the ability to actually represent integers reliably. We
>> would be able to trivially talk to QEMU from any programming language,
>> and use common DBus code-generator tools instead of writing code
>> generators ourselves.
>
> Not really, DBus doesn't provide the extensibility that we get from
> optional arguments in commands and optional fields in structs.

Enables compatible evolution of interfaces, which has been a massive win
for us.

>                                                                 Again,
> we may discuss the QMP protocol itself,

We can discuss everything, but please not everything in the same thread.
This one is / tries to be about the QAPI schema language.

>                                         but JSON *was chosen for a reason*.

In our field, dissing prior technical choices without making an effort
to understand the tradeoffs that led to them is an ancient tradition :)

Re: [PATCH] schemas: Add vim modeline

Posted by Daniel P. Berrangé 5 years, 6 months ago

On Mon, Aug 03, 2020 at 10:41:22AM +0200, Paolo Bonzini wrote:
> On 31/07/20 17:44, Daniel P. Berrangé wrote:
> > I'd consider the runtime protocol separately. In terms of what's on the
> > wire, we use genuine JSON format. The runtime problem is simply that JSON
> > standard is useless when it comes to integers, leaving behaviour undefined
> > in the standard if you exceed 53 bits of precision. So there's no way to
> > reliably parse unsigned 64-bit integers. Given that QEMU needs to pass
> > uint64 values, JSON was simply the wrong choice of format for QMP.
> 
> JSON's 53-bit precision was not part of RFC4627, which was the JSON
> specification in 2010.  They say hindsight is 20/20, but referring to
> RFC7159 which would come 4 years later is rewriting history, not hindsight.

I wasn't refering to RFC7159. The problem of undefined integer precision
with JSON came up right at the very start when QMP was first designed and
implemented, and has come up again periodically ever since then. libvirt
needed to do workarounds right at the start in 2009, in order to fully
handle signed/unsigned 64-bit integers with QMP.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH] schemas: Add vim modeline

Posted by Paolo Bonzini 5 years, 6 months ago

On 03/08/20 13:36, Daniel P. Berrangé wrote:
>>> Given that QEMU needs to pass
>>> uint64 values, JSON was simply the wrong choice of format for QMP.
>
> I wasn't refering to RFC7159. The problem of undefined integer precision
> with JSON came up right at the very start when QMP was first designed and
> implemented, and has come up again periodically ever since then. libvirt
> needed to do workarounds right at the start in 2009, in order to fully
> handle signed/unsigned 64-bit integers with QMP.

I assume the workaround you refer to is to store the number as a string
and converting it lazily to either an integer or a floating-point type
in whoever uses the JSON API.  It may not be pretty but probably it
would have been the same for any text-based, schema-less protocol.  For
example, it didn't require writing your own parser.

It could be avoided by using a schema in Libvirt, just like QEMU has no
problem with it on the other side; it's just a different design choice
with different trade-offs, I don't think it's enough of an issue to
declare JSON "the wrong choice of format for QMP".

Paolo

Re: [PATCH] schemas: Add vim modeline

Posted by Daniel P. Berrangé 5 years, 6 months ago

On Mon, Aug 03, 2020 at 02:16:19PM +0200, Paolo Bonzini wrote:
> On 03/08/20 13:36, Daniel P. Berrangé wrote:
> >>> Given that QEMU needs to pass
> >>> uint64 values, JSON was simply the wrong choice of format for QMP.
> >
> > I wasn't refering to RFC7159. The problem of undefined integer precision
> > with JSON came up right at the very start when QMP was first designed and
> > implemented, and has come up again periodically ever since then. libvirt
> > needed to do workarounds right at the start in 2009, in order to fully
> > handle signed/unsigned 64-bit integers with QMP.
> 
> I assume the workaround you refer to is to store the number as a string
> and converting it lazily to either an integer or a floating-point type
> in whoever uses the JSON API.  It may not be pretty but probably it
> would have been the same for any text-based, schema-less protocol.  For
> example, it didn't require writing your own parser.

Yes, we get the raw values as a string, but not all parsers for C
allow you to do this.  We'd really love to move off YAJL for JSON
parsing, but the most viable alternatives don't have a way to let
you get the string before they parse it as an integer and report
errors.

> It could be avoided by using a schema in Libvirt, just like QEMU has no
> problem with it on the other side; it's just a different design choice
> with different trade-offs, I don't think it's enough of an issue to
> declare JSON "the wrong choice of format for QMP".

The schema doesn't help - the problem is many JSON parsers don't allow
use of full uint64 values when parsing - alot will simply report an
error for anything bigger than LLONG_MAX and offer no workaround.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH] schemas: Add vim modeline

Posted by Paolo Bonzini 5 years, 6 months ago

On 03/08/20 14:23, Daniel P. Berrangé wrote:
> We'd really love to move off YAJL for JSON parsing

What are the issues with YAJL?

>> It could be avoided by using a schema in Libvirt, just like QEMU has no
>> problem with it on the other side; it's just a different design choice
>> with different trade-offs, I don't think it's enough of an issue to
>> declare JSON "the wrong choice of format for QMP".
>
> The schema doesn't help - the problem is many JSON parsers don't allow
> use of full uint64 values when parsing - alot will simply report an
> error for anything bigger than LLONG_MAX and offer no workaround.

Sure, but this problem is not at all unique to QEMU and JSON parsers
have a way to support large integers in pretty much every language
(including Javascript).  In some of them like Python or Ruby it's even
the default behavior.

Paolo

Re: [PATCH] schemas: Add vim modeline

Posted by Daniel P. Berrangé 5 years, 6 months ago

On Mon, Aug 03, 2020 at 02:33:36PM +0200, Paolo Bonzini wrote:
> On 03/08/20 14:23, Daniel P. Berrangé wrote:
> > We'd really love to move off YAJL for JSON parsing
> 
> What are the issues with YAJL?

It is abandonware for more than 5 years now, with an ever growing list
of bug reports and pull requests being ignored. This isn't good when
the JSON parser is a security critical part of the interface between
libvirt and QEMU.

We tried to switch to Jannsson but that raises hard errors for
values above LLONG_MAX and has no backdoor to workaround this.

There's various other C JSON parsers which have their own flaws
such as C namespace pollution in headers, or same integer parsing
problems.

> >> It could be avoided by using a schema in Libvirt, just like QEMU has no
> >> problem with it on the other side; it's just a different design choice
> >> with different trade-offs, I don't think it's enough of an issue to
> >> declare JSON "the wrong choice of format for QMP".
> >
> > The schema doesn't help - the problem is many JSON parsers don't allow
> > use of full uint64 values when parsing - alot will simply report an
> > error for anything bigger than LLONG_MAX and offer no workaround.
> 
> Sure, but this problem is not at all unique to QEMU and JSON parsers
> have a way to support large integers in pretty much every language
> (including Javascript).  In some of them like Python or Ruby it's even
> the default behavior.

Some parsers can do it, some cannot, and in some it isn't obvious that
you are loosing precision behind the scenes due to conversion to float.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH] schemas: Add vim modeline

Posted by Markus Armbruster 5 years, 6 months ago

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Mon, Aug 03, 2020 at 02:16:19PM +0200, Paolo Bonzini wrote:
>> On 03/08/20 13:36, Daniel P. Berrangé wrote:
>> >>> Given that QEMU needs to pass
>> >>> uint64 values, JSON was simply the wrong choice of format for QMP.
>> >
>> > I wasn't refering to RFC7159. The problem of undefined integer precision
>> > with JSON came up right at the very start when QMP was first designed and
>> > implemented, and has come up again periodically ever since then. libvirt
>> > needed to do workarounds right at the start in 2009, in order to fully
>> > handle signed/unsigned 64-bit integers with QMP.
>> 
>> I assume the workaround you refer to is to store the number as a string
>> and converting it lazily to either an integer or a floating-point type
>> in whoever uses the JSON API.  It may not be pretty but probably it
>> would have been the same for any text-based, schema-less protocol.  For
>> example, it didn't require writing your own parser.
>
> Yes, we get the raw values as a string, but not all parsers for C
> allow you to do this.  We'd really love to move off YAJL for JSON
> parsing, but the most viable alternatives don't have a way to let
> you get the string before they parse it as an integer and report
> errors.

You know, if I had to write a *general purpose* parser for a language
that obviously supports numbers of arbitrary precision (but permits
implementations to support less), then I'd try *hard* to stay as general
as practical.  At the very least, provide a way to retrieve number
tokens as strings, so the decision to limit precision devolves to the
client.  Also, GMP exists.

The fact that "most viable alternatives" to YAJL are unable to support
uint64_t tempts me to condemn the whole lot as toys :)

I suspect the simplicity of JSON not only lowers the barrier for toys,
but also dampens the demand for general purpose non-toys.  Writing and
maintaining yet another JSON parser is quite possibly easier than
getting one of the toys fixed up properly.

[...]

Re: [PATCH] schemas: Add vim modeline

Posted by Nir Soffer 5 years, 6 months ago

On Mon, Aug 3, 2020 at 3:23 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Mon, Aug 03, 2020 at 02:16:19PM +0200, Paolo Bonzini wrote:
> > On 03/08/20 13:36, Daniel P. Berrangé wrote:
> > >>> Given that QEMU needs to pass
> > >>> uint64 values, JSON was simply the wrong choice of format for QMP.
> > >
> > > I wasn't refering to RFC7159. The problem of undefined integer precision
> > > with JSON came up right at the very start when QMP was first designed and
> > > implemented, and has come up again periodically ever since then. libvirt
> > > needed to do workarounds right at the start in 2009, in order to fully
> > > handle signed/unsigned 64-bit integers with QMP.
> >
> > I assume the workaround you refer to is to store the number as a string
> > and converting it lazily to either an integer or a floating-point type
> > in whoever uses the JSON API.  It may not be pretty but probably it
> > would have been the same for any text-based, schema-less protocol.  For
> > example, it didn't require writing your own parser.
>
> Yes, we get the raw values as a string, but not all parsers for C
> allow you to do this.  We'd really love to move off YAJL for JSON
> parsing, but the most viable alternatives don't have a way to let
> you get the string before they parse it as an integer and report
> errors.
>
> > It could be avoided by using a schema in Libvirt, just like QEMU has no
> > problem with it on the other side; it's just a different design choice
> > with different trade-offs, I don't think it's enough of an issue to
> > declare JSON "the wrong choice of format for QMP".
>
> The schema doesn't help - the problem is many JSON parsers don't allow
> use of full uint64 values when parsing - alot will simply report an
> error for anything bigger than LLONG_MAX and offer no workaround.

Jansson is considering this:
https://github.com/akheron/jansson/issues/69

The issue is 4 years old, but there is some activity lately. I guess
someone who wants this needs to invest the time.

>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>
>

Re: [PATCH] schemas: Add vim modeline

Posted by Paolo Bonzini 5 years, 6 months ago

On 31/07/20 17:26, John Snow wrote:
> I saw the critique that we still use JSON-ish for the runtime QMP
> protocol, and moving the QAPI IDL to a standard wouldn't remove all
> instances of a custom format from our tree.

Sorry, but "still using JSON" is not a critique that makes any sense.

99% of the websites you use daily use JSON as their RPC
frontend-to-backend language; OpenAPI is essentially JSON over HTTP.
There must be something good in JSON.

Whenever you hear a complaint about "using JSON", it's actually a
complaint about bindings for your language, which can be a sensible
critique: gRPC is essentially {protobuf,FlatBuffers} over HTTP/2 plus a
boatload of language mappings.  Unfortunately C is not one of those
mappings.

Paolo

Re: [PATCH] schemas: Add vim modeline

Posted by Dr. David Alan Gilbert 5 years, 6 months ago

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> On 31/07/20 17:26, John Snow wrote:
> > I saw the critique that we still use JSON-ish for the runtime QMP
> > protocol, and moving the QAPI IDL to a standard wouldn't remove all
> > instances of a custom format from our tree.
> 
> Sorry, but "still using JSON" is not a critique that makes any sense.
> 
> 99% of the websites you use daily use JSON as their RPC
> frontend-to-backend language; OpenAPI is essentially JSON over HTTP.
> There must be something good in JSON.

If there is, I've not found it:
    a) It's integer definitions are a mess
    b) You can't require ordering
    c) No two parsers agree with each other

and those are the only ones I've hit in my very limited JSON wrangling.

It's possible attractions are that no one has anything widely used
that's better, and it's easy to use from JS.

But it seems popular to try and find replacements; e.g. Amazon Ion that
landed a few weeks ago (like JSON but...not quite and with a binary
format optionally).

Dave

> Whenever you hear a complaint about "using JSON", it's actually a
> complaint about bindings for your language, which can be a sensible
> critique: gRPC is essentially {protobuf,FlatBuffers} over HTTP/2 plus a
> boatload of language mappings.  Unfortunately C is not one of those
> mappings.
> 
> Paolo
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [PATCH] schemas: Add vim modeline

Posted by Daniel P. Berrangé 5 years, 6 months ago

On Fri, Jul 31, 2020 at 06:35:23PM +0200, Paolo Bonzini wrote:
> On 31/07/20 17:26, John Snow wrote:
> > I saw the critique that we still use JSON-ish for the runtime QMP
> > protocol, and moving the QAPI IDL to a standard wouldn't remove all
> > instances of a custom format from our tree.
> 
> Sorry, but "still using JSON" is not a critique that makes any sense.
> 
> 99% of the websites you use daily use JSON as their RPC
> frontend-to-backend language; OpenAPI is essentially JSON over HTTP.
> There must be something good in JSON.

I think it is more about the way we're using JSON to define QMP.

We take a raw socket (or really arbitrary reliable stream no matter
what its transport), and are defining the full RPC protocol. We define
the initial QMP handshake, the separation of commands/replies/events
and of course the data format for the content sent. We've invented
the whole stack above the raw sockets layer.

In the case of typical REST APIs, HTTP provides the core protocol
with handshake, and separation of commands/replies. The application
is merely declaring JSON to be the data format for the messages.

So in the case of REST APIS with JSON, you can use any standard
HTTP / REST client library, for the protocol part and any standard
JSON library for the data (de)serialization.

Talking to QEMU you get to build your own client from first
principals. QMP isn't especially complicated, so this isn't a
massive burden, but it doesn't exactly give a good first impression
either. It also means QMP isn't easily extensible. eg if we used
HTTP as our base, then we'd get remote TLS support for free from
whatever library we used. We could do TLS with QMP, but again we
get to build the pieces for this on both client/server side.
Using a standard like HTTP would open door to other interesting
ideas, like putting a HTTP proxy on a host, so you can have 1
HTTP server fronting all 1000 VMs on the host, meaning only need
a single port instead of 1000 ports in the firewall.  Again you
can build such an architecture on top of QMP but its all wheel
reinvention.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH] schemas: Add vim modeline

Posted by Paolo Bonzini 5 years, 6 months ago

On 31/07/20 19:20, Daniel P. Berrangé wrote:
> It also means QMP isn't easily extensible. eg if we used
> HTTP as our base, then we'd get remote TLS support for free from
> whatever library we used.

... and we would lose events, unless we do something with HTTP/2 and
streaming responses.  We would also have to pass the TLS certificates to
whatever library we used (which might even be using openssl instead of
gnutls).  So it's not that simple, and that's why I'm hesitant to see
things as a universal improvement without seeing the code.

Paolo

> We could do TLS with QMP, but again we
> get to build the pieces for this on both client/server side.
> Using a standard like HTTP would open door to other interesting
> ideas, like putting a HTTP proxy on a host, so you can have 1
> HTTP server fronting all 1000 VMs on the host, meaning only need
> a single port instead of 1000 ports in the firewall.  Again you
> can build such an architecture on top of QMP but its all wheel
> reinvention.

Re: [PATCH] schemas: Add vim modeline

Posted by Daniel P. Berrangé 5 years, 6 months ago

On Fri, Jul 31, 2020 at 07:47:01PM +0200, Paolo Bonzini wrote:
> On 31/07/20 19:20, Daniel P. Berrangé wrote:
> > It also means QMP isn't easily extensible. eg if we used
> > HTTP as our base, then we'd get remote TLS support for free from
> > whatever library we used.
> 
> ... and we would lose events, unless we do something with HTTP/2 and
> streaming responses.  We would also have to pass the TLS certificates to
> whatever library we used (which might even be using openssl instead of
> gnutls).  So it's not that simple, and that's why I'm hesitant to see
> things as a universal improvement without seeing the code.

I didn't mean to suggest that QEMU should use HTTP, I was just
comparing QMP use of JSON vs the common webservices using RST
with JSON.

HTTP/2 streaming is not required for async events though, there's a
variety of ways to do that with HTTP/1.1.

Open a HTTP connection and issue GET for /events, and the server
will then simply not respond until it has an event ready. Once a
event is received, that request is complete. With connection reuse
enabled though, another GET request can be made to wait for the
next event without the overhead of re-establishing the connection.

Alternatively a single HTTP request can be used, but with the response
using chunked event so that it can send back an arbitrary unbounded
amount of data spread over time. The client can receive and process
this data on the fly without waiting for the request to complete.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH] schemas: Add vim modeline

Posted by John Snow 5 years, 6 months ago

On 7/31/20 12:35 PM, Paolo Bonzini wrote:
> On 31/07/20 17:26, John Snow wrote:
>> I saw the critique that we still use JSON-ish for the runtime QMP
>> protocol, and moving the QAPI IDL to a standard wouldn't remove all
>> instances of a custom format from our tree.
> 
> Sorry, but "still using JSON" is not a critique that makes any sense.
> 
> 99% of the websites you use daily use JSON as their RPC
> frontend-to-backend language; OpenAPI is essentially JSON over HTTP.
> There must be something good in JSON.
> 
> Whenever you hear a complaint about "using JSON", it's actually a
> complaint about bindings for your language, which can be a sensible
> critique: gRPC is essentially {protobuf,FlatBuffers} over HTTP/2 plus a
> boatload of language mappings.  Unfortunately C is not one of those
> mappings.
> 
> Paolo
> 

You have misunderstood me.

The critique I am relaying, but not raising, is that we already use a 
custom JSON parser in two or more places, and so replacing one instance 
of this with a new format actually complicates QEMU instead of 
simplifies it.

I disagree with this concern on the premise that moving one non-standard 
JSON usage to a standard usage is a win because it reduces the total 
number of instances of proprietary formats.

Further, if we remove ALL instances of proprietary JSON, then we're back 
to the same level of complexity internally, but with a reduced level of 
complexity for outside observers.

This is why I said "JSON-ish", not "JSON".

--js

Re: [PATCH] schemas: Add vim modeline

Posted by Paolo Bonzini 5 years, 6 months ago

On 31/07/20 19:53, John Snow wrote:
> You have misunderstood me.
> 
> The critique I am relaying, but not raising, is that we already use a
> custom JSON parser in two or more places, and so replacing one instance
> of this with a new format actually complicates QEMU instead of
> simplifies it.
> 
> I disagree with this concern on the premise that moving one non-standard
> JSON usage to a standard usage is a win because it reduces the total
> number of instances of proprietary formats.
> 
> Further, if we remove ALL instances of proprietary JSON, then we're back
> to the same level of complexity internally, but with a reduced level of
> complexity for outside observers.

I think we should first build a consensus on using "real" JSON (plus
Javascript comments) for the schema, which is easy, and then somebody
can try his hands at removing the custom JSON parser.

I wouldn't conflate the QMP and schema parsers.  For example, QMP does
not need comments and schemas don't need either bigints or printf-style
% interpolation.

Paolo

Re: [PATCH] schemas: Add vim modeline

Posted by Markus Armbruster 5 years, 6 months ago

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 31/07/20 19:53, John Snow wrote:
>> You have misunderstood me.
>> 
>> The critique I am relaying, but not raising, is that we already use a
>> custom JSON parser in two or more places, and so replacing one instance
>> of this with a new format actually complicates QEMU instead of
>> simplifies it.
>> 
>> I disagree with this concern on the premise that moving one non-standard
>> JSON usage to a standard usage is a win because it reduces the total
>> number of instances of proprietary formats.
>> 
>> Further, if we remove ALL instances of proprietary JSON, then we're back
>> to the same level of complexity internally, but with a reduced level of
>> complexity for outside observers.
>
> I think we should first build a consensus on using "real" JSON (plus
> Javascript comments) for the schema, which is easy, and then somebody
> can try his hands at removing the custom JSON parser.
>
> I wouldn't conflate the QMP and schema parsers.  For example, QMP does
> not need comments and schemas don't need either bigints or printf-style
> % interpolation.

Seconded.

QAPI schema syntax and the QMP syntax are totally separate.  Heck, the
whole *languages* are.  They happen to use vaguely similar concrete
syntax, a bit like C, Java and JavaScript do.  That's all.

Let's keep this thread focused on the QAPI schema language.

cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Paolo Bonzini 5 years, 6 months ago

On 31/07/20 17:07, Daniel P. Berrangé wrote:
> The QAPI JSON-but-not file format is a case where I think we should just
> adopt a standard file format no matter what. A conversion will have some
> short term work, but this is really simple data to deal with and the code
> involved is nicely self contained. Again I'm not saying QAPI maintainers
> must do it, just put the idea out there as a piece of work that would
> be welcomed if someone is interested in working ont.

The main issues with JSON-but-not in QEMU are:

- the 64-bit integers, which does not apply to the QAPI schema though

- the comments, which are a common extension to JSON (see JSON5, NodeJS
config files, json_verify's -c option, etc.) so I find it quite surprising
that no off-the-shelf Python component can parse JSON + comments

- the single-quote strings, which are not particularly useful in QAPI schema

If we changed the single-quote strings to double-quote, jsonc.vim
(https://github.com/neoclide/jsonc.vim) seems to support JSON + comments.
In Emacs you'd probably add an epilogue like

(defconst json-mode-comments-re (rx (group "#" (zero-or-more nonl) line-end)))
(push (list json-mode-comments-re 1 font-lock-comment-face) json-font-lock-keywords-1)

Did I miss anything?

Besides that, why are we using Python and not JavaScript in the mode line?

> Another example would be elimination of anything in QEMU code that is
> duplicating functionality in GLib, even if there zero functional
> difference between the two impls.

Would you consider intrusive lists vs. GList/GSList to be duplicating 
functionality?  I don't think we have that many duplicates at this 
point.

We have qemu_strto*, but unfortunately the GLib function g_ascii_strtoll
does nothing to fix the awful design of the C standard strto* functions
(especially the overloading of the return value for error and result).
If there are cases that are clear cut against adopting the GLib version,
I think patches would be easily accepted.

> Another example would be adopting a standard code style and using a
> tool like clang-format to enforce this for entire of existing code
> base and future contributions and throwing away our checkpatch.pl
> which nearly everyone is scared of touching as it is Perl code.

Checkpatch does much more than that, though the scary part is indeed the 
one that enfoces coding style.  I wouldn't have a problem with using
clang-format and calling it a day.

Paolo

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Daniel P. Berrangé 5 years, 6 months ago

On Fri, Jul 31, 2020 at 06:28:06PM +0200, Paolo Bonzini wrote:
> On 31/07/20 17:07, Daniel P. Berrangé wrote:
> > The QAPI JSON-but-not file format is a case where I think we should just
> > adopt a standard file format no matter what. A conversion will have some
> > short term work, but this is really simple data to deal with and the code
> > involved is nicely self contained. Again I'm not saying QAPI maintainers
> > must do it, just put the idea out there as a piece of work that would
> > be welcomed if someone is interested in working ont.
> 
> The main issues with JSON-but-not in QEMU are:
> 
> - the 64-bit integers, which does not apply to the QAPI schema though
> 
> - the comments, which are a common extension to JSON (see JSON5, NodeJS
> config files, json_verify's -c option, etc.) so I find it quite surprising
> that no off-the-shelf Python component can parse JSON + comments
> 
> - the single-quote strings, which are not particularly useful in QAPI schema

NB our files are not JSON documents, they are a concatenation of a list
of JSON documents. 

> 
> If we changed the single-quote strings to double-quote, jsonc.vim
> (https://github.com/neoclide/jsonc.vim) seems to support JSON + comments.
> In Emacs you'd probably add an epilogue like
> 
> (defconst json-mode-comments-re (rx (group "#" (zero-or-more nonl) line-end)))
> (push (list json-mode-comments-re 1 font-lock-comment-face) json-font-lock-keywords-1)
> 
> Did I miss anything?
> 
> Besides that, why are we using Python and not JavaScript in the mode line?

If you use javascript mode, then emacs will highlight all the javascript
language keywords in the comments because it doesn't recognise "#" as a
comment - you need // or /* .. */ for comments in JavaScript

We could of course convert to genuinely use Javascript comment syntax
if we want to consider these files to be JavaScript code instead of
JSON data.

> > Another example would be elimination of anything in QEMU code that is
> > duplicating functionality in GLib, even if there zero functional
> > difference between the two impls.
> 
> Would you consider intrusive lists vs. GList/GSList to be duplicating 
> functionality?  I don't think we have that many duplicates at this 
> point.

Yep, I'd have a preference for use of GList / GSList. Anything which
uses G_DEFINE_AUTOPTR_CLEANUP_FUNC() automatically gets support for
auto cleanup of GList, GSList and GQueue, deep free'ing each element.

https://developer.gnome.org/glib/stable/glib-Miscellaneous-Macros.html#g-autolist

Some of the io/ subsystem could start making more use of GIO APIs. We
didn't use GIO originally as our min-GLib version at the time meant
we couldn't use the functionality we needed.

I'd also suggest replacing  fprintf/printf/sprintf with the g*printf*
equivalents. The benefit here is that GLib guarantees that its variants
work the same way on Windows as on Linux. They pulled in the GNULIB
printf impl to replace Microsoft's broken impl.

There's various silly little things like ARRAY_SIZE vs G_N_ELEMENTS,
and __attribute__ macros - QEMU_NORETURN vs G_GNUC_NORETURN.
QEMU_BUILD_BUG_ON vs G_STATIC_ASSERT

There's QEMU's threads related wrappers. I vaguely there were a couple of
cases where GLib's threads APIs didn't do what we needed. Even if we can't
eliminate all our threads APIs, I expect we can cull alot.

There's util/buffer.{c.h} that can be replaced by GString or GArray.

There's some places where we have misc utility functions that we should
just contribute back to GLib upstream. eg our util/base64.c which offers
improved error checking on base64 decoding. GLib would benefit from this,
and while it won't help qemu immediately, in the long term it will.

> We have qemu_strto*, but unfortunately the GLib function g_ascii_strtoll
> does nothing to fix the awful design of the C standard strto* functions
> (especially the overloading of the return value for error and result).
> If there are cases that are clear cut against adopting the GLib version,
> I think patches would be easily accepted.

The g_ascii_strtoll aren't a drop in replacement for qemu_strto*, as they
have intentionally different semantics from each other, so not an example
I was considering here.

The GLib functions are explicitly always using C-locale, while QEMU's
current places are honouring current locale.

There are some places in QEMU that really should be forcing C-locale,
and so g_ascii_strtoll would actually be a bug fix in those cases.
Other places simply have to stick with what they're currently using
to honour the locale.

> > Another example would be adopting a standard code style and using a
> > tool like clang-format to enforce this for entire of existing code
> > base and future contributions and throwing away our checkpatch.pl
> > which nearly everyone is scared of touching as it is Perl code.
> 
> Checkpatch does much more than that, though the scary part is indeed the 
> one that enfoces coding style.  I wouldn't have a problem with using
> clang-format and calling it a day.

If there are things missing that we consider important, a long term
better strategy would be to use the Python binding to libclang to
detect the problem, instead of trying to parse C code with Perl and
regexes.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Paolo Bonzini 5 years, 6 months ago

On 31/07/20 19:05, Daniel P. Berrangé wrote:
> NB our files are not JSON documents, they are a concatenation of a list
> of JSON documents. 

This is not something that editors generally have problems with.

> If you use javascript mode, then emacs will highlight all the javascript
> language keywords in the comments because it doesn't recognise "#" as a
> comment - you need // or /* .. */ for comments in JavaScript

Oops, good point.  So yeah, using // comments and double quotes would
fix editor problems and be much more kind to downstreams.

> I'd also suggest replacing  fprintf/printf/sprintf with the g*printf*
> equivalents. The benefit here is that GLib guarantees that its variants
> work the same way on Windows as on Linux. They pulled in the GNULIB
> printf impl to replace Microsoft's broken impl.

That should not be an issue to replace.  Most of our printf uses anyway
are wrapped within error_report, or they are g_strdup_printf which we're
already preferring to the messy libc asprintf for obvious reasons.

> There's various silly little things like ARRAY_SIZE vs G_N_ELEMENTS,
> and __attribute__ macros - QEMU_NORETURN vs G_GNUC_NORETURN.
> QEMU_BUILD_BUG_ON vs G_STATIC_ASSERT
> There's util/buffer.{c.h} that can be replaced by GString or GArray.

Some of these are no brainers too.  QEMU_BUILD_BUG_ON reverses the
logic; I'm not sure if Coccinelle can do De Morgan laws.

>>> Another example would be adopting a standard code style and using a
>>> tool like clang-format to enforce this for entire of existing code
>>> base and future contributions and throwing away our checkpatch.pl
>>> which nearly everyone is scared of touching as it is Perl code.
>> Checkpatch does much more than that, though the scary part is indeed the 
>> one that enfoces coding style.  I wouldn't have a problem with using
>> clang-format and calling it a day.
> If there are things missing that we consider important, a long term
> better strategy would be to use the Python binding to libclang to
> detect the problem, instead of trying to parse C code with Perl and
> regexes.

Most of it is simply "use this function instead of this one" or "place a
comment to explain why you're using this".  The main feature of
checkpatch.pl however is that it works on patches, not just files, but
still there would be a substantial advantage in employing clang-format.

Paolo

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Daniel P. Berrangé 5 years, 6 months ago

On Fri, Jul 31, 2020 at 07:16:54PM +0200, Paolo Bonzini wrote:

> >>> Another example would be adopting a standard code style and using a
> >>> tool like clang-format to enforce this for entire of existing code
> >>> base and future contributions and throwing away our checkpatch.pl
> >>> which nearly everyone is scared of touching as it is Perl code.
> >> Checkpatch does much more than that, though the scary part is indeed the 
> >> one that enfoces coding style.  I wouldn't have a problem with using
> >> clang-format and calling it a day.
> > If there are things missing that we consider important, a long term
> > better strategy would be to use the Python binding to libclang to
> > detect the problem, instead of trying to parse C code with Perl and
> > regexes.
> 
> Most of it is simply "use this function instead of this one" or "place a
> comment to explain why you're using this".  The main feature of
> checkpatch.pl however is that it works on patches, not just files, but
> still there would be a substantial advantage in employing clang-format.

You say "main feature", I say "biggest flaw" ;-P

Doing checks on patches is the single worst thing about the way
we do code style validation, at it means the bulk of committed code
is never in compliance. The need to check patches is precisely because
the committed code is unclean and so can't be checked without raising
pages of problems.

Once clang-format forces the entire codebase to be in compliance then
there is (almost) no reason to check patches at all. Simply apply the
patch and check the resulting tree.  You do still want a check on the
patch to validate Signed-off-by, but that can be done as a standalone
script eg in libvirt when using GitLab CI for validating patch series,
we have this as a job:

https://gitlab.com/libvirt/libvirt-ci/-/blob/master/containers/check-dco/check-dco.py

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Paolo Bonzini 5 years, 6 months ago

On 31/07/20 19:27, Daniel P. Berrangé wrote:
> You say "main feature", I say "biggest flaw" ;-P
> 
> Doing checks on patches is the single worst thing about the way
> we do code style validation, at it means the bulk of committed code
> is never in compliance. The need to check patches is precisely because
> the committed code is unclean and so can't be checked without raising
> pages of problems.

This is true for code formatting but not for style warnings.  A stupid
example is that you need to use strtol to implement the recommended
replacement qemu_strtol. We could invent our own "allow-this" lint
syntax but it would be a much bigger job.

Paolo

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Daniel P. Berrangé 5 years, 6 months ago

On Fri, Jul 31, 2020 at 07:42:52PM +0200, Paolo Bonzini wrote:
> On 31/07/20 19:27, Daniel P. Berrangé wrote:
> > You say "main feature", I say "biggest flaw" ;-P
> > 
> > Doing checks on patches is the single worst thing about the way
> > we do code style validation, at it means the bulk of committed code
> > is never in compliance. The need to check patches is precisely because
> > the committed code is unclean and so can't be checked without raising
> > pages of problems.
> 
> This is true for code formatting but not for style warnings.  A stupid
> example is that you need to use strtol to implement the recommended
> replacement qemu_strtol. We could invent our own "allow-this" lint
> syntax but it would be a much bigger job.

Yes, I assumed use of a mechanism for identifying exceptions, as that
is pretty standard practice for any tool that's doing whole source
tree validation. This would still be better than what we have today
because developers reading or copying the can actually then see 
whether the style violation was intentionale, as opposed to historical
accident, and thus less likely to submit patches with style violations.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Markus Armbruster 5 years, 6 months ago

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 31/07/20 17:07, Daniel P. Berrangé wrote:
>> The QAPI JSON-but-not file format is a case where I think we should just
>> adopt a standard file format no matter what. A conversion will have some
>> short term work, but this is really simple data to deal with and the code
>> involved is nicely self contained. Again I'm not saying QAPI maintainers
>> must do it, just put the idea out there as a piece of work that would
>> be welcomed if someone is interested in working ont.
>
> The main issues with JSON-but-not in QEMU are:
>
> - the 64-bit integers, which does not apply to the QAPI schema though

QMP only.  The QAPI schema does not use JSON numbers at all so far.

> - the comments, which are a common extension to JSON (see JSON5, NodeJS
> config files, json_verify's -c option, etc.) so I find it quite surprising
> that no off-the-shelf Python component can parse JSON + comments

QAPI schema only.  QMP does not support comments.

> - the single-quote strings, which are not particularly useful in QAPI schema

Every single string in the QAPI schema uses them, though.

I have no idea why Anthony put them in the QAPI schema language.

We could remove them from the QAPI schema language.  Flag day, and
git-blame becomes pretty much useless for a couple of years.

Removing them from QMP would be painless for QEMU itself, but could
upset QMP clients that (unwisely) use them.

> If we changed the single-quote strings to double-quote, jsonc.vim
> (https://github.com/neoclide/jsonc.vim) seems to support JSON + comments.
> In Emacs you'd probably add an epilogue like
>
> (defconst json-mode-comments-re (rx (group "#" (zero-or-more nonl) line-end)))
> (push (list json-mode-comments-re 1 font-lock-comment-face) json-font-lock-keywords-1)
>
> Did I miss anything?

Let me reiterate that parsing the lower layer of the QAPI schema
language is the trivial part (because we made it trivial, accepting a
considerable hit to ergonomics).  For basic editor support, parsing the
lower layer is all you need.  But to truly work with the schema, you
need to grok the (less than trivial!) upper layer.  Making the lower
layer slightly more trivial is not going to help with that.

We can of course indulge in buyer's remorse on our choice to develop our
own schema language for QAPI.  But that's separate discussion.

As long as we have our own QAPI schema language, we should use a single
frontend for working with it.

> Besides that, why are we using Python and not JavaScript in the mode line?

Falls apart for # comments.  JavaScript uses // and /* */.

[...]

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Paolo Bonzini 5 years, 6 months ago

On 03/08/20 10:18, Markus Armbruster wrote:
>> - the single-quote strings, which are not particularly useful in QAPI schema
> Every single string in the QAPI schema uses them, though.
> 
> I have no idea why Anthony put them in the QAPI schema language.
> 
> We could remove them from the QAPI schema language.  Flag day, and
> git-blame becomes pretty much useless for a couple of years.

Is that a nack or a "whatever"?

Paolo

> Removing them from QMP would be painless for QEMU itself, but could
> upset QMP clients that (unwisely) use them.
>

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Markus Armbruster 5 years, 6 months ago

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 03/08/20 10:18, Markus Armbruster wrote:
>>> - the single-quote strings, which are not particularly useful in QAPI schema
>> Every single string in the QAPI schema uses them, though.
>> 
>> I have no idea why Anthony put them in the QAPI schema language.
>> 
>> We could remove them from the QAPI schema language.  Flag day, and
>> git-blame becomes pretty much useless for a couple of years.
>
> Is that a nack or a "whatever"?

It's "is this really worth the trouble?"  I guess that's halfway between
NAK and whatever, ready to be moved in either direction by arguments :)

[...]

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Paolo Bonzini 5 years, 6 months ago

On 03/08/20 13:28, Markus Armbruster wrote:
>>> We could remove them from the QAPI schema language.  Flag day, and
>>> git-blame becomes pretty much useless for a couple of years.
>> Is that a nack or a "whatever"?
> It's "is this really worth the trouble?"  I guess that's halfway between
> NAK and whatever, ready to be moved in either direction by arguments :)

In general it seems like a good idea to use a standard file format and
not "a standard file format except for two characters". :)  We also
wouldn't be having discussions on editors.

That said, after a bit more research I'm skeptical about the possibility
of using an off-the-shelf parser because most of them either don't
support comments, or are based on YAJL which simply discards comments.

Since '//' comments are harder to parse than "#" comments, this would
actually _add_ code instead of removing it.  Also since our doc comment
syntax uses "##" as a delimiter, we'd have to bikeshed what the doc
comments would look like ("//!", "///", etc.).

This means the two parts might be considered separately:

- replacing single-quote with double-quote strings

- replacing # comments with //

Paolo

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by John Snow 5 years, 6 months ago

On 8/3/20 8:01 AM, Paolo Bonzini wrote:
> That said, after a bit more research I'm skeptical about the possibility
> of using an off-the-shelf parser because most of them either don't
> support comments, or are based on YAJL which simply discards comments.

Heresy:

Docstrings could become part of the data format so they can be parsed, 
analyzed and validated. Parsers largely treat comments like non-semantic 
information and discard it. Round-trip parsers that preserve comments in 
any language are extremely rare.

If the docstrings are relevant to the generator and aren't discardable, 
they should be fully-fledged data members.

In a prototype I had for a YAML format, I just promoted docstrings 
directly to fields, so I could allow clients to query help text for 
individual commands.

--js

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Paolo Bonzini 5 years, 6 months ago

On 03/08/20 20:10, John Snow wrote:
> Heresy:
> 
> Docstrings could become part of the data format so they can be parsed,
> analyzed and validated. Parsers largely treat comments like non-semantic
> information and discard it. Round-trip parsers that preserve comments in
> any language are extremely rare.
> 
> If the docstrings are relevant to the generator and aren't discardable,
> they should be fully-fledged data members.
> 
> In a prototype I had for a YAML format, I just promoted docstrings
> directly to fields, so I could allow clients to query help text for
> individual commands.

This would be actually a good idea, but somebody has to write the code.
 Each field's docstring should be attached to the field, however---no
parsing needed only looking at the tree.  Take a look at what Nir posted:

> Here is the patch adding schema convertor from qemu "json" format to
> standard yaml:
> https://github.com/oVirt/vdsm/commit/e57b69e72987c0929b20306c454835b52b5eb7ee
> 
> The current version of the new yaml based schema:
> https://github.com/oVirt/vdsm/blob/master/lib/vdsm/api/vdsm-api.yml


    VmDiskDevice: &VmDiskDevice
        added: '3.1'
        description: Properties of a VM disk device.
        name: VmDiskDevice
        properties:
        -   description: Indicates if writes are prohibited for the
                device
            name: readonly
            type: boolean

        -   description: The size of the disk (in bytes)
            name: apparentsize
            type: uint

etc.

Paolo

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by John Snow 5 years, 6 months ago

On 8/3/20 2:16 PM, Paolo Bonzini wrote:
> On 03/08/20 20:10, John Snow wrote:
>> Heresy:
>>
>> Docstrings could become part of the data format so they can be parsed,
>> analyzed and validated. Parsers largely treat comments like non-semantic
>> information and discard it. Round-trip parsers that preserve comments in
>> any language are extremely rare.
>>
>> If the docstrings are relevant to the generator and aren't discardable,
>> they should be fully-fledged data members.
>>
>> In a prototype I had for a YAML format, I just promoted docstrings
>> directly to fields, so I could allow clients to query help text for
>> individual commands.
> 
> This would be actually a good idea, but somebody has to write the code.
>   Each field's docstring should be attached to the field, however---no
> parsing needed only looking at the tree.  Take a look at what Nir posted:
> 
>> Here is the patch adding schema convertor from qemu "json" format to
>> standard yaml:
>> https://github.com/oVirt/vdsm/commit/e57b69e72987c0929b20306c454835b52b5eb7ee
>>
>> The current version of the new yaml based schema:
>> https://github.com/oVirt/vdsm/blob/master/lib/vdsm/api/vdsm-api.yml
> 
> 
>      VmDiskDevice: &VmDiskDevice
>          added: '3.1'
>          description: Properties of a VM disk device.
>          name: VmDiskDevice
>          properties:
>          -   description: Indicates if writes are prohibited for the
>                  device
>              name: readonly
>              type: boolean
> 
>          -   description: The size of the disk (in bytes)
>              name: apparentsize
>              type: uint
> 
> etc.
> 
> Paolo
> 

I was working on a small prototype that used something that looked like 
this; the "*opt" format was traded for "?opt", but otherwise:


struct:
   name: AudiodevPerDirectionOptions
   doc: >
     General audio backend options that are used for both
     playback and recording.
   since: '4.0'
   members:

     ?mixing-engine:
       type: bool
       default: 'true'
       since: '4.2'
       doc: |
         Use QEMU's mixing engine to mix all streams inside QEMU and
         convert audio formats when not supported by the backend.

         When set to off, fixed-settings must be also off.

     ?fixed-settings:
       type: bool
       default: 'true'
       doc: >-
         Use fixed settings for host input/output.
         When off, frequency, channels and format must not be specified.

     ?frequency:
       type: bool
       default: '44100'
       doc: >-
         frequency to use when using fixed settings.

     ?channels:
       type: 'uint32'
       default: 2
       doc: >-
         Number of channels when using fixed settings.

     ?voices:
       type: 'uint32'
       default: 1
       doc: "Number of voices to use."

     ?format:
       type: 'AudioFormat'
       default: 's16'
       doc: "Sample format to use when using fixed settings."

     ?buffer-length:
       type: 'uint32'
       doc: 'The buffer length, in microseconds.'

   features:
     my-cool-feature:
       since: '6.0'
       doc: 'This is, no doubt, an extremely cool feature.'

     my-bad-feature:
       doc: 'This is a very bad feature. I am sorry for making it.'
       since: '1.0'
       deprecated: '5.9'

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Nir Soffer 5 years, 6 months ago

On Mon, Aug 3, 2020 at 9:19 PM John Snow <jsnow@redhat.com> wrote:
>
> On 8/3/20 2:16 PM, Paolo Bonzini wrote:
> > On 03/08/20 20:10, John Snow wrote:
> >> Heresy:
> >>
> >> Docstrings could become part of the data format so they can be parsed,
> >> analyzed and validated. Parsers largely treat comments like non-semantic
> >> information and discard it. Round-trip parsers that preserve comments in
> >> any language are extremely rare.
> >>
> >> If the docstrings are relevant to the generator and aren't discardable,
> >> they should be fully-fledged data members.
> >>
> >> In a prototype I had for a YAML format, I just promoted docstrings
> >> directly to fields, so I could allow clients to query help text for
> >> individual commands.
> >
> > This would be actually a good idea, but somebody has to write the code.
> >   Each field's docstring should be attached to the field, however---no
> > parsing needed only looking at the tree.  Take a look at what Nir posted:
> >
> >> Here is the patch adding schema convertor from qemu "json" format to
> >> standard yaml:
> >> https://github.com/oVirt/vdsm/commit/e57b69e72987c0929b20306c454835b52b5eb7ee
> >>
> >> The current version of the new yaml based schema:
> >> https://github.com/oVirt/vdsm/blob/master/lib/vdsm/api/vdsm-api.yml
> >
> >
> >      VmDiskDevice: &VmDiskDevice
> >          added: '3.1'
> >          description: Properties of a VM disk device.
> >          name: VmDiskDevice
> >          properties:
> >          -   description: Indicates if writes are prohibited for the
> >                  device
> >              name: readonly
> >              type: boolean
> >
> >          -   description: The size of the disk (in bytes)
> >              name: apparentsize
> >              type: uint
> >
> > etc.
> >
> > Paolo
> >
>
> I was working on a small prototype that used something that looked like
> this; the "*opt" format was traded for "?opt", but otherwise:
>
>
> struct:
>    name: AudiodevPerDirectionOptions
>    doc: >
>      General audio backend options that are used for both
>      playback and recording.
>    since: '4.0'
>    members:
>
>      ?mixing-engine:

This optimizes for writing instead of reading.

    optional: true

Would be nicer to read, but more important is all the tools parsing
this schema in multiple languages that will have code like:

    def is_optional(node):
        return node.name.startswith("?")

Instead of :

   if node.optional:
       ...

Or maybe better:

    if node.required:

Because it seems that more nodes are optional, so focusing on the required
items will make the schema shorter and more clear.

>        type: bool
>        default: 'true'
>        since: '4.2'
>        doc: |
>          Use QEMU's mixing engine to mix all streams inside QEMU and
>          convert audio formats when not supported by the backend.

Using | is nicer than >-. Not sure what is the difference. In vdsm we don't use
anything and I think it causes trouble when indenting text.

>          When set to off, fixed-settings must be also off.
>
>      ?fixed-settings:
>        type: bool
>        default: 'true'

Why is the default a string and not the actual type?

>        doc: >-
>          Use fixed settings for host input/output.
>          When off, frequency, channels and format must not be specified.
>
>      ?frequency:
>        type: bool
>        default: '44100'
>        doc: >-
>          frequency to use when using fixed settings.
>
>      ?channels:
>        type: 'uint32'
>        default: 2

Here you use the real type, and this is nicer.

>        doc: >-
>          Number of channels when using fixed settings.
>
>      ?voices:
>        type: 'uint32'
>        default: 1
>        doc: "Number of voices to use."
>
>      ?format:
>        type: 'AudioFormat'
>        default: 's16'
>        doc: "Sample format to use when using fixed settings."
>
>      ?buffer-length:
>        type: 'uint32'
>        doc: 'The buffer length, in microseconds.'
>
>    features:
>      my-cool-feature:
>        since: '6.0'
>        doc: 'This is, no doubt, an extremely cool feature.'
>
>      my-bad-feature:
>        doc: 'This is a very bad feature. I am sorry for making it.'
>        since: '1.0'
>        deprecated: '5.9'

Good example :-)

>
>

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by John Snow 5 years, 6 months ago

On 8/3/20 3:54 PM, Nir Soffer wrote:
> On Mon, Aug 3, 2020 at 9:19 PM John Snow <jsnow@redhat.com> wrote:
>>
>> On 8/3/20 2:16 PM, Paolo Bonzini wrote:
>>> On 03/08/20 20:10, John Snow wrote:
>>>> Heresy:
>>>>
>>>> Docstrings could become part of the data format so they can be parsed,
>>>> analyzed and validated. Parsers largely treat comments like non-semantic
>>>> information and discard it. Round-trip parsers that preserve comments in
>>>> any language are extremely rare.
>>>>
>>>> If the docstrings are relevant to the generator and aren't discardable,
>>>> they should be fully-fledged data members.
>>>>
>>>> In a prototype I had for a YAML format, I just promoted docstrings
>>>> directly to fields, so I could allow clients to query help text for
>>>> individual commands.
>>>
>>> This would be actually a good idea, but somebody has to write the code.
>>>    Each field's docstring should be attached to the field, however---no
>>> parsing needed only looking at the tree.  Take a look at what Nir posted:
>>>
>>>> Here is the patch adding schema convertor from qemu "json" format to
>>>> standard yaml:
>>>> https://github.com/oVirt/vdsm/commit/e57b69e72987c0929b20306c454835b52b5eb7ee
>>>>
>>>> The current version of the new yaml based schema:
>>>> https://github.com/oVirt/vdsm/blob/master/lib/vdsm/api/vdsm-api.yml
>>>
>>>
>>>       VmDiskDevice: &VmDiskDevice
>>>           added: '3.1'
>>>           description: Properties of a VM disk device.
>>>           name: VmDiskDevice
>>>           properties:
>>>           -   description: Indicates if writes are prohibited for the
>>>                   device
>>>               name: readonly
>>>               type: boolean
>>>
>>>           -   description: The size of the disk (in bytes)
>>>               name: apparentsize
>>>               type: uint
>>>
>>> etc.
>>>
>>> Paolo
>>>
>>
>> I was working on a small prototype that used something that looked like
>> this; the "*opt" format was traded for "?opt", but otherwise:
>>
>>
>> struct:
>>     name: AudiodevPerDirectionOptions
>>     doc: >
>>       General audio backend options that are used for both
>>       playback and recording.
>>     since: '4.0'
>>     members:
>>
>>       ?mixing-engine:
> 
> This optimizes for writing instead of reading.
> 

Following a "path of least resistance" from the existing QAPI language, 
clearly carrying over the '*optional' syntax.

>      optional: true
> 
> Would be nicer to read, but more important is all the tools parsing
> this schema in multiple languages that will have code like:
> 
>      def is_optional(node):
>          return node.name.startswith("?")
> 
> Instead of :
> 
>     if node.optional:
>         ...
> 
> Or maybe better:
> 
>      if node.required:
> 
> Because it seems that more nodes are optional, so focusing on the required
> items will make the schema shorter and more clear.
> 
>>         type: bool
>>         default: 'true'
>>         since: '4.2'
>>         doc: |
>>           Use QEMU's mixing engine to mix all streams inside QEMU and
>>           convert audio formats when not supported by the backend.
> 
> Using | is nicer than >-. Not sure what is the difference. In vdsm we don't use
> anything and I think it causes trouble when indenting text.
> 

I believe when I wrote this example I was trying to highlight the 
different space consumption styles for the purposes of demonstrating 
what it would do to Sphinx document generation support.

ultimately, there's not really a way to enforce one or the other style 
post-parse.

>>           When set to off, fixed-settings must be also off.
>>
>>       ?fixed-settings:
>>         type: bool
>>         default: 'true'
> 
> Why is the default a string and not the actual type?
> 

I'm going to be honest: I forget. I was playing around with the idea of 
documenting defaults for the purposes of documentation, but not 
necessarily for performing the actual code generation of those defaults.

I believe I specified this field as a string in my grammar and `5` would 
get promoted to "5", but `true` caused a type error.

Doing something in a type-safe way seemed ... harder. So I didn't.

>>         doc: >-
>>           Use fixed settings for host input/output.
>>           When off, frequency, channels and format must not be specified.
>>
>>       ?frequency:
>>         type: bool
>>         default: '44100'
>>         doc: >-
>>           frequency to use when using fixed settings.
>>
>>       ?channels:
>>         type: 'uint32'
>>         default: 2
> 
> Here you use the real type, and this is nicer.
> 
>>         doc: >-
>>           Number of channels when using fixed settings.
>>
>>       ?voices:
>>         type: 'uint32'
>>         default: 1
>>         doc: "Number of voices to use."
>>
>>       ?format:
>>         type: 'AudioFormat'
>>         default: 's16'
>>         doc: "Sample format to use when using fixed settings."
>>
>>       ?buffer-length:
>>         type: 'uint32'
>>         doc: 'The buffer length, in microseconds.'
>>
>>     features:
>>       my-cool-feature:
>>         since: '6.0'
>>         doc: 'This is, no doubt, an extremely cool feature.'
>>
>>       my-bad-feature:
>>         doc: 'This is a very bad feature. I am sorry for making it.'
>>         since: '1.0'
>>         deprecated: '5.9'
> 
> Good example :-)
> 
>>
>>
>

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Markus Armbruster 5 years, 6 months ago

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 03/08/20 13:28, Markus Armbruster wrote:
>>>> We could remove them from the QAPI schema language.  Flag day, and
>>>> git-blame becomes pretty much useless for a couple of years.
>>> Is that a nack or a "whatever"?
>> It's "is this really worth the trouble?"  I guess that's halfway between
>> NAK and whatever, ready to be moved in either direction by arguments :)
>
> In general it seems like a good idea to use a standard file format and
> not "a standard file format except for two characters". :)  We also
> wouldn't be having discussions on editors.

No argument.  But towards which standard file format should the schema
evolve?

* Standard JSON: no comments, no go

* JSON with # comments: need to change strings from ' to "

* JavaScript: need to change comments from # to //

* Python: may want to switch bool literals from true, false to True,
  False

> That said, after a bit more research I'm skeptical about the possibility
> of using an off-the-shelf parser because most of them either don't
> support comments, or are based on YAJL which simply discards comments.
>
> Since '//' comments are harder to parse than "#" comments, this would
> actually _add_ code instead of removing it.  Also since our doc comment
> syntax uses "##" as a delimiter, we'd have to bikeshed what the doc
> comments would look like ("//!", "///", etc.).

Doc comments don't have to be comments in the schema language.  They
could be doc strings.  Requires decent support for long strings, which
JSON does not provide.

> This means the two parts might be considered separately:
>
> - replacing single-quote with double-quote strings
>
> - replacing # comments with //

If all we want is decent editor support out of the box, then rename to
.py, and drop the modelines.  No merge conflicts, no git-blame
pollution.

To make the .py files actual Python, additionally rename the bool
literals.  Much, much less churn than massaging all strings or all
comments.

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Kevin Wolf 5 years, 6 months ago

Am 03.08.2020 um 18:03 hat Markus Armbruster geschrieben:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> > This means the two parts might be considered separately:
> >
> > - replacing single-quote with double-quote strings
> >
> > - replacing # comments with //
> 
> If all we want is decent editor support out of the box, then rename to
> .py, and drop the modelines.  No merge conflicts, no git-blame
> pollution.
> 
> To make the .py files actual Python, additionally rename the bool
> literals.  Much, much less churn than massaging all strings or all
> comments.

I guess I could get behind this one. File renames still have a cost, but
it feels like it wouldn't be absurdly high at least.

And that you actually occasionally paste schema parts into real Python
code suggests that there would be even a small benefit in addition to
the good syntax highlighting out of the box.

I fully expect that we'd keep our existing parser instead of using an
actual Python parser, because the existing code (a) exists and (b) is
probably simpler than the resulting code.

Kevin

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Markus Armbruster 5 years, 6 months ago

Kevin Wolf <kwolf@redhat.com> writes:

> Am 03.08.2020 um 18:03 hat Markus Armbruster geschrieben:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>> > This means the two parts might be considered separately:
>> >
>> > - replacing single-quote with double-quote strings
>> >
>> > - replacing # comments with //
>> 
>> If all we want is decent editor support out of the box, then rename to
>> .py, and drop the modelines.  No merge conflicts, no git-blame
>> pollution.
>> 
>> To make the .py files actual Python, additionally rename the bool
>> literals.  Much, much less churn than massaging all strings or all
>> comments.
>
> I guess I could get behind this one. File renames still have a cost, but
> it feels like it wouldn't be absurdly high at least.
>
> And that you actually occasionally paste schema parts into real Python
> code suggests that there would be even a small benefit in addition to
> the good syntax highlighting out of the box.
>
> I fully expect that we'd keep our existing parser instead of using an
> actual Python parser, because the existing code (a) exists and (b) is
> probably simpler than the resulting code.

Replacing the part of parser.py that deals with JSON by off-the-shelf
code is a non-goal for me.  I got better things to do than replacing[*]
a tiny parser that works by glue for another parser, moving QAPI
sideways instead of forward.

Any messing with the lower syntax layer in non-trivial ways needs to
bring benefits that make it worth our while.


[*] Includes reviewing patches.

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Paolo Bonzini 5 years, 6 months ago

On 03/08/20 18:03, Markus Armbruster wrote:
>> In general it seems like a good idea to use a standard file format and
>> not "a standard file format except for two characters". :)  We also
>> wouldn't be having discussions on editors.
> 
> No argument.  But towards which standard file format should the schema
> evolve?
> 
> * Standard JSON: no comments, no go
> 
> * JSON with # comments: need to change strings from ' to "
> 
> * JavaScript: need to change comments from # to //
> 
> * Python: may want to switch bool literals from true, false to True,
>   False

Second or third, I'd say.  I dislike using .py because a stream of
Python objects doesn't really have a meaning in Python: that's the
difference between .js and .json.

Third requires someone to do the work in the parser.  Unlikely.

>> That said, after a bit more research I'm skeptical about the possibility
>> of using an off-the-shelf parser because most of them either don't
>> support comments, or are based on YAJL which simply discards comments.
>>
>> Since '//' comments are harder to parse than "#" comments, this would
>> actually _add_ code instead of removing it.  Also since our doc comment
>> syntax uses "##" as a delimiter, we'd have to bikeshed what the doc
>> comments would look like ("//!", "///", etc.).
> 
> Doc comments don't have to be comments in the schema language.  They
> could be doc strings.  Requires decent support for long strings, which
> JSON does not provide.

Exactly.  This was the appeal of YAML (or StrictYAML so that Norwegians
don't turn into Falsians) as far as I understood.  But if we were to go
YAML, I'd rather have make doc strings part of the YAML document too.
That is what Nir suggested, it makes sense but someone has to write the
conversion code.

> If all we want is decent editor support out of the box, then rename to
> .py, and drop the modelines.  No merge conflicts, no git-blame
> pollution.

Another possibility is to rename to .qapi and keep Python modelines as a
hack that does work.

Paolo

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Markus Armbruster 5 years, 6 months ago

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 03/08/20 18:03, Markus Armbruster wrote:
>>> In general it seems like a good idea to use a standard file format and
>>> not "a standard file format except for two characters". :)  We also
>>> wouldn't be having discussions on editors.
>> 
>> No argument.  But towards which standard file format should the schema
>> evolve?
>> 
>> * Standard JSON: no comments, no go
>> 
>> * JSON with # comments: need to change strings from ' to "
>> 
>> * JavaScript: need to change comments from # to //
>> 
>> * Python: may want to switch bool literals from true, false to True,
>>   False
>
> Second or third, I'd say.  I dislike using .py because a stream of
> Python objects doesn't really have a meaning in Python:

It does have a meaning: compute a bunch of dictionaries and throw them
away.  Its useless as a program, but it's not meaningless.

>                                                         that's the
> difference between .js and .json.

True.  RFC 4626: "JSON is a subset of JavaScript, but it is a safe
subset that excludes assignment and invocation."[*]

An analogous subset of Python is possible, but has not been formally
defined as far as I know.

> Third requires someone to do the work in the parser.  Unlikely.

The pain of tweaking the parser is likely dwarved several times over by
the pain of the flag day.

>>> That said, after a bit more research I'm skeptical about the possibility
>>> of using an off-the-shelf parser because most of them either don't
>>> support comments, or are based on YAJL which simply discards comments.
>>>
>>> Since '//' comments are harder to parse than "#" comments, this would
>>> actually _add_ code instead of removing it.  Also since our doc comment
>>> syntax uses "##" as a delimiter, we'd have to bikeshed what the doc
>>> comments would look like ("//!", "///", etc.).
>> 
>> Doc comments don't have to be comments in the schema language.  They
>> could be doc strings.  Requires decent support for long strings, which
>> JSON does not provide.
>
> Exactly.  This was the appeal of YAML (or StrictYAML so that Norwegians
> don't turn into Falsians) as far as I understood.  But if we were to go
> YAML, I'd rather have make doc strings part of the YAML document too.
> That is what Nir suggested, it makes sense but someone has to write the
> conversion code.

To write a converter, you first have to understand the doc comment
language.  It's a bit of a mess, because it was fitted to existing
conventions to reduce churn.

Peter Maydell has patches to generate rST instead of Texinfo.  They
affect the doc comment language.  I expect to merge them for 5.2.

>> If all we want is decent editor support out of the box, then rename to
>> .py, and drop the modelines.  No merge conflicts, no git-blame
>> pollution.
>
> Another possibility is to rename to .qapi and keep Python modelines as a
> hack that does work.

Yes.


[*] Unfortunately, this has become a statement of intent, not a
description of reality, due to JSON design accidents.

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by John Snow 5 years, 6 months ago

On 8/4/20 4:03 AM, Markus Armbruster wrote:
> The pain of tweaking the parser is likely dwarved several times over by
> the pain of the flag day.

You mention this often; I wonder if I misunderstand the critique, 
because the pain of a "flag day" for a new file format seems negligible 
to me.

I don't think we edit these .json files very often. Generally, we add a 
new command when we need one. The edits are usually one or two lines 
plus docstrings.

If anyone has patches in-flight, I genuinely doubt it will take more 
than a few minutes to rewrite for the new file format.

No?

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Markus Armbruster 5 years, 6 months ago

John Snow <jsnow@redhat.com> writes:

> On 8/4/20 4:03 AM, Markus Armbruster wrote:
>> The pain of tweaking the parser is likely dwarved several times over by
>> the pain of the flag day.
>
> You mention this often; I wonder if I misunderstand the critique,
> because the pain of a "flag day" for a new file format seems
> negligible to me.
>
> I don't think we edit these .json files very often. Generally, we add
> a new command when we need one. The edits are usually one or two lines
> plus docstrings.
>
> If anyone has patches in-flight, I genuinely doubt it will take more
> than a few minutes to rewrite for the new file format.
>
> No?

You describe the the flag day's one-time pain.

There's also the longer term pain of having to work around git-blame
unable to see beyond the flag day.

I'm not claiming the pain is prohibitive (if I thought it was, I
would've tried to strange this thread in its crib), I am claiming it'll
be much more painful (read: expensive) than a parser tweak.

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Paolo Bonzini 5 years, 6 months ago

On 05/08/20 09:36, Markus Armbruster wrote:
> There's also the longer term pain of having to work around git-blame
> unable to see beyond the flag day.

Do you really use "git blame" that much?  "git log -S" does more or less
the same function (in a different way) and is not affected as much by
large code movement and transformation patches.

Paolo

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Dr. David Alan Gilbert 5 years, 6 months ago

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> On 05/08/20 09:36, Markus Armbruster wrote:
> > There's also the longer term pain of having to work around git-blame
> > unable to see beyond the flag day.
> 
> Do you really use "git blame" that much?  "git log -S" does more or less
> the same function (in a different way) and is not affected as much by
> large code movement and transformation patches.

I use it a lot!   Following stuff back to find where a change came
from and then asking people.

Dave

> Paolo
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Paolo Bonzini 5 years, 6 months ago

On 05/08/20 10:39, Dr. David Alan Gilbert wrote:
>> Do you really use "git blame" that much?  "git log -S" does more or less
>> the same function (in a different way) and is not affected as much by
>> large code movement and transformation patches.
>
> I use it a lot!   Following stuff back to find where a change came
> from and then asking people.

Indeed, but I use "git log -S" instead. :)  Another possibility is to
just do "git log -p" and search for a relevant line of the code I'm
"blaming".

Paolo

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Daniel P. Berrangé 5 years, 6 months ago

On Wed, Aug 05, 2020 at 10:49:35AM +0200, Paolo Bonzini wrote:
> On 05/08/20 10:39, Dr. David Alan Gilbert wrote:
> >> Do you really use "git blame" that much?  "git log -S" does more or less
> >> the same function (in a different way) and is not affected as much by
> >> large code movement and transformation patches.
> >
> > I use it a lot!   Following stuff back to find where a change came
> > from and then asking people.
> 
> Indeed, but I use "git log -S" instead. :)  Another possibility is to
> just do "git log -p" and search for a relevant line of the code I'm
> "blaming".

I used git blame alot too, but I don't think its a reason to not do the
cleanups. It is easy enough to just tell blame to use an earlier commit
if you see it displaying a refactor. I don't think such mild inconvenience
should stop us making otherwise desirable code changes


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: cleanups with long-term benefits

Posted by Cornelia Huck 5 years, 6 months ago

On Wed, 5 Aug 2020 10:05:40 +0100
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Wed, Aug 05, 2020 at 10:49:35AM +0200, Paolo Bonzini wrote:
> > On 05/08/20 10:39, Dr. David Alan Gilbert wrote:  
> > >> Do you really use "git blame" that much?  "git log -S" does more or less
> > >> the same function (in a different way) and is not affected as much by
> > >> large code movement and transformation patches.  
> > >
> > > I use it a lot!   Following stuff back to find where a change came
> > > from and then asking people.  
> > 
> > Indeed, but I use "git log -S" instead. :)  Another possibility is to
> > just do "git log -p" and search for a relevant line of the code I'm
> > "blaming".  
> 
> I used git blame alot too, but I don't think its a reason to not do the
> cleanups. It is easy enough to just tell blame to use an earlier commit
> if you see it displaying a refactor. I don't think such mild inconvenience
> should stop us making otherwise desirable code changes

I don't think people argue that it should block changes; it it simply
another thing to consider when weighing benefits vs. drawbacks.

Re: cleanups with long-term benefits

Posted by Daniel P. Berrangé 5 years, 6 months ago

On Wed, Aug 05, 2020 at 11:11:55AM +0200, Cornelia Huck wrote:
> On Wed, 5 Aug 2020 10:05:40 +0100
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> > On Wed, Aug 05, 2020 at 10:49:35AM +0200, Paolo Bonzini wrote:
> > > On 05/08/20 10:39, Dr. David Alan Gilbert wrote:  
> > > >> Do you really use "git blame" that much?  "git log -S" does more or less
> > > >> the same function (in a different way) and is not affected as much by
> > > >> large code movement and transformation patches.  
> > > >
> > > > I use it a lot!   Following stuff back to find where a change came
> > > > from and then asking people.  
> > > 
> > > Indeed, but I use "git log -S" instead. :)  Another possibility is to
> > > just do "git log -p" and search for a relevant line of the code I'm
> > > "blaming".  
> > 
> > I used git blame alot too, but I don't think its a reason to not do the
> > cleanups. It is easy enough to just tell blame to use an earlier commit
> > if you see it displaying a refactor. I don't think such mild inconvenience
> > should stop us making otherwise desirable code changes
> 
> I don't think people argue that it should block changes; it it simply
> another thing to consider when weighing benefits vs. drawbacks.

Actually, I'm saying that including "git blame" in such a weighing is
the mechanism for blocking otherwise beneficial change to the code. Or
to put it another way, I believe the value assigned to "git blame" in
such a comparison is miniscule / effectively zero. The only time
"git blame" should win is if the change in question is purely the
bike shed colour and has no technical benefits at all.  If there's any
technical benefit to making the change it should always win.  We
shouldn't preserve technical debt in the code merely to avoid an impact
on "git blame".

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: cleanups with long-term benefits

Posted by Cornelia Huck 5 years, 6 months ago

On Wed, 5 Aug 2020 11:08:02 +0100
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Wed, Aug 05, 2020 at 11:11:55AM +0200, Cornelia Huck wrote:
> > On Wed, 5 Aug 2020 10:05:40 +0100
> > Daniel P. Berrangé <berrange@redhat.com> wrote:
> >   
> > > On Wed, Aug 05, 2020 at 10:49:35AM +0200, Paolo Bonzini wrote:  
> > > > On 05/08/20 10:39, Dr. David Alan Gilbert wrote:    
> > > > >> Do you really use "git blame" that much?  "git log -S" does more or less
> > > > >> the same function (in a different way) and is not affected as much by
> > > > >> large code movement and transformation patches.    
> > > > >
> > > > > I use it a lot!   Following stuff back to find where a change came
> > > > > from and then asking people.    
> > > > 
> > > > Indeed, but I use "git log -S" instead. :)  Another possibility is to
> > > > just do "git log -p" and search for a relevant line of the code I'm
> > > > "blaming".    
> > > 
> > > I used git blame alot too, but I don't think its a reason to not do the
> > > cleanups. It is easy enough to just tell blame to use an earlier commit
> > > if you see it displaying a refactor. I don't think such mild inconvenience
> > > should stop us making otherwise desirable code changes  
> > 
> > I don't think people argue that it should block changes; it it simply
> > another thing to consider when weighing benefits vs. drawbacks.  
> 
> Actually, I'm saying that including "git blame" in such a weighing is
> the mechanism for blocking otherwise beneficial change to the code. Or
> to put it another way, I believe the value assigned to "git blame" in
> such a comparison is miniscule / effectively zero. The only time
> "git blame" should win is if the change in question is purely the
> bike shed colour and has no technical benefits at all.  If there's any
> technical benefit to making the change it should always win.  We
> shouldn't preserve technical debt in the code merely to avoid an impact
> on "git blame".

I think we have to agree to disagree on this. Making "git blame" harder
to use is annoying, and at least for me there's a threshold of
usefulness for the code change that should be considered. (No judgment
on the proposed change here; I don't have really have a horse in that
race.)

Re: cleanups with long-term benefits

Posted by Kevin Wolf 5 years, 6 months ago

Am 05.08.2020 um 12:08 hat Daniel P. BerrangÃ© geschrieben:
> On Wed, Aug 05, 2020 at 11:11:55AM +0200, Cornelia Huck wrote:
> > On Wed, 5 Aug 2020 10:05:40 +0100
> > Daniel P. BerrangÃ© <berrange@redhat.com> wrote:
> > 
> > > On Wed, Aug 05, 2020 at 10:49:35AM +0200, Paolo Bonzini wrote:
> > > > On 05/08/20 10:39, Dr. David Alan Gilbert wrote:  
> > > > >> Do you really use "git blame" that much?  "git log -S" does more or less
> > > > >> the same function (in a different way) and is not affected as much by
> > > > >> large code movement and transformation patches.  
> > > > >
> > > > > I use it a lot!   Following stuff back to find where a change came
> > > > > from and then asking people.  
> > > > 
> > > > Indeed, but I use "git log -S" instead. :)  Another possibility is to
> > > > just do "git log -p" and search for a relevant line of the code I'm
> > > > "blaming".  
> > > 
> > > I used git blame alot too, but I don't think its a reason to not do the
> > > cleanups. It is easy enough to just tell blame to use an earlier commit
> > > if you see it displaying a refactor. I don't think such mild inconvenience
> > > should stop us making otherwise desirable code changes
> > 
> > I don't think people argue that it should block changes; it it simply
> > another thing to consider when weighing benefits vs. drawbacks.
> 
> Actually, I'm saying that including "git blame" in such a weighing is
> the mechanism for blocking otherwise beneficial change to the code. Or
> to put it another way, I believe the value assigned to "git blame" in
> such a comparison is miniscule / effectively zero. The only time
> "git blame" should win is if the change in question is purely the
> bike shed colour and has no technical benefits at all.  If there's any
> technical benefit to making the change it should always win.  We
> shouldn't preserve technical debt in the code merely to avoid an impact
> on "git blame".

We're basically weighing "git blame" against syntax highlighting
defaults. I don't think the latter has an obviously higher weight.

Kevin

Re: cleanups with long-term benefits

Posted by Eduardo Habkost 5 years, 6 months ago

On Wed, Aug 05, 2020 at 06:23:23PM +0200, Kevin Wolf wrote:
> Am 05.08.2020 um 12:08 hat Daniel P. BerrangÃ© geschrieben:
> > On Wed, Aug 05, 2020 at 11:11:55AM +0200, Cornelia Huck wrote:
> > > On Wed, 5 Aug 2020 10:05:40 +0100
> > > Daniel P. BerrangÃ© <berrange@redhat.com> wrote:
> > > 
> > > > On Wed, Aug 05, 2020 at 10:49:35AM +0200, Paolo Bonzini wrote:
> > > > > On 05/08/20 10:39, Dr. David Alan Gilbert wrote:  
> > > > > >> Do you really use "git blame" that much?  "git log -S" does more or less
> > > > > >> the same function (in a different way) and is not affected as much by
> > > > > >> large code movement and transformation patches.  
> > > > > >
> > > > > > I use it a lot!   Following stuff back to find where a change came
> > > > > > from and then asking people.  
> > > > > 
> > > > > Indeed, but I use "git log -S" instead. :)  Another possibility is to
> > > > > just do "git log -p" and search for a relevant line of the code I'm
> > > > > "blaming".  
> > > > 
> > > > I used git blame alot too, but I don't think its a reason to not do the
> > > > cleanups. It is easy enough to just tell blame to use an earlier commit
> > > > if you see it displaying a refactor. I don't think such mild inconvenience
> > > > should stop us making otherwise desirable code changes
> > > 
> > > I don't think people argue that it should block changes; it it simply
> > > another thing to consider when weighing benefits vs. drawbacks.
> > 
> > Actually, I'm saying that including "git blame" in such a weighing is
> > the mechanism for blocking otherwise beneficial change to the code. Or
> > to put it another way, I believe the value assigned to "git blame" in
> > such a comparison is miniscule / effectively zero. The only time
> > "git blame" should win is if the change in question is purely the
> > bike shed colour and has no technical benefits at all.  If there's any
> > technical benefit to making the change it should always win.  We
> > shouldn't preserve technical debt in the code merely to avoid an impact
> > on "git blame".
> 
> We're basically weighing "git blame" against syntax highlighting
> defaults. I don't think the latter has an obviously higher weight.

I think "syntax highlight defaults" is far from being an accurate
description of the motivations behind this discussion.

-- 
Eduardo

Re: cleanups with long-term benefits

Posted by Markus Armbruster 5 years, 6 months ago

Eduardo Habkost <ehabkost@redhat.com> writes:

> On Wed, Aug 05, 2020 at 06:23:23PM +0200, Kevin Wolf wrote:
>> We're basically weighing "git blame" against syntax highlighting
>> defaults. I don't think the latter has an obviously higher weight.
>
> I think "syntax highlight defaults" is far from being an accurate
> description of the motivations behind this discussion.

The motivation and the expected benefits are far from clear to me,
because all I have so far is a meandering e-mail thread.

For a change proposal as massive as "throw out working code and touch
basically every line in the QAPI schema", I'd like to see a concise memo
stating the goals, their benefits, the possible ways to get there, and
their cost.  I don't think that's too much to ask.

Re: cleanups with long-term benefits

Posted by Cornelia Huck 5 years, 6 months ago

On Wed, 5 Aug 2020 10:25:30 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 05/08/20 09:36, Markus Armbruster wrote:
> > There's also the longer term pain of having to work around git-blame
> > unable to see beyond the flag day.  
> 
> Do you really use "git blame" that much?  "git log -S" does more or less
> the same function (in a different way) and is not affected as much by
> large code movement and transformation patches.

I'm not sure the two of them really perform the same function.

FWIW, I like using git {blame|annotate} to find out when/why some code
areas were changed, and it's often not "when was this line introduced",
but "I see some commits changing this function, let's find out more
about them." And yes, I use that quite regularly.

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Markus Armbruster 5 years, 6 months ago

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 05/08/20 09:36, Markus Armbruster wrote:
>> There's also the longer term pain of having to work around git-blame
>> unable to see beyond the flag day.
>
> Do you really use "git blame" that much?  "git log -S" does more or less
> the same function (in a different way) and is not affected as much by
> large code movement and transformation patches.

C-x v g

When that doesn't work, I fall back to git-log -S in a shell.

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Alex Bennée 5 years, 6 months ago

Markus Armbruster <armbru@redhat.com> writes:

> Paolo Bonzini <pbonzini@redhat.com> writes:
>
>> On 05/08/20 09:36, Markus Armbruster wrote:
>>> There's also the longer term pain of having to work around git-blame
>>> unable to see beyond the flag day.
>>
>> Do you really use "git blame" that much?  "git log -S" does more or less
>> the same function (in a different way) and is not affected as much by
>> large code movement and transformation patches.
>
> C-x v g
>
> When that doesn't work, I fall back to git-log -S in a shell.

Yep, I'm another that uses blame in the first instance (or magit-blame
inside emacs). My usual fallback after that is git log -p and liberal
use of / which is very inefficient.

-- 
Alex Bennée

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by John Snow 5 years, 6 months ago

On 8/5/20 3:36 AM, Markus Armbruster wrote:
> John Snow <jsnow@redhat.com> writes:
> 
>> On 8/4/20 4:03 AM, Markus Armbruster wrote:
>>> The pain of tweaking the parser is likely dwarved several times over by
>>> the pain of the flag day.
>>
>> You mention this often; I wonder if I misunderstand the critique,
>> because the pain of a "flag day" for a new file format seems
>> negligible to me.
>>
>> I don't think we edit these .json files very often. Generally, we add
>> a new command when we need one. The edits are usually one or two lines
>> plus docstrings.
>>
>> If anyone has patches in-flight, I genuinely doubt it will take more
>> than a few minutes to rewrite for the new file format.
>>
>> No?
> 
> You describe the the flag day's one-time pain.
> 
> There's also the longer term pain of having to work around git-blame
> unable to see beyond the flag day.
> 

So it's not really the "flag day" we're worried about anymore, it's the 
ongoing ease-of-use for vcs history.

> I'm not claiming the pain is prohibitive (if I thought it was, I
> would've tried to strange this thread in its crib), I am claiming it'll
> be much more painful (read: expensive) than a parser tweak.
> 

I do use `git blame` quite a lot, but with a project as old as QEMU, 
most of my trips through history do involve jumping across a few 
refactor gaps as a normal part of that process.

As Dan points out, I often have to back out and add refactorSHA^ to my 
invocation, and just keep hopping backwards until I find what I am truly 
after. It just feels like a fact of programmer life for me at this point.

I've not used Paolo's invocation before, but it looks like it might be 
useful. I'll try to remember to try it out.

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Markus Armbruster 5 years, 6 months ago

John Snow <jsnow@redhat.com> writes:

> On 8/5/20 3:36 AM, Markus Armbruster wrote:
>> John Snow <jsnow@redhat.com> writes:
>>
>>> On 8/4/20 4:03 AM, Markus Armbruster wrote:
>>>> The pain of tweaking the parser is likely dwarved several times over by
>>>> the pain of the flag day.
>>>
>>> You mention this often; I wonder if I misunderstand the critique,
>>> because the pain of a "flag day" for a new file format seems
>>> negligible to me.
>>>
>>> I don't think we edit these .json files very often. Generally, we add
>>> a new command when we need one. The edits are usually one or two lines
>>> plus docstrings.
>>>
>>> If anyone has patches in-flight, I genuinely doubt it will take more
>>> than a few minutes to rewrite for the new file format.
>>>
>>> No?
>>
>> You describe the the flag day's one-time pain.
>>
>> There's also the longer term pain of having to work around git-blame
>> unable to see beyond the flag day.
>>
>
> So it's not really the "flag day" we're worried about anymore, it's
> the ongoing ease-of-use for vcs history.

Feel free to call that pain however you want.  I'm going to call it
"lasting aftereffects of the flag day" :)

>> I'm not claiming the pain is prohibitive (if I thought it was, I
>> would've tried to strange this thread in its crib), I am claiming it'll
>> be much more painful (read: expensive) than a parser tweak.
>>
>
> I do use `git blame` quite a lot, but with a project as old as QEMU,
> most of my trips through history do involve jumping across a few
> refactor gaps as a normal part of that process.
>
> As Dan points out, I often have to back out and add refactorSHA^ to my
> invocation, and just keep hopping backwards until I find what I am
> truly after. It just feels like a fact of programmer life for me at
> this point.

The fact that we all need to cope with this class of issue doesn't mean
we should create more instances unthinkingly.

We should only when we believe the benefits are worth it, and can't find
a cheaper way to get them.

We've discussed "is it really that bad" at some length.  What I'm
missing so far is a clear writeup of the benefits beyond "editor works
out of the box" (which is quite a desirable benefit, but can also be had
without a flag day).

> I've not used Paolo's invocation before, but it looks like it might be
> useful. I'll try to remember to try it out.

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Markus Armbruster 5 years, 6 months ago

Markus Armbruster <armbru@redhat.com> writes:

> Paolo Bonzini <pbonzini@redhat.com> writes:
[...]
>> That said, after a bit more research I'm skeptical about the possibility
>> of using an off-the-shelf parser because most of them either don't
>> support comments, or are based on YAJL which simply discards comments.
>>
>> Since '//' comments are harder to parse than "#" comments, this would
>> actually _add_ code instead of removing it.  Also since our doc comment
>> syntax uses "##" as a delimiter, we'd have to bikeshed what the doc
>> comments would look like ("//!", "///", etc.).
>
> Doc comments don't have to be comments in the schema language.  They
> could be doc strings.  Requires decent support for long strings, which
> JSON does not provide.

There's another complication besides multi-line strings: funny
characters.

Since QAPI schema strings are all names, and names are restricted to
ASCII letters, digits, hyphen, and underscore, we limit strings to
printable ASCII, so we don't have to deal with control characters,
escape sequences, surrogate pairs, and all that crap.  Comments are
UTF-8.

[...]

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Daniel P. Berrangé 5 years, 6 months ago

On Mon, Aug 03, 2020 at 10:18:29AM +0200, Markus Armbruster wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
> > - the single-quote strings, which are not particularly useful in QAPI schema
> 
> Every single string in the QAPI schema uses them, though.
> 
> I have no idea why Anthony put them in the QAPI schema language.
> 
> We could remove them from the QAPI schema language.  Flag day, and
> git-blame becomes pretty much useless for a couple of years.

I don't think the git-blame issue is a big deal, just a minor inconvenience.
Say you find the line of code you are examining hits the commit which did
the refactoring. You merely have to run git blame a second time passing
SHA-OF-REFACTOR^1 as an arg. 


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: cleanups with long-term benefits (was Re: [PATCH] schemas: Add vim modeline)

Posted by Markus Armbruster 5 years, 6 months ago

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Mon, Aug 03, 2020 at 10:18:29AM +0200, Markus Armbruster wrote:
>> Paolo Bonzini <pbonzini@redhat.com> writes:
>> 
>> > - the single-quote strings, which are not particularly useful in QAPI schema
>> 
>> Every single string in the QAPI schema uses them, though.
>> 
>> I have no idea why Anthony put them in the QAPI schema language.
>> 
>> We could remove them from the QAPI schema language.  Flag day, and
>> git-blame becomes pretty much useless for a couple of years.
>
> I don't think the git-blame issue is a big deal, just a minor inconvenience.
> Say you find the line of code you are examining hits the commit which did
> the refactoring. You merely have to run git blame a second time passing
> SHA-OF-REFACTOR^1 as an arg. 

I do that all the time, and it's bloody annoying, especially when I have
to recurse more than once.

Show-stopper?  Nah.

Still, I insist it is put on the scales when we weigh the tradeoffs.

Re: [PATCH] schemas: Add vim modeline

Posted by Kevin Wolf 5 years, 6 months ago

Am 31.07.2020 um 17:07 hat Daniel P. BerrangÃ© geschrieben:
> On Fri, Jul 31, 2020 at 02:55:34PM +0200, Markus Armbruster wrote:
> > Daniel P. BerrangÃ© <berrange@redhat.com> writes:
> 
> > >> Some of the criticism there doesn't matter for our use case.
> > >
> > > Yeah, what matters is whether it can do the job we need in a way that is
> > > better than what we have today, and whether there are any further options
> > > to consider that might be viable alternatives.
> > 
> > Would it improve things enough to be worth the switching pain?
> 
> The short answer is that I don't think that question matters. We should
> do the conversion regardless, as our JSON-but-not file format has no
> compelling reason to exist as a thing when there's a variety of standard
> file formats that could do the job.

I think the question does matter. Reusing existing code is not an end in
itself, but it has to actually improve something. Usually this would be
simplifying the code because a lot of the work is done by something
external now.

"The job" is defining APIs in a way that we they describe JSON messages
and we can generate C boilerplate code and documentation from the
description.

If we look for a standard format, then we should start with this rather
than keeping the non-standard thing that we have, but wrapping a new
syntax around it.

> I'd explicitly ignore the sunk costs and minor amount of work to
> convert to a new format.

I'm not worried about the work to convert the QAPI generator.
QAPISchemaParser is 264 lines of code, presumably not a lot of work to
rewrite. This is also the upper limit of code that could be improved.

Given the discussion so far, I wouldn't even be sure that the diffstat
after pulling in an external dependeny would be at least minimally
favourable.

More time will be spent with dealing with the results of the actual
conversion of the schema file because people will get lots of merge
conflicts.

Sometimes this is worth it, but then the benefit has to be a little more
than just that we could say we've reused an external library.

> The long answer is that as a general philosophy I'm in favour of agressively
> eliminating anything that is custom to a project and isn't offering an
> compelling benefit over a functionally equivalent, commonly used / standard
> solution.

The differences between JSON and the QAPI schema lower layer language is
structured documentation comments (I don't think any of the proposed
alternatives have this - might be a compelling benefit) and having
multiple objects in a single file (I would rather put brackets around
the whole file and commas between each definition and live with that
ugliness than taking the pain of a merge conflicts).

> Any time a project re-invents the wheel, that is one more piece of custom
> knowledge a contributor has to learn. Each one may seem insignificant on
> its own, but cummulatively they result in death by a 1000 cuts. This makes
> a project increasingly less attractive to contribute to over the long term.

I doubt documentation comments and having multiple objects in a single
file makes QEMU less attractice to contribute to.

> Measuring the long term benefit of the change is generally quite difficult,
> because while you can see what impact a change will have today on current
> code, it is hard to usefully evaluate future benefits as you're trying to
> imagine the impact on things that don't even exist.
> 
> Overall my POV is not to think too hard about measuring improvements, and
> discard any concern about sunk costs. Instead have a general presumption
> in favour of eliminating any examples of wheel re-invention in a project.
> Even if regular contributors don't want to spend time on such work, this
> kind of thing is pretty amenable to new contributors looking for tasks to
> start their involvement.
> 
> The QAPI JSON-but-not file format is a case where I think we should just
> adopt a standard file format no matter what. A conversion will have some
> short term work, but this is really simple data to deal with and the code
> involved is nicely self contained.

Almost every subsystem has some QAPI parts that would be affected.

> Again I'm not saying QAPI maintainers must do it, just put the idea
> out there as a piece of work that would be welcomed if someone is
> interested in working ont.
> 
> Another example would be elimination of anything in QEMU code that is
> duplicating functionality in GLib, even if there zero functional
> difference between the two impls.

If it's an actual duplicate and exact match, sure. But if using the GLib
functions results in double the code size that implementing the
functionality from scratch was, it becomes questionable in my opinion.

Kevin

Re: [PATCH] schemas: Add vim modeline

Posted by Nir Soffer 5 years, 6 months ago

On Thu, Jul 30, 2020 at 12:38 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Thu, Jul 30, 2020 at 11:07:26AM +0200, Markus Armbruster wrote:
> > Andrea Bolognani <abologna@redhat.com> writes:
> >
> > > The various schemas included in QEMU use a JSON-based format which
> > > is, however, strictly speaking not valid JSON.
> > >
> > > As a consequence, when vim tries to apply syntax highlight rules
> > > for JSON (as guessed from the file name), the result is an unreadable
> > > mess which mostly consist of red markers pointing out supposed errors
> > > in, well, pretty much everything.
> > >
> > > Using Python syntax highlighting produces much better results, and
> > > in fact these files already start with specially-formatted comments
> > > that instruct Emacs to process them as if they were Python files.
> > >
> > > This commit adds the equivalent special comments for vim.
> > >
> > > Signed-off-by: Andrea Bolognani <abologna@redhat.com>
>
> Given that we already have emacs mode-lines, I see no reason to
> not also have vim mode-lines, so regardless of the deeper discussion
> I think this is patch is fine to merge in the short term
>
>   Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
>
>
> > Naming QAPI schema files .json even though their contents isn't was a
> > mistake.  Correcting it would be a pain.  If we correct it, then the
> > sooner the better.
> >
> > Renaming them to .py gives decent editor support out of the box.  Their
> > contents isn't quite Python, though: true vs. True, false vs. False.  Do
> > we care?  Only a few dozen occurences; they could be adjusted.
> >
> > Renaming them to .qapi would perhaps be less confusing, for the price of
> > "out of the box".
>
> IMHO, the critical rule is that if you a pick a particular file extension
> associated with an existing language, you absolutely MUST BE compliant
> with that language.
>
> We fail at compliance with both JSON and Python because we're actually
> using our own DSL (domain specific language).
>
> IOW if we're going to stick with our current file format, then we should
> be naming them .qapi. We can still use an editor mode line if we want to
> claim we're approximately equiv to another language, but we can't be
> surprised if editors get upset.
>
>
> The bigger question is whether having our own DSL is justified ?
>
> I'm *really* sceptical that it is.
>
>
> We can't use JSON because it lacks comments. So we invented our own
> psuedo-JSON parser that supported comments, and used ' instead of "
> for some reason. We also needed to be able to parse a sequence of
> multiple JSON documents in one file. We should have just picked a
> different language because JSON clearly didn't do what we eneeded.
>
> You suggest naming them .py. If we do that, we must declare that they
> are literally Python code and modify them so that we can load the
> files straight into the python intepretor as code, and not parse
> them as data. I feel unhappy about treating data as code though.
>
>
> While JSON doesn't do what we need, its second-cousin YAML is a more
> flexible format. Taking one example
>
> ---
> ##
> # @ImageInfoSpecificQCow2:
> #
> # @compat: compatibility level
> #
> # ...snip...
> #
> # Since: 1.7
> ##
> struct: ImageInfoSpecificQCow2
> data:
>   compat: str
>   "*data-file": str
>   "*data-file-raw": bool
>   "*lazy-refcounts": bool
>   "*corrupt": bool
>   refcount-bits: int
>   "*encrypt": ImageInfoSpecificQCow2Encryption
>   "*bitmaps":
>     - Qcow2BitmapInfo
>   compression-type: Qcow2CompressionType
>
>
> Then we could use a regular off the shelf YAML parser in python.
>
> The uglyiness with quotes is due to the use of "*". Slightly less ugly
> if we simply declare that quotes are always used, even where they're
> not strictly required.
>
> struct: ImageInfoSpecificQCow2
> data:
>   "compat": "str"
>   "*data-file": "str"
>   "*data-file-raw": "bool"
>   "*lazy-refcounts": "bool"
>   "*corrupt": "bool"
>   "refcount-bits": "int"
>   "*encrypt": "ImageInfoSpecificQCow2Encryption"
>   "*bitmaps":
>     - "Qcow2BitmapInfo"
>   "compression-type": "Qcow2CompressionType"
>
> With the use of "---" to denote the start of document, we have no trouble
> parsing our files which would actually be a concatenation of multiple
> documents. The python YAML library provides the easy yaml.load_all()
> method.

We had the same issue in vdsm. Someone ported qemu "json" schema to vdsm,
probbay when the plan was to add C API to vdsm, which never happened.

My first patch to vdsm was fixing the parser for this "json" format,
because it used to
get in an endless loop if an unknown token was found. We hated this
format, and finally
replaced it with yaml. But we did not keep the comments since they
duplicate data
which is already in the json part, and not portable to other formats.

Here is the patch adding schema convertor from qemu "json" format to
standard yaml:
https://github.com/oVirt/vdsm/commit/e57b69e72987c0929b20306c454835b52b5eb7ee

The current version of the new yaml based schema:
https://github.com/oVirt/vdsm/blob/master/lib/vdsm/api/vdsm-api.yml

We don't use comments, so the yaml is portable to json or regular
python dict. In fact,
we use the schama in as a pickle of the parsed schema for 5 times
faster loading, which
is important since we use the schema in the command line client.

Having the comments part of the schema allows nice things like
verifying requests and
generating help messages directly from the schema. This is not a good
example before
our implementation is poor, but:

$ vdsm-client Host getDeviceList -h
usage: vdsm-client Host getDeviceList [-h] [arg=value [arg=value ...]]

positional arguments:
  arg=value   storageType: Only return devices of this type
              guids: Only return info on specific list of block device GUIDs
              checkStatus: Indicates if device status should be checked


              JSON representation:
              {
                  "storageType": {
                      "BlockDeviceType": "enum ['FCP', 'MIXED', 'iSCSI']"
                  },
                  "guids": [
                      "string",
                      {}
                  ],
                  "checkStatus": "boolean"
              }

optional arguments:
  -h, --help  show this help message and exit

vdsm-client knows nothing about vdsm API and we never have to change
it, because it generates
the command line interface and the help messages from the schema on
the fly, and its input and
output is json.

vdsm/client.py is similar, providing vdsm API without knowing anything
about the API, or requiring
changes when APIs are added or modified, because everything is done by
inspecting the schema:

>>> from vdsm import client
>>> c = client.connect("localhost")
>>> c.Host.getDeviceList(storageType="FCP", checkStatus=False)
[]
>>> print(c.Host)
<vdsm.client.Namespace object at 0x7fcda017fa58>
>>> print(c.Host.getDeviceList)
functools.partial(<bound method _Client._call of <vdsm.client._Client
object at 0x7fcda757e0f0>>, 'Host', 'getDeviceList')

I think inventing DSLs and developing tools is wrong. Use standard
format and tools and spend
time on the core of the project.

Nir

Re: [PATCH] schemas: Add vim modeline

Posted by Paolo Bonzini 5 years, 6 months ago

On 01/08/20 01:12, Nir Soffer wrote:
> I think inventing DSLs and developing tools is wrong. Use standard 
> format and tools and spend time on the core of the project.

Please don't apply 2020 standards to choices that were made in 2009.  Or
if you do, be ready to contribute code.

Paolo

Re: [PATCH] schemas: Add vim modeline

Posted by Markus Armbruster 5 years, 6 months ago

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 01/08/20 01:12, Nir Soffer wrote:
>> I think inventing DSLs and developing tools is wrong. Use standard 
>> format and tools and spend time on the core of the project.
>
> Please don't apply 2020 standards to choices that were made in 2009.  Or
> if you do, be ready to contribute code.

Is it still a good choice today?

For that question, we'd have to look beyond syntax.  Syntax has been the
most boring and least expensive part of QAPI.

Re: [PATCH] schemas: Add vim modeline

Posted by Alex Bennée 5 years, 6 months ago

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Thu, Jul 30, 2020 at 11:07:26AM +0200, Markus Armbruster wrote:
>> Andrea Bolognani <abologna@redhat.com> writes:
>> 
>> > The various schemas included in QEMU use a JSON-based format which
>> > is, however, strictly speaking not valid JSON.
>> >
>> > As a consequence, when vim tries to apply syntax highlight rules
>> > for JSON (as guessed from the file name), the result is an unreadable
>> > mess which mostly consist of red markers pointing out supposed errors
>> > in, well, pretty much everything.
>> >
>> > Using Python syntax highlighting produces much better results, and
>> > in fact these files already start with specially-formatted comments
>> > that instruct Emacs to process them as if they were Python files.
>> >
>> > This commit adds the equivalent special comments for vim.
>> >
>> > Signed-off-by: Andrea Bolognani <abologna@redhat.com>
>
> Given that we already have emacs mode-lines, I see no reason to
> not also have vim mode-lines, so regardless of the deeper discussion
> I think this is patch is fine to merge in the short term
>
>   Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

We could remove them all and defer to .editorconfig but I'm not going to
get in the way of making our Vim brethren's lives a little easier.

-- 
Alex Bennée

Re: [PATCH] schemas: Add vim modeline

Posted by Markus Armbruster 5 years, 5 months ago

Markus Armbruster <armbru@redhat.com> writes:

> Andrea Bolognani <abologna@redhat.com> writes:
>
>> The various schemas included in QEMU use a JSON-based format which
>> is, however, strictly speaking not valid JSON.
>>
>> As a consequence, when vim tries to apply syntax highlight rules
>> for JSON (as guessed from the file name), the result is an unreadable
>> mess which mostly consist of red markers pointing out supposed errors
>> in, well, pretty much everything.
>>
>> Using Python syntax highlighting produces much better results, and
>> in fact these files already start with specially-formatted comments
>> that instruct Emacs to process them as if they were Python files.
>>
>> This commit adds the equivalent special comments for vim.
>>
>> Signed-off-by: Andrea Bolognani <abologna@redhat.com>
>
> Naming QAPI schema files .json even though their contents isn't was a
> mistake.  Correcting it would be a pain.  If we correct it, then the
> sooner the better.
>
> Renaming them to .py gives decent editor support out of the box.  Their
> contents isn't quite Python, though: true vs. True, false vs. False.  Do
> we care?  Only a few dozen occurences; they could be adjusted.
>
> Renaming them to .qapi would perhaps be less confusing, for the price of
> "out of the box".
>
> Thoughts?

Let me try to summarize the discussion from my point of view.

Basing a DSL on some existing syntax has its advantages: the learning
curve is slightly more gentle, and we get some tooling for free, notably
editor support.

Basing on JSON was a mistake: JSON is designed for data interchange, not
for programming.

To make it fit for programming, we extended it, losing much of what we
gained by basing on existing syntax.

Use of .json file names for the resulting bastard is confusing.

Among the confused are editors.  We unconfuse them one by one by telling
them to treat the files as Python.  Works, because the syntax happens to
be a strict subset of Python's.

We discussed swapping out the base syntax layer for one that is actually
fit for purpose (and doesn't need extending), e.g. YAML.  Radical
change, benefits need to justify the cost.  Possible benefits:

* Better ergonomics for developers

  Depends on what we pick for a base, and how we use it.

  We discussed YAML at some length.  It's complex, and the way it
  assigns types is at odds with the QAPI schema's needs.  I doubt YAML
  could improve ergonomics all that much.

* Not having to maintain our own code for the lower layer

  Replacing the (trivial) JSON parser by glue for an off-the-shelf
  parser is quite unlikely to pay off.

  Replacing the (non-trivial) doc comment parser could be a win, but no
  credible replacements have been suggested so far.

* Make the schema more easily consumable by other programs

  Parsing is the easy part of this problem.  Making the easy part easier
  won't make the total problem appreciably easier.

Right now, I doubt the benefits are worth the cost.  To make me
reconsider, write a concise memo stating the goals, their benefits, the
possible ways to get there, and their cost.

More modest ways to stop our confusing misuse of .json:

* Rename QAPI schema files to .qapi, keep unconfusing tools one by one
  by telling them to treat the files as Python.

* Rename QAPI schema files to .py.  Optionally rename false, true to
  False, True in the schema language, so the schema expressions are
  semantically valid Python.  Humans might still find .py confusing.

* Rename QAPI schema files to .js, change comments from # to //.

* Keep .json, change comments from # to // and strings from ' to ".

  This is still not JSON, but should be close enough to make common
  tooling work.

The last two are too much churn for too little benefit, in my opinion.

I'm willing to do either of the first two.  If you have a preference,
let me know.

Re: [PATCH] schemas: Add vim modeline

Posted by Markus Armbruster 5 years, 6 months ago

Andrea Bolognani <abologna@redhat.com> writes:

> The various schemas included in QEMU use a JSON-based format which
> is, however, strictly speaking not valid JSON.
>
> As a consequence, when vim tries to apply syntax highlight rules
> for JSON (as guessed from the file name), the result is an unreadable
> mess which mostly consist of red markers pointing out supposed errors
> in, well, pretty much everything.
>
> Using Python syntax highlighting produces much better results, and
> in fact these files already start with specially-formatted comments
> that instruct Emacs to process them as if they were Python files.
>
> This commit adds the equivalent special comments for vim.
>
> Signed-off-by: Andrea Bolognani <abologna@redhat.com>

Fixing the real mistake would be better, but until then, mitigating it
helps, like the existing Emacs modelines do.

Queued, thanks!