[PATCH] decodetree: Allow 'dot' in opcode names

Philippe Mathieu-Daudé posted 1 patch 3 years, 2 months ago
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20210112184156.2014305-1-f4bug@amsat.org
scripts/decodetree.py | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
[PATCH] decodetree: Allow 'dot' in opcode names
Posted by Philippe Mathieu-Daudé 3 years, 2 months ago
Some ISA use a dot in their opcodes. Allow the decodetree
script to process them. The dot is replaced by an underscore
in the generated code.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
 scripts/decodetree.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index 47aa9caf6d1..b7572589e64 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -49,7 +49,7 @@
 re_arg_ident = '&[a-zA-Z0-9_]*'
 re_fld_ident = '%[a-zA-Z0-9_]*'
 re_fmt_ident = '@[a-zA-Z0-9_]*'
-re_pat_ident = '[a-zA-Z0-9_]*'
+re_pat_ident = '[a-zA-Z0-9_.]*'
 
 def error_with_file(file, lineno, *args):
     """Print an error message from file:line and args and exit."""
@@ -1082,6 +1082,7 @@ def parse_file(f, parent_pat):
         elif re.fullmatch(re_fmt_ident, name):
             parse_generic(start_lineno, None, name[1:], toks)
         elif re.fullmatch(re_pat_ident, name):
+            name = name.replace('.', '_')
             parse_generic(start_lineno, parent_pat, name, toks)
         else:
             error(lineno, 'invalid token "{0}"'.format(name))
-- 
2.26.2

Re: [PATCH] decodetree: Allow 'dot' in opcode names
Posted by Richard Henderson 3 years, 2 months ago
On 1/12/21 8:41 AM, Philippe Mathieu-Daudé wrote:
> Some ISA use a dot in their opcodes. Allow the decodetree
> script to process them. The dot is replaced by an underscore
> in the generated code.

Given that you then have to remember to use '_' on the C side, what advantage
does this give?


r~

Re: [PATCH] decodetree: Allow 'dot' in opcode names
Posted by Philippe Mathieu-Daudé 3 years, 2 months ago
On 1/12/21 9:44 PM, Richard Henderson wrote:
> On 1/12/21 8:41 AM, Philippe Mathieu-Daudé wrote:
>> Some ISA use a dot in their opcodes. Allow the decodetree
>> script to process them. The dot is replaced by an underscore
>> in the generated code.
> 
> Given that you then have to remember to use '_' on the C side, what advantage
> does this give?

The direct advantage is you can copy/paste the opcode in a PDF viewer
without having to edit it :)

See i.e. some Loongson opcodes [*]:

MULT.G          011100 ..... ..... ..... 00000 010000   @rs_rt_rd
DMULT.G         011100 ..... ..... ..... 00000 010001   @rs_rt_rd
MULTU.G         011100 ..... ..... ..... 00000 010010   @rs_rt_rd
DMULTU.G        011100 ..... ..... ..... 00000 010011   @rs_rt_rd

DIV.G           011100 ..... ..... ..... 00000 010100   @rs_rt_rd
DDIV.G          011100 ..... ..... ..... 00000 010101   @rs_rt_rd
DIVU.G          011100 ..... ..... ..... 00000 010110   @rs_rt_rd
DDIVU.G         011100 ..... ..... ..... 00000 010111   @rs_rt_rd

MOD.G           011100 ..... ..... ..... 00000 011100   @rs_rt_rd
DMOD.G          011100 ..... ..... ..... 00000 011101   @rs_rt_rd
MODU.G          011100 ..... ..... ..... 00000 011110   @rs_rt_rd
DMODU.G         011100 ..... ..... ..... 00000 011111   @rs_rt_rd

The other - remote - advantage I see is when using a disassembler
based on decodetree (as AVR does), the opcode displayed also matches
the specs. We are not yet there with MIPS, but I have something in
progress...

[*] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg02509.html

Re: [PATCH] decodetree: Allow 'dot' in opcode names
Posted by Eduardo Habkost 3 years, 2 months ago
On Tue, Jan 12, 2021 at 11:15:38PM +0100, Philippe Mathieu-Daudé wrote:
> On 1/12/21 9:44 PM, Richard Henderson wrote:
> > On 1/12/21 8:41 AM, Philippe Mathieu-Daudé wrote:
> >> Some ISA use a dot in their opcodes. Allow the decodetree
> >> script to process them. The dot is replaced by an underscore
> >> in the generated code.
> > 
> > Given that you then have to remember to use '_' on the C side, what advantage
> > does this give?
> 
> The direct advantage is you can copy/paste the opcode in a PDF viewer
> without having to edit it :)
> 
> See i.e. some Loongson opcodes [*]:
> 
> MULT.G          011100 ..... ..... ..... 00000 010000   @rs_rt_rd
> DMULT.G         011100 ..... ..... ..... 00000 010001   @rs_rt_rd
> MULTU.G         011100 ..... ..... ..... 00000 010010   @rs_rt_rd
> DMULTU.G        011100 ..... ..... ..... 00000 010011   @rs_rt_rd
> 
> DIV.G           011100 ..... ..... ..... 00000 010100   @rs_rt_rd
> DDIV.G          011100 ..... ..... ..... 00000 010101   @rs_rt_rd
> DIVU.G          011100 ..... ..... ..... 00000 010110   @rs_rt_rd
> DDIVU.G         011100 ..... ..... ..... 00000 010111   @rs_rt_rd
> 
> MOD.G           011100 ..... ..... ..... 00000 011100   @rs_rt_rd
> DMOD.G          011100 ..... ..... ..... 00000 011101   @rs_rt_rd
> MODU.G          011100 ..... ..... ..... 00000 011110   @rs_rt_rd
> DMODU.G         011100 ..... ..... ..... 00000 011111   @rs_rt_rd
> 
> The other - remote - advantage I see is when using a disassembler
> based on decodetree (as AVR does), the opcode displayed also matches
> the specs. We are not yet there with MIPS, but I have something in
> progress...

Interesting.  So, the decodetree format is not used exclusively
inside the QEMU source tree, but also by other projects?  Is
there a specification somewhere else?

> 
> [*] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg02509.html
> 

-- 
Eduardo


Re: [PATCH] decodetree: Allow 'dot' in opcode names
Posted by Philippe Mathieu-Daudé 3 years, 2 months ago
On 1/12/21 11:28 PM, Eduardo Habkost wrote:
> On Tue, Jan 12, 2021 at 11:15:38PM +0100, Philippe Mathieu-Daudé wrote:
>> On 1/12/21 9:44 PM, Richard Henderson wrote:
>>> On 1/12/21 8:41 AM, Philippe Mathieu-Daudé wrote:
>>>> Some ISA use a dot in their opcodes. Allow the decodetree
>>>> script to process them. The dot is replaced by an underscore
>>>> in the generated code.
>>>
>>> Given that you then have to remember to use '_' on the C side, what advantage
>>> does this give?
>>
>> The direct advantage is you can copy/paste the opcode in a PDF viewer
>> without having to edit it :)
>>
>> See i.e. some Loongson opcodes [*]:
>>
>> MULT.G          011100 ..... ..... ..... 00000 010000   @rs_rt_rd
>> DMULT.G         011100 ..... ..... ..... 00000 010001   @rs_rt_rd
>> MULTU.G         011100 ..... ..... ..... 00000 010010   @rs_rt_rd
>> DMULTU.G        011100 ..... ..... ..... 00000 010011   @rs_rt_rd
>>
>> DIV.G           011100 ..... ..... ..... 00000 010100   @rs_rt_rd
>> DDIV.G          011100 ..... ..... ..... 00000 010101   @rs_rt_rd
>> DIVU.G          011100 ..... ..... ..... 00000 010110   @rs_rt_rd
>> DDIVU.G         011100 ..... ..... ..... 00000 010111   @rs_rt_rd
>>
>> MOD.G           011100 ..... ..... ..... 00000 011100   @rs_rt_rd
>> DMOD.G          011100 ..... ..... ..... 00000 011101   @rs_rt_rd
>> MODU.G          011100 ..... ..... ..... 00000 011110   @rs_rt_rd
>> DMODU.G         011100 ..... ..... ..... 00000 011111   @rs_rt_rd
>>
>> The other - remote - advantage I see is when using a disassembler
>> based on decodetree (as AVR does), the opcode displayed also matches
>> the specs. We are not yet there with MIPS, but I have something in
>> progress...
> 
> Interesting.  So, the decodetree format is not used exclusively
> inside the QEMU source tree, but also by other projects?  Is
> there a specification somewhere else?

"as AVR does in QEMU", see commit 9d8caa67a24
("target/avr: Add support for disassembling via option '-d in_asm'").

What seduces me with decodetree is we don't need to match QEMU
instruction class with each CPU capabilities. IOW we can use the
same decoder for TCG and disassembly, and the disassembly matches
the instruction set of the CPU (with all the specific instructions).

Currently some specific opcodes are displayed as generic ones (or
as unknown via hexadecimal value). Unfortunately not something we
can show with QEMU AVR target because the ISA is very simple.

Regards,

Phil.

Re: [PATCH] decodetree: Allow 'dot' in opcode names
Posted by Eduardo Habkost 3 years, 2 months ago
On Tue, Jan 12, 2021 at 07:41:56PM +0100, Philippe Mathieu-Daudé wrote:
> Some ISA use a dot in their opcodes. Allow the decodetree
> script to process them. The dot is replaced by an underscore
> in the generated code.

Will something break if we just use underscores instead of dots
in the input file?

> 
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
>  scripts/decodetree.py | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> index 47aa9caf6d1..b7572589e64 100644
> --- a/scripts/decodetree.py
> +++ b/scripts/decodetree.py
> @@ -49,7 +49,7 @@
>  re_arg_ident = '&[a-zA-Z0-9_]*'
>  re_fld_ident = '%[a-zA-Z0-9_]*'
>  re_fmt_ident = '@[a-zA-Z0-9_]*'
> -re_pat_ident = '[a-zA-Z0-9_]*'
> +re_pat_ident = '[a-zA-Z0-9_.]*'

If pattern identifiers are going to follow different rules,
doesn't this need to be documented at docs/devel/decodetree.rst?

>  
>  def error_with_file(file, lineno, *args):
>      """Print an error message from file:line and args and exit."""
> @@ -1082,6 +1082,7 @@ def parse_file(f, parent_pat):
>          elif re.fullmatch(re_fmt_ident, name):
>              parse_generic(start_lineno, None, name[1:], toks)
>          elif re.fullmatch(re_pat_ident, name):
> +            name = name.replace('.', '_')
>              parse_generic(start_lineno, parent_pat, name, toks)

Do we want error messages generated by the script to use the
modified identifier with underscores, or the original identifier
with dots?  (This patch does the former)

>          else:
>              error(lineno, 'invalid token "{0}"'.format(name))
> -- 
> 2.26.2
> 

-- 
Eduardo


Re: [PATCH] decodetree: Allow 'dot' in opcode names
Posted by Philippe Mathieu-Daudé 3 years, 2 months ago
On 1/12/21 10:05 PM, Eduardo Habkost wrote:
> On Tue, Jan 12, 2021 at 07:41:56PM +0100, Philippe Mathieu-Daudé wrote:
>> Some ISA use a dot in their opcodes. Allow the decodetree
>> script to process them. The dot is replaced by an underscore
>> in the generated code.
> 
> Will something break if we just use underscores instead of dots
> in the input file?

No, but then the opcode doesn't really match the spec.

> 
>>
>> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>> ---
>>  scripts/decodetree.py | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
>> index 47aa9caf6d1..b7572589e64 100644
>> --- a/scripts/decodetree.py
>> +++ b/scripts/decodetree.py
>> @@ -49,7 +49,7 @@
>>  re_arg_ident = '&[a-zA-Z0-9_]*'
>>  re_fld_ident = '%[a-zA-Z0-9_]*'
>>  re_fmt_ident = '@[a-zA-Z0-9_]*'
>> -re_pat_ident = '[a-zA-Z0-9_]*'
>> +re_pat_ident = '[a-zA-Z0-9_.]*'
> 
> If pattern identifiers are going to follow different rules,
> doesn't this need to be documented at docs/devel/decodetree.rst?

I checked and luckily for me the opcode pattern identifiers is
not documented <:)

> 
>>  
>>  def error_with_file(file, lineno, *args):
>>      """Print an error message from file:line and args and exit."""
>> @@ -1082,6 +1082,7 @@ def parse_file(f, parent_pat):
>>          elif re.fullmatch(re_fmt_ident, name):
>>              parse_generic(start_lineno, None, name[1:], toks)
>>          elif re.fullmatch(re_pat_ident, name):
>> +            name = name.replace('.', '_')
>>              parse_generic(start_lineno, parent_pat, name, toks)
> 
> Do we want error messages generated by the script to use the
> modified identifier with underscores, or the original identifier
> with dots?  (This patch does the former)

You are right, we want the former in the error message (the input
format).

Thanks,

Phil.

Re: [PATCH] decodetree: Allow 'dot' in opcode names
Posted by Eduardo Habkost 3 years, 2 months ago
On Tue, Jan 12, 2021 at 11:19:49PM +0100, Philippe Mathieu-Daudé wrote:
> On 1/12/21 10:05 PM, Eduardo Habkost wrote:
[...]
> >> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> >> index 47aa9caf6d1..b7572589e64 100644
> >> --- a/scripts/decodetree.py
> >> +++ b/scripts/decodetree.py
> >> @@ -49,7 +49,7 @@
> >>  re_arg_ident = '&[a-zA-Z0-9_]*'
> >>  re_fld_ident = '%[a-zA-Z0-9_]*'
> >>  re_fmt_ident = '@[a-zA-Z0-9_]*'
> >> -re_pat_ident = '[a-zA-Z0-9_]*'
> >> +re_pat_ident = '[a-zA-Z0-9_.]*'
> > 
> > If pattern identifiers are going to follow different rules,
> > doesn't this need to be documented at docs/devel/decodetree.rst?
> 
> I checked and luckily for me the opcode pattern identifiers is
> not documented <:)

The format is not documented, but the specification grammar
implies the same rules apply to all identifiers.

-- 
Eduardo