========================
Decodetree Specification
========================

A *decodetree* is built from instruction *patterns*.  A pattern may
represent a single architectural instruction or a group of such
instructions, depending on what is convenient for further processing.

Each pattern has both *fixedbits* and *fixedmask*, the combination of which
describes the condition under which the pattern is matched::

  (insn & fixedmask) == fixedbits
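The match condition translates directly into C.  The sketch below is
illustrative only: ``pattern_matches`` is a hypothetical helper, not part of
the generated decoder, and it assumes 32-bit instruction words.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when every bit selected by fixedmask
 * has the value given in fixedbits.  Bits outside fixedmask are ignored.
 */
static bool pattern_matches(uint32_t insn, uint32_t fixedmask,
                            uint32_t fixedbits)
{
    return (insn & fixedmask) == fixedbits;
}
```

With a fixedmask of ``0xfc000000`` and fixedbits of ``0x40000000`` (values
chosen for illustration), any insn whose top six bits are ``010000`` matches,
whatever the remaining bits contain.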

Each pattern may have *fields*, which are extracted from the insn and
passed along to the translator.  Examples of such are registers,
immediates, and sub-opcodes.

In support of patterns, one may declare *fields*, *argument sets*, and
*formats*, each of which may be re-used to simplify further definitions.

Fields
======

Syntax::

  field_def     := '%' identifier ( unnamed_field )* ( !function=identifier )?
  unnamed_field := number ':' ( 's' )? number

For *unnamed_field*, the first number is the least-significant bit position
of the field and the second number is the length of the field.  If the 's' is
present, the field is considered signed.  If multiple ``unnamed_fields`` are
present, they are concatenated, with the first one listed occupying the
most-significant bits of the result.  In this way one can define disjoint
fields.

If ``!function`` is specified, the concatenated result is passed through the
named function, taking and returning an integral value.

One may use ``!function`` with zero ``unnamed_fields``.  This case is called
a *parameter*, and the named function is only passed the ``DisasContext``
and returns an integral value extracted from there.

A field with no ``unnamed_fields`` and no ``!function`` is an error.

Field examples:

+---------------------------+---------------------------------------------+
| Input                     | Generated code                              |
+===========================+=============================================+
| %disp   0:s16             | sextract(i, 0, 16)                          |
+---------------------------+---------------------------------------------+
| %imm9   16:6 10:3         | extract(i, 16, 6) << 3 | extract(i, 10, 3)  |
+---------------------------+---------------------------------------------+
| %disp12 0:s1 1:1 2:10     | sextract(i, 0, 1) << 11 |                   |
|                           |    extract(i, 1, 1) << 10 |                 |
|                           |    extract(i, 2, 10)                        |
+---------------------------+---------------------------------------------+
| %shimm8 5:s8 13:1         | expand_shimm8(sextract(i, 5, 8) << 1 |      |
|   !function=expand_shimm8 |               extract(i, 13, 1))            |
+---------------------------+---------------------------------------------+
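The first three rows of the table can be tried out with the sketch below.
The ``extract`` and ``sextract`` helpers are local stand-ins written for this
example (assuming 32-bit insns, a field length of 1 to 32, and
two's-complement integers); the ``field_*`` function names are hypothetical
and do not correspond to names emitted by the generator.

```c
#include <stdint.h>

/* Stand-in: 'len' bits of 'insn' starting at bit 'pos', zero-extended. */
static uint32_t extract(uint32_t insn, int pos, int len)
{
    return (insn >> pos) & (~0u >> (32 - len));
}

/* Stand-in: same field, sign-extended from its most-significant bit. */
static int32_t sextract(uint32_t insn, int pos, int len)
{
    uint32_t field = extract(insn, pos, len);
    uint32_t sign = 1u << (len - 1);

    return (int32_t)((field ^ sign) - sign);
}

/* %disp   0:s16 */
static int32_t field_disp(uint32_t i)
{
    return sextract(i, 0, 16);
}

/* %imm9   16:6 10:3 */
static uint32_t field_imm9(uint32_t i)
{
    return extract(i, 16, 6) << 3 | extract(i, 10, 3);
}

/*
 * %disp12 0:s1 1:1 2:10
 * The signed part is scaled by multiplication rather than '<<' so that
 * this plain C sketch never left-shifts a negative value.
 */
static int32_t field_disp12(uint32_t i)
{
    return sextract(i, 0, 1) * (1 << 11)
           | (int32_t)(extract(i, 1, 1) << 10)
           | (int32_t)extract(i, 2, 10);
}
```

Only the first unnamed field carries the sign in these examples, so it
determines the sign of the whole concatenated value.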

Argument Sets
=============

Syntax::

  args_def    := '&' identifier ( args_elt )+ ( !extern )?
  args_elt    := identifier (':' identifier)?

Each *args_elt* defines an argument within the argument set.
If the *args_elt* contains a colon, the first identifier is the
argument name and the second identifier is the argument type.
If the colon is missing, the argument type will be ``int``.

Each argument set will be rendered as a C structure ``arg_$name``
whose members are the arguments of the set.

If ``!extern`` is specified, the backing structure is assumed
to have been already declared, typically via a second decoder.

Argument sets are useful when one wants to define helper functions
for the translator functions that can perform operations on a common
set of arguments.  This can ensure, for instance, that the ``AND``
pattern and the ``OR`` pattern put their operands into the same named
structure, so that a common ``gen_logic_insn`` may be able to handle
the operations common between the two.

Argument set examples::

  &reg3       ra rb rc
  &loadstore  reg base offset
  &longldst   reg base offset:int64_t
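As a rough illustration, the ``&reg3`` and ``&longldst`` examples might be
rendered as the structures below.  This is a sketch of the shape described
above, not verbatim generator output.  The ``gen_logic_insn`` body is a toy
that operates on a plain register array instead of emitting code, and
``logic_fn``, ``op_and`` and ``op_or`` are hypothetical names introduced for
the example.

```c
#include <stdint.h>

/* &reg3       ra rb rc */
typedef struct {
    int ra;
    int rb;
    int rc;
} arg_reg3;

/* &longldst   reg base offset:int64_t */
typedef struct {
    int reg;
    int base;
    int64_t offset;
} arg_longldst;

/* Hypothetical operation callbacks for the shared helper. */
typedef int (*logic_fn)(int, int);

static int op_and(int a, int b) { return a & b; }
static int op_or(int a, int b)  { return a | b; }

/*
 * Toy stand-in for a common helper: because AND and OR both use
 * arg_reg3, one function can serve both patterns.
 */
static void gen_logic_insn(int *regs, const arg_reg3 *a, logic_fn fn)
{
    regs[a->rc] = fn(regs[a->ra], regs[a->rb]);
}
```

A translator function for ``AND`` would then pass ``op_and`` and one for
``OR`` would pass ``op_or``, sharing everything else.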

Formats
=======

Syntax::

  fmt_def      := '@' identifier ( fmt_elt )+
  fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref
  fixedbit_elt := [01.-]+
  field_elt    := identifier ':' 's'? number
  field_ref    := '%' identifier | identifier '=' '%' identifier
  args_ref     := '&' identifier

Defining a format is a handy way to avoid replicating groups of fields
across many instruction patterns.

A *fixedbit_elt* describes a contiguous sequence of bits that must
be 1, 0, or don't care.  The difference between '.' and '-'
is that '.' means that the bit will be covered with a field or a
final 0 or 1 from the pattern, and '-' means that the bit is really
ignored by the CPU and will not be specified.

A *field_elt* describes a simple field only given a width; the position of
the field is implied by its position with respect to other *fixedbit_elt*
and *field_elt*.

If any *fixedbit_elt* or *field_elt* appear, then all bits must be defined.
Padding with a *fixedbit_elt* of all '.' is an easy way to accomplish that.

A *field_ref* incorporates a field by reference.  This is the only way to
add a complex field to a format.  A field may be renamed in the process
via assignment to another identifier.  This is intended to allow the
same argument set to be used with disjoint named fields.

A single *args_ref* may specify an argument set to use for the format.
The set of fields in the format must be a subset of the arguments in
the argument set.  If an argument set is not specified, one will be
inferred from the set of fields.

It is recommended, but not required, that all *field_ref* and *args_ref*
appear at the end of the line, not interleaving with *fixedbit_elt* or
*field_elt*.

Format examples::

  @opr    ...... ra:5 rb:5 ... 0 ....... rc:5
  @opi    ...... ra:5 lit:8    1 ....... rc:5
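Reading the two formats from the most-significant bit down gives each field
its position: ``ra`` occupies bits 25..21, ``rb`` bits 20..16, ``lit`` bits
20..13 and ``rc`` bits 4..0.  The sketch below spells that out.  The
structure names assume the inferred argument sets are called ``arg_opr`` and
``arg_opi``; the ``extract_opr`` and ``extract_opi`` function names and the
``extract`` helper are hypothetical stand-ins.

```c
#include <stdint.h>

typedef struct { int ra, rb, rc; } arg_opr;
typedef struct { int ra, lit, rc; } arg_opi;

/* Stand-in: 'len' bits of 'insn' starting at bit 'pos' (len 1..32). */
static uint32_t extract(uint32_t insn, int pos, int len)
{
    return (insn >> pos) & (~0u >> (32 - len));
}

/* @opr    ...... ra:5 rb:5 ... 0 ....... rc:5 */
static void extract_opr(arg_opr *a, uint32_t insn)
{
    a->ra = (int)extract(insn, 21, 5);
    a->rb = (int)extract(insn, 16, 5);
    a->rc = (int)extract(insn, 0, 5);
}

/* @opi    ...... ra:5 lit:8    1 ....... rc:5 */
static void extract_opi(arg_opi *a, uint32_t insn)
{
    a->ra = (int)extract(insn, 21, 5);
    a->lit = (int)extract(insn, 13, 8);
    a->rc = (int)extract(insn, 0, 5);
}
```

Both formats account for all 32 bits: 6 + 5 + 5 + 3 + 1 + 7 + 5 for ``@opr``
and 6 + 5 + 8 + 1 + 7 + 5 for ``@opi``.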

Patterns
========

Syntax::

  pat_def      := identifier ( pat_elt )+
  pat_elt      := fixedbit_elt | field_elt | field_ref | args_ref | fmt_ref | const_elt
  fmt_ref      := '@' identifier
  const_elt    := identifier '=' number

The *fixedbit_elt* and *field_elt* specifiers are unchanged from formats.
A pattern that does not specify a named format will have one inferred
from a referenced argument set (if present) and the set of fields.

A *const_elt* allows an argument to be set to a constant value.  This may
come in handy when fields overlap between patterns and one has to
include the values in the *fixedbit_elt* instead.

The decoder will call a translator function for each pattern matched.

Pattern examples::

  addl_r   010000 ..... ..... .... 0000000 ..... @opr
  addl_i   010000 ..... ..... .... 0000000 ..... @opi

which will, in part, invoke::

  trans_addl_r(ctx, &arg_opr, insn)

and::

  trans_addl_i(ctx, &arg_opi, insn)
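Combining the fixed bits of each pattern with those of its format gives a
fixedmask of ``0xfc001fe0`` for both patterns, with fixedbits of
``0x40000000`` for ``addl_r`` and ``0x40001000`` for ``addl_i``; only bit 12,
supplied by the format, tells them apart.  The following sketch shows how a
decoder could dispatch on those values.  It is hand-written, not generator
output: ``DisasContext`` is a minimal stand-in, the ``decode`` function and
the ``CALLED_*`` constants are hypothetical, and the translators are stubs
that only record which one ran.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-ins for types a real translator would provide. */
enum { CALLED_NONE, CALLED_ADDL_R, CALLED_ADDL_I };
typedef struct { int called; } DisasContext;
typedef struct { int ra, rb, rc; } arg_opr;
typedef struct { int ra, lit, rc; } arg_opi;

/* Stub translators: a real one would emit code for the instruction. */
static bool trans_addl_r(DisasContext *ctx, arg_opr *a, uint32_t insn)
{
    (void)a; (void)insn;
    ctx->called = CALLED_ADDL_R;
    return true;
}

static bool trans_addl_i(DisasContext *ctx, arg_opi *a, uint32_t insn)
{
    (void)a; (void)insn;
    ctx->called = CALLED_ADDL_I;
    return true;
}

static bool decode(DisasContext *ctx, uint32_t insn)
{
    /*
     * fixedmask: bits 31..26 and 11..5 come from the patterns,
     * bit 12 from the formats (@opr fixes it to 0, @opi to 1).
     */
    switch (insn & 0xfc001fe0u) {
    case 0x40000000u: {
        arg_opr a = {
            .ra = (insn >> 21) & 0x1f,
            .rb = (insn >> 16) & 0x1f,
            .rc = insn & 0x1f,
        };
        return trans_addl_r(ctx, &a, insn);
    }
    case 0x40001000u: {
        arg_opi a = {
            .ra = (insn >> 21) & 0x1f,
            .lit = (insn >> 13) & 0xff,
            .rc = insn & 0x1f,
        };
        return trans_addl_i(ctx, &a, insn);
    }
    }
    return false;   /* no pattern matched */
}
```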

Pattern Groups
==============

Syntax::

  group            := overlap_group | no_overlap_group
  overlap_group    := '{' ( pat_def | group )+ '}'
  no_overlap_group := '[' ( pat_def | group )+ ']'

A *group* begins with a lone open-brace or open-bracket, with all
subsequent lines indented two spaces, and ending with a lone
close-brace or close-bracket.  Groups may be nested, increasing the
required indentation of the lines within the nested group to two
spaces per nesting level.

Patterns within overlap groups are allowed to overlap.  Conflicts are
resolved by selecting the patterns in order.  If all of the fixedbits
for a pattern match, its translate function will be called.  If the
translate function returns false, then subsequent patterns within the
group will be matched.

Patterns within no-overlap groups are not allowed to overlap, just
the same as ungrouped patterns.  Thus no-overlap groups are intended
to be nested inside overlap groups.

The following example from PA-RISC shows specialization of the *or*
instruction::

  {
    {
      nop   000010 ----- ----- 0000 001001 0 00000
      copy  000010 00000 r1:5  0000 001001 0 rt:5
    }
    or      000010 rt2:5 r1:5  cf:4 001001 0 rt:5
  }

When the *cf* field is zero, the instruction has no side effects,
and may be specialized.  When the *rt* field is zero, the output
is discarded and so the instruction has no effect.  When the *rt2*
field is zero, the operation is ``reg[r1] | 0`` and so encodes
the canonical register copy operation.

The output from the generator might look like::

  switch (insn & 0xfc000fe0) {
  case 0x08000240:
    /* 000010.. ........ ....0010 010..... */
    if ((insn & 0x0000f000) == 0x00000000) {
        /* 000010.. ........ 00000010 010..... */
        if ((insn & 0x0000001f) == 0x00000000) {
            /* 000010.. ........ 00000010 01000000 */
            extract_decode_Fmt_0(&u.f_decode0, insn);
            if (trans_nop(ctx, &u.f_decode0)) return true;
        }
        if ((insn & 0x03e00000) == 0x00000000) {
            /* 00001000 000..... 00000010 010..... */
            extract_decode_Fmt_1(&u.f_decode1, insn);
            if (trans_copy(ctx, &u.f_decode1)) return true;
        }
    }
    extract_decode_Fmt_2(&u.f_decode2, insn);
    if (trans_or(ctx, &u.f_decode2)) return true;
    return false;
  }
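The fall-through behaviour of an overlap group can be exercised with a
self-contained sketch.  It mirrors the control flow of the output above but
is hand-written: the ``Ctx`` type, its ``nop_ok`` and ``copy_ok`` switches,
the ``CHOSE_*`` constants and the ``decode_or_group`` name are hypothetical,
and field extraction is omitted because the stub translators only record
which pattern accepted the instruction.

```c
#include <stdbool.h>
#include <stdint.h>

enum { CHOSE_NONE, CHOSE_NOP, CHOSE_COPY, CHOSE_OR };

/* Hypothetical context: flags let a stub translator decline. */
typedef struct {
    bool nop_ok;
    bool copy_ok;
    int chosen;
} Ctx;

static bool trans_nop(Ctx *c)
{
    if (!c->nop_ok) {
        return false;           /* decline: the next pattern is tried */
    }
    c->chosen = CHOSE_NOP;
    return true;
}

static bool trans_copy(Ctx *c)
{
    if (!c->copy_ok) {
        return false;
    }
    c->chosen = CHOSE_COPY;
    return true;
}

static bool trans_or(Ctx *c)
{
    c->chosen = CHOSE_OR;
    return true;
}

static bool decode_or_group(Ctx *c, uint32_t insn)
{
    if ((insn & 0xfc000fe0u) != 0x08000240u) {
        return false;                               /* not in this group */
    }
    if ((insn & 0x0000f000u) == 0) {                /* cf == 0 */
        if ((insn & 0x0000001fu) == 0 && trans_nop(c)) {    /* rt == 0 */
            return true;
        }
        if ((insn & 0x03e00000u) == 0 && trans_copy(c)) {   /* rt2 == 0 */
            return true;
        }
    }
    return trans_or(c);         /* the general pattern comes last */
}
```

For the encoding ``0x08000240``, where *cf*, *rt* and *rt2* are all zero, the
sketch selects ``nop`` when both flags are set, ``copy`` when only
``copy_ok`` is set, and ``or`` when neither is.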