Commit | Line | Data |
---|---|---|
c3fb76b9 | 1 | Hexagon is Qualcomm's very long instruction word (VLIW) digital signal |
375bcf38 TS |
2 | processor (DSP). We also support Hexagon Vector eXtensions (HVX). HVX |
3 | is a wide vector coprocessor designed for high performance computer vision, | |
4 | image processing, machine learning, and other workloads. | |
c3fb76b9 TS |
5 | |
6 | The following versions of the Hexagon core are supported | |
7 | Scalar core: v67 | |
8 | https://developer.qualcomm.com/downloads/qualcomm-hexagon-v67-programmer-s-reference-manual | |
375bcf38 TS |
9 | HVX extension: v66 |
10 | https://developer.qualcomm.com/downloads/qualcomm-hexagon-v66-hvx-programmer-s-reference-manual | |
c3fb76b9 TS |
11 | |
12 | We presented an overview of the project at the 2019 KVM Forum. | |
13 | https://kvmforum2019.sched.com/event/Tmwc/qemu-hexagon-automatic-translation-of-the-isa-manual-pseudcode-to-tiny-code-instructions-of-a-vliw-architecture-niccolo-izzo-revng-taylor-simpson-qualcomm-innovation-center | |
14 | ||
15 | *** Tour of the code *** | |
16 | ||
17 | The qemu-hexagon implementation is a combination of qemu and the Hexagon | |
18 | architecture library (aka archlib). The three primary directories with | |
19 | Hexagon-specific code are | |
20 | ||
21 | qemu/target/hexagon | |
22 | This has all the instruction and packet semantics | |
23 | qemu/target/hexagon/imported | |
24 | These files are imported with very little modification from archlib | |
25 | *.idef Instruction semantics definition | |
26 | macros.def Mapping of macros to instruction attributes | |
27 | encode*.def Encoding patterns for each instruction | |
28 | iclass.def Instruction class definitions used to determine | |
29 | legal VLIW slots for each instruction | |
95e11505 ADF |
30 | qemu/target/hexagon/idef-parser |
31 | Parser that, given the high-level definitions of an instruction, | |
32 | produces a C function generating equivalent tiny code instructions. | |
33 | See README.rst. | |
c3fb76b9 TS |
34 | qemu/linux-user/hexagon |
35 | Helpers for loading the ELF file and making Linux system calls, | |
36 | signals, etc | |
37 | ||
38 | We start with scripts that generate a bunch of include files. This | |
39 | is a two-step process. The first step is to use the C preprocessor to expand | |
40 | macros inside the architecture definition files. This is done in | |
41 | target/hexagon/gen_semantics.c. This step produces | |
42 | <BUILD_DIR>/target/hexagon/semantics_generated.pyinc. | |
43 | That file is consumed by the following python scripts to produce the indicated | |
44 | header files in <BUILD_DIR>/target/hexagon | |
45 | gen_opcodes_def.py -> opcodes_def_generated.h.inc | |
46 | gen_op_regs.py -> op_regs_generated.h.inc | |
47 | gen_printinsn.py -> printinsn_generated.h.inc | |
48 | gen_op_attribs.py -> op_attribs_generated.h.inc | |
49 | gen_helper_protos.py -> helper_protos_generated.h.inc | |
50 | gen_shortcode.py -> shortcode_generated.h.inc | |
51 | gen_tcg_funcs.py -> tcg_funcs_generated.c.inc | |
52 | gen_tcg_func_table.py -> tcg_func_table_generated.c.inc | |
53 | gen_helper_funcs.py -> helper_funcs_generated.c.inc | |
95e11505 | 54 | gen_idef_parser_funcs.py -> idef_parser_input.h |
10849c26 | 55 | gen_analyze_funcs.py -> analyze_funcs_generated.c.inc |
c3fb76b9 TS |
56 | |
57 | Qemu helper functions have 3 parts | |
58 | DEF_HELPER declaration indicates the signature of the helper | |
59 | gen_helper_<NAME> will generate a TCG call to the helper function | |
60 | The helper implementation | |
61 | ||
62 | Here's an example of the A2_add instruction. | |
63 | Instruction tag A2_add | |
64 | Assembly syntax "Rd32=add(Rs32,Rt32)" | |
65 | Instruction semantics "{ RdV=RsV+RtV;}" | |
66 | ||
67 | By convention, the operands are identified by letter | |
68 | RdV is the destination register | |
69 | RsV, RtV are source registers | |
70 | ||
71 | The generator uses the operand naming conventions (see large comment in | |
72 | hex_common.py) to determine the signature of the helper function. Here are the | |
73 | results for A2_add | |
74 | ||
75 | helper_protos_generated.h.inc | |
76 | DEF_HELPER_3(A2_add, s32, env, s32, s32) | |
77 | ||
78 | tcg_funcs_generated.c.inc | |
79 | static void generate_A2_add( | |
80 | CPUHexagonState *env, | |
81 | DisasContext *ctx, | |
82 | Insn *insn, | |
83 | Packet *pkt) | |
84 | { | |
7a819de8 | 85 | TCGv RdV = tcg_temp_new(); |
c3fb76b9 TS |
86 | const int RdN = insn->regno[0]; |
87 | TCGv RsV = hex_gpr[insn->regno[1]]; | |
88 | TCGv RtV = hex_gpr[insn->regno[2]]; | |
89 | gen_helper_A2_add(RdV, cpu_env, RsV, RtV); | |
90 | gen_log_reg_write(RdN, RdV); | |
c3fb76b9 TS |
91 | } |
92 | ||
93 | helper_funcs_generated.c.inc | |
94 | int32_t HELPER(A2_add)(CPUHexagonState *env, int32_t RsV, int32_t RtV) | |
95 | { | |
96 | uint32_t slot __attribute__((unused)) = 4; | |
97 | int32_t RdV = 0; | |
98 | { RdV=RsV+RtV;} | |
99 | return RdV; | |
100 | } | |
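
As a rough illustration of how the operand naming convention can drive the
helper signature, here is a minimal Python sketch. It is not the code in
hex_common.py; helper_proto and OPERAND_RE are hypothetical names, and the
sketch only handles 32-bit register operands with a single destination.

```python
import re

# Hypothetical, simplified model of the operand naming convention:
# R<letter>32 is a 32-bit register operand; 'd' marks the destination and
# any other letter marks a source. The real generator handles many more cases.
OPERAND_RE = re.compile(r"R([a-z])32")


def helper_proto(tag, syntax):
    """Build a DEF_HELPER_<n> line for a simple register-only instruction."""
    letters = OPERAND_RE.findall(syntax)
    dests = [letter for letter in letters if letter == "d"]
    srcs = [letter for letter in letters if letter != "d"]
    if len(dests) != 1:
        raise ValueError("this sketch handles exactly one destination")
    # Return type first, then env, then one s32 per source register.
    args = ["s32", "env"] + ["s32"] * len(srcs)
    # DEF_HELPER_<n> counts the arguments that follow the return type.
    return "DEF_HELPER_{}({}, {})".format(len(args) - 1, tag, ", ".join(args))


print(helper_proto("A2_add", "Rd32=add(Rs32,Rt32)"))
```

For A2_add this prints the same DEF_HELPER_3 line shown in
helper_protos_generated.h.inc above.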
101 | ||
102 | Note that generate_A2_add updates the disassembly context to be processed | |
103 | when the packet commits (see "Packet Semantics" below). | |
104 | ||
105 | The generator checks for fGEN_TCG_<tag> macro. This allows us to generate | |
106 | TCG code instead of a call to the helper. If defined, the macro takes 1 | |
107 | argument. | |
108 | C semantics (aka short code) | |
109 | ||
110 | This allows the code generator to override the auto-generated code. In some | |
111 | cases this is necessary for correct execution. We can also override for | |
112 | faster emulation. For example, calling a helper for add is more expensive | |
113 | than generating a TCG add operation. | |
114 | ||
115 | The gen_tcg.h file has any overrides. For example, we could write | |
116 | #define fGEN_TCG_A2_add(SHORTCODE) \ | |
117 | tcg_gen_add_tl(RdV, RsV, RtV) | |
118 | ||
119 | The instruction semantics C code relies heavily on macros. In cases where the | |
120 | C semantics are specified only with macros, we can override the default with | |
121 | the short semantics option and #define the macros to generate TCG code. One | |
122 | example is L2_loadw_locked: | |
123 | Instruction tag L2_loadw_locked | |
124 | Assembly syntax "Rd32=memw_locked(Rs32)" | |
125 | Instruction semantics "{ fEA_REG(RsV); fLOAD_LOCKED(1,4,u,EA,RdV) }" | |
126 | ||
127 | In gen_tcg.h, we use the shortcode | |
128 | #define fGEN_TCG_L2_loadw_locked(SHORTCODE) \ | |
129 | SHORTCODE | |
130 | ||
131 | There are also cases where we brute force the TCG code generation. | |
132 | Instructions with multiple definitions are examples. These require special | |
133 | handling because qemu helpers can only return a single value. | |
134 | ||
375bcf38 TS |
135 | For HVX vectors, the generator behaves slightly differently. The wide vectors |
136 | won't fit in a TCGv or TCGv_i64, so we use TCGv_ptr variables to pass the | |
137 | address to helper functions. Here's an example for an HVX vector-add-word | |
138 | instruction. | |
c2b33d0b | 139 | static void generate_V6_vaddw(DisasContext *ctx) |
375bcf38 | 140 | { |
c2b33d0b | 141 | Insn *insn __attribute__((unused)) = ctx->insn; |
375bcf38 TS |
142 | const int VdN = insn->regno[0]; |
143 | const intptr_t VdV_off = | |
144 | ctx_future_vreg_off(ctx, VdN, 1, true); | |
7a819de8 | 145 | TCGv_ptr VdV = tcg_temp_new_ptr(); |
375bcf38 TS |
146 | tcg_gen_addi_ptr(VdV, cpu_env, VdV_off); |
147 | const int VuN = insn->regno[1]; | |
148 | const intptr_t VuV_off = | |
149 | vreg_src_off(ctx, VuN); | |
7a819de8 | 150 | TCGv_ptr VuV = tcg_temp_new_ptr(); |
375bcf38 TS |
151 | const int VvN = insn->regno[2]; |
152 | const intptr_t VvV_off = | |
153 | vreg_src_off(ctx, VvN); | |
7a819de8 | 154 | TCGv_ptr VvV = tcg_temp_new_ptr(); |
375bcf38 TS |
155 | tcg_gen_addi_ptr(VuV, cpu_env, VuV_off); |
156 | tcg_gen_addi_ptr(VvV, cpu_env, VvV_off); | |
c2b33d0b | 157 | gen_helper_V6_vaddw(cpu_env, VdV, VuV, VvV); |
375bcf38 TS |
158 | } |
159 | ||
160 | Notice that we also generate a variable named <operand>_off for each operand of | |
161 | the instruction. This makes it easy to override the instruction semantics with | |
162 | functions from tcg-op-gvec.h. Here's the override for this instruction. | |
163 | #define fGEN_TCG_V6_vaddw(SHORTCODE) \ | |
164 | tcg_gen_gvec_add(MO_32, VdV_off, VuV_off, VvV_off, \ | |
165 | sizeof(MMVector), sizeof(MMVector)) | |
166 | ||
167 | Finally, we notice that the override doesn't use the TCGv_ptr variables, so | |
168 | we don't generate them when an override is present. Here is what we generate | |
169 | when the override is present. | |
c2b33d0b | 170 | static void generate_V6_vaddw(DisasContext *ctx) |
375bcf38 | 171 | { |
c2b33d0b | 172 | Insn *insn __attribute__((unused)) = ctx->insn; |
375bcf38 TS |
173 | const int VdN = insn->regno[0]; |
174 | const intptr_t VdV_off = | |
175 | ctx_future_vreg_off(ctx, VdN, 1, true); | |
176 | const int VuN = insn->regno[1]; | |
177 | const intptr_t VuV_off = | |
178 | vreg_src_off(ctx, VuN); | |
179 | const int VvN = insn->regno[2]; | |
180 | const intptr_t VvV_off = | |
181 | vreg_src_off(ctx, VvN); | |
182 | fGEN_TCG_V6_vaddw({ fHIDE(int i;) fVFOREACH(32, i) { VdV.w[i] = VuV.w[i] + VvV.w[i] ; } }); | |
375bcf38 TS |
183 | } |
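
To make the semantics of the short code concrete, here is a small Python model
of the lane-wise add that V6_vaddw performs. It is only a sketch: vaddw, MASK32
and LANES are hypothetical names, the 128-byte vector size is an assumption,
and lanes are held as unsigned 32-bit values for simplicity.

```python
MASK32 = 0xFFFFFFFF


def vaddw(vu, vv):
    """Lane-wise 32-bit add with wraparound, one result word per lane."""
    if len(vu) != len(vv):
        raise ValueError("vectors must have the same number of lanes")
    return [(a + b) & MASK32 for a, b in zip(vu, vv)]


# Assumption: 128-byte vectors, which gives 32 word lanes.
LANES = 128 // 4
vu = [0xFFFFFFFF] + [1] * (LANES - 1)
vv = [1] + [2] * (LANES - 1)
vd = vaddw(vu, vv)
print(hex(vd[0]), vd[1])
```

Lane 0 wraps around to zero while the other lanes add normally, which is the
per-lane behavior that tcg_gen_gvec_add with MO_32 is expected to provide.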
184 | ||
10849c26 TS |
185 | We also generate an analyze_<tag> function for each instruction. Currently, |
186 | these functions record the writes to registers by calling ctx_log_*. During | |
187 | gen_start_packet, we invoke the analyze_<tag> function for each instruction in | |
188 | the packet, and we mark the implicit writes. After the analysis is performed, | |
189 | we initialize hex_new_value for each of the predicated assignments. | |
190 | ||
c3fb76b9 TS |
191 | In addition to instruction semantics, we use a generator to create the decode |
192 | tree. This generation is also a two-step process. The first step is to run | |
193 | target/hexagon/gen_dectree_import.c to produce | |
194 | <BUILD_DIR>/target/hexagon/iset.py | |
195 | This file is imported by target/hexagon/dectree.py to produce | |
196 | <BUILD_DIR>/target/hexagon/dectree_generated.h.inc | |
197 | ||
198 | *** Key Files *** | |
199 | ||
200 | cpu.h | |
201 | ||
202 | This file contains the definition of the CPUHexagonState struct. It is the | |
203 | runtime information for each thread and contains stuff like the GPR and | |
204 | predicate registers. | |
205 | ||
206 | macros.h | |
375bcf38 | 207 | mmvec/macros.h |
c3fb76b9 TS |
208 | |
209 | The Hexagon arch lib relies heavily on macros for the instruction semantics. | |
210 | This is a great advantage for qemu because we can override them for different | |
211 | purposes. You will also notice there are sometimes two definitions of a macro. | |
212 | The QEMU_GENERATE variable determines whether we want the macro to generate TCG | |
213 | code. If QEMU_GENERATE is not defined, we want the macro to generate vanilla | |
214 | C code that will work in the helper implementation. | |
215 | ||
216 | translate.c | |
217 | ||
218 | The functions in this file generate TCG code for a translation block. Some | |
219 | important functions in this file are | |
220 | ||
221 | gen_start_packet - initialize the data structures for packet semantics | |
222 | gen_commit_packet - commit the register writes, stores, etc for a packet | |
223 | decode_and_translate_packet - disassemble a packet and generate code | |
224 | ||
225 | genptr.c | |
226 | gen_tcg.h | |
227 | ||
228 | These files create a function for each instruction. It is mostly composed of | |
229 | fGEN_TCG_<tag> definitions followed by including tcg_funcs_generated.c.inc. | |
230 | ||
231 | op_helper.c | |
232 | ||
233 | This file contains the implementations of all the helpers. There are a few | |
234 | general purpose helpers, but most of them are generated by including | |
235 | helper_funcs_generated.c.inc. There are also several helpers used for debugging. | |
236 | ||
237 | ||
238 | *** Packet Semantics *** | |
239 | ||
240 | VLIW packet semantics differ from serial semantics in that all input operands | |
241 | are read, then the operations are performed, then all the results are written. | |
242 | For example, this packet performs a swap of registers r0 and r1 | |
243 | { r0 = r1; r1 = r0 } | |
244 | Note that the result is different if the instructions are executed serially. | |
245 | ||
246 | Packet semantics dictate that we defer any changes of state until the entire | |
247 | packet is committed. We record the results of each instruction in a side data | |
248 | structure, and update the visible processor state when we commit the packet. | |
249 | ||
250 | The data structures are divided between the runtime state and the translation | |
251 | context. | |
252 | ||
253 | During the TCG generation (see translate.[ch]), we use the DisasContext to | |
254 | track what needs to be done during packet commit. Here are the relevant | |
255 | fields | |
256 | ||
257 | reg_log list of registers written | |
258 | reg_log_idx index into reg_log | |
259 | pred_log list of predicates written | |
260 | pred_log_idx index into pred_log | |
261 | store_width width of stores (indexed by slot) | |
262 | ||
263 | During runtime, the following fields in CPUHexagonState (see cpu.h) are used | |
264 | ||
265 | new_value new value of a given register | |
266 | reg_written boolean indicating if register was written | |
267 | new_pred_value new value of a predicate register | |
268 | pred_written boolean indicating if predicate was written | |
269 | mem_log_stores record of the stores (indexed by slot) | |
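
The deferred-write scheme can be illustrated with a toy Python model. This is
a sketch of the idea only, not the qemu implementation: PacketModel and its
methods are hypothetical, and just the reg_log and new_value names are borrowed
from the fields listed above.

```python
class PacketModel:
    """Toy model of deferred register writes within one packet."""

    def __init__(self, gpr):
        self.gpr = list(gpr)   # visible register state
        self.new_value = {}    # pending writes, keyed by register number
        self.reg_log = []      # registers written by the current packet

    def read(self, regno):
        # Reads always see the state from before the packet started.
        return self.gpr[regno]

    def log_write(self, regno, value):
        # Record the write without touching the visible state.
        if regno not in self.new_value:
            self.reg_log.append(regno)
        self.new_value[regno] = value & 0xFFFFFFFF

    def commit(self):
        # Make all logged writes visible at once, then reset the logs.
        for regno in self.reg_log:
            self.gpr[regno] = self.new_value[regno]
        self.reg_log.clear()
        self.new_value.clear()


cpu = PacketModel([10, 20])
# { r0 = r1; r1 = r0 }
cpu.log_write(0, cpu.read(1))
cpu.log_write(1, cpu.read(0))
cpu.commit()
print(cpu.gpr)  # [20, 10]
```

Because both reads happen before any write becomes visible, the packet swaps
the two registers. Executing the same two assignments serially would leave 20
in both.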
270 | ||
375bcf38 TS |
271 | For Hexagon Vector eXtensions (HVX), the following fields are used |
272 | VRegs Vector registers | |
273 | future_VRegs Registers to be stored during packet commit | |
274 | tmp_VRegs Temporary registers *not* stored during commit | |
375bcf38 TS |
275 | QRegs Q (vector predicate) registers |
276 | future_QRegs Registers to be stored during packet commit | |
375bcf38 | 277 | |
c3fb76b9 TS |
278 | *** Debugging *** |
279 | ||
280 | You can turn on a lot of debugging by changing the HEX_DEBUG macro to 1 in | |
281 | internal.h. This will stream a lot of information as it generates TCG and | |
282 | executes the code. | |
283 | ||
284 | To track down nasty issues with Hexagon->TCG generation, we compare the | |
285 | execution results with actual hardware running on a Hexagon Linux target. | |
286 | Run qemu with the "-d cpu" option. Then, we can diff the results and figure | |
287 | out where qemu and hardware behave differently. | |
288 | ||
289 | The stack addresses differ between qemu and hardware. We handle this by changing | |
290 | env->stack_adjust in translate.c. First, set this to zero and run qemu. | |
291 | Then, change env->stack_adjust to the difference between the two stack | |
292 | locations. Then rebuild qemu and run again. That will produce a very | |
293 | clean diff. | |
294 | ||
295 | Here are some handy places to set breakpoints | |
296 | ||
297 | At the call to gen_start_packet for a given PC (note that the line number | |
298 | might change in the future) | |
299 | br translate.c:602 if ctx->base.pc_next == 0xdeadbeef | |
300 | The helper function for each instruction is named helper_<TAG>, so here's | |
301 | an example that will set a breakpoint at the start | |
302 | br helper_A2_add | |
303 | If you have the HEX_DEBUG macro set, the following will be useful | |
304 | At the start of execution of a packet for a given PC | |
305 | br helper_debug_start_packet if env->gpr[41] == 0xdeadbeef | |
306 | At the end of execution of a packet for a given PC | |
307 | br helper_debug_commit_end if env->this_PC == 0xdeadbeef |