]>
Commit | Line | Data |
---|---|---|
970d7e83 LB |
1 | ============================================================ |
2 | Kaleidoscope: Extending the Language: User-defined Operators | |
3 | ============================================================ | |
4 | ||
5 | .. contents:: | |
6 | :local: | |
7 | ||
8 | Chapter 6 Introduction | |
9 | ====================== | |
10 | ||
11 | Welcome to Chapter 6 of the "`Implementing a language with | |
12 | LLVM <index.html>`_" tutorial. At this point in our tutorial, we now | |
13 | have a fully functional language that is fairly minimal, but also | |
14 | useful. There is still one big problem with it, however. Our language | |
15 | doesn't have many useful operators (like division, logical negation, or | |
16 | even any comparisons besides less-than). | |
17 | ||
18 | This chapter of the tutorial takes a wild digression into adding | |
19 | user-defined operators to the simple and beautiful Kaleidoscope | |
20 | language. This digression now gives us a simple and ugly language in | |
21 | some ways, but also a powerful one at the same time. One of the great | |
22 | things about creating your own language is that you get to decide what | |
23 | is good or bad. In this tutorial we'll assume that it is okay to use | |
24 | this as a way to show some interesting parsing techniques. | |
25 | ||
26 | At the end of this tutorial, we'll run through an example Kaleidoscope | |
27 | application that `renders the Mandelbrot set <#example>`_. This gives an | |
28 | example of what you can build with Kaleidoscope and its feature set. | |
29 | ||
30 | User-defined Operators: the Idea | |
31 | ================================ | |
32 | ||
33 | The "operator overloading" that we will add to Kaleidoscope is more | |
34 | general than languages like C++. In C++, you are only allowed to | |
35 | redefine existing operators: you can't programatically change the | |
36 | grammar, introduce new operators, change precedence levels, etc. In this | |
37 | chapter, we will add this capability to Kaleidoscope, which will let the | |
38 | user round out the set of operators that are supported. | |
39 | ||
40 | The point of going into user-defined operators in a tutorial like this | |
41 | is to show the power and flexibility of using a hand-written parser. | |
42 | Thus far, the parser we have been implementing uses recursive descent | |
43 | for most parts of the grammar and operator precedence parsing for the | |
44 | expressions. See `Chapter 2 <OCamlLangImpl2.html>`_ for details. Without | |
45 | using operator precedence parsing, it would be very difficult to allow | |
46 | the programmer to introduce new operators into the grammar: the grammar | |
47 | is dynamically extensible as the JIT runs. | |
48 | ||
49 | The two specific features we'll add are programmable unary operators | |
50 | (right now, Kaleidoscope has no unary operators at all) as well as | |
51 | binary operators. An example of this is: | |
52 | ||
53 | :: | |
54 | ||
55 | # Logical unary not. | |
56 | def unary!(v) | |
57 | if v then | |
58 | 0 | |
59 | else | |
60 | 1; | |
61 | ||
62 | # Define > with the same precedence as <. | |
63 | def binary> 10 (LHS RHS) | |
64 | RHS < LHS; | |
65 | ||
66 | # Binary "logical or", (note that it does not "short circuit") | |
67 | def binary| 5 (LHS RHS) | |
68 | if LHS then | |
69 | 1 | |
70 | else if RHS then | |
71 | 1 | |
72 | else | |
73 | 0; | |
74 | ||
75 | # Define = with slightly lower precedence than relationals. | |
76 | def binary= 9 (LHS RHS) | |
77 | !(LHS < RHS | LHS > RHS); | |
78 | ||
79 | Many languages aspire to being able to implement their standard runtime | |
80 | library in the language itself. In Kaleidoscope, we can implement | |
81 | significant parts of the language in the library! | |
82 | ||
83 | We will break down implementation of these features into two parts: | |
84 | implementing support for user-defined binary operators and adding unary | |
85 | operators. | |
86 | ||
87 | User-defined Binary Operators | |
88 | ============================= | |
89 | ||
90 | Adding support for user-defined binary operators is pretty simple with | |
91 | our current framework. We'll first add support for the unary/binary | |
92 | keywords: | |
93 | ||
94 | .. code-block:: ocaml | |
95 | ||
96 | type token = | |
97 | ... | |
98 | (* operators *) | |
99 | | Binary | Unary | |
100 | ||
101 | ... | |
102 | ||
103 | and lex_ident buffer = parser | |
104 | ... | |
105 | | "for" -> [< 'Token.For; stream >] | |
106 | | "in" -> [< 'Token.In; stream >] | |
107 | | "binary" -> [< 'Token.Binary; stream >] | |
108 | | "unary" -> [< 'Token.Unary; stream >] | |
109 | ||
110 | This just adds lexer support for the unary and binary keywords, like we | |
111 | did in `previous chapters <OCamlLangImpl5.html#iflexer>`_. One nice | |
112 | thing about our current AST, is that we represent binary operators with | |
113 | full generalisation by using their ASCII code as the opcode. For our | |
114 | extended operators, we'll use this same representation, so we don't need | |
115 | any new AST or parser support. | |
116 | ||
117 | On the other hand, we have to be able to represent the definitions of | |
118 | these new operators, in the "def binary\| 5" part of the function | |
119 | definition. In our grammar so far, the "name" for the function | |
120 | definition is parsed as the "prototype" production and into the | |
121 | ``Ast.Prototype`` AST node. To represent our new user-defined operators | |
122 | as prototypes, we have to extend the ``Ast.Prototype`` AST node like | |
123 | this: | |
124 | ||
125 | .. code-block:: ocaml | |
126 | ||
127 | (* proto - This type represents the "prototype" for a function, which captures | |
128 | * its name, and its argument names (thus implicitly the number of arguments the | |
129 | * function takes). *) | |
130 | type proto = | |
131 | | Prototype of string * string array | |
132 | | BinOpPrototype of string * string array * int | |
133 | ||
134 | Basically, in addition to knowing a name for the prototype, we now keep | |
135 | track of whether it was an operator, and if it was, what precedence | |
136 | level the operator is at. The precedence is only used for binary | |
137 | operators (as you'll see below, it just doesn't apply for unary | |
138 | operators). Now that we have a way to represent the prototype for a | |
139 | user-defined operator, we need to parse it: | |
140 | ||
141 | .. code-block:: ocaml | |
142 | ||
143 | (* prototype | |
144 | * ::= id '(' id* ')' | |
145 | * ::= binary LETTER number? (id, id) | |
146 | * ::= unary LETTER number? (id) *) | |
147 | let parse_prototype = | |
148 | let rec parse_args accumulator = parser | |
149 | | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e | |
150 | | [< >] -> accumulator | |
151 | in | |
152 | let parse_operator = parser | |
153 | | [< 'Token.Unary >] -> "unary", 1 | |
154 | | [< 'Token.Binary >] -> "binary", 2 | |
155 | in | |
156 | let parse_binary_precedence = parser | |
157 | | [< 'Token.Number n >] -> int_of_float n | |
158 | | [< >] -> 30 | |
159 | in | |
160 | parser | |
161 | | [< 'Token.Ident id; | |
162 | 'Token.Kwd '(' ?? "expected '(' in prototype"; | |
163 | args=parse_args []; | |
164 | 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> | |
165 | (* success. *) | |
166 | Ast.Prototype (id, Array.of_list (List.rev args)) | |
167 | | [< (prefix, kind)=parse_operator; | |
168 | 'Token.Kwd op ?? "expected an operator"; | |
169 | (* Read the precedence if present. *) | |
170 | binary_precedence=parse_binary_precedence; | |
171 | 'Token.Kwd '(' ?? "expected '(' in prototype"; | |
172 | args=parse_args []; | |
173 | 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> | |
174 | let name = prefix ^ (String.make 1 op) in | |
175 | let args = Array.of_list (List.rev args) in | |
176 | ||
177 | (* Verify right number of arguments for operator. *) | |
178 | if Array.length args != kind | |
179 | then raise (Stream.Error "invalid number of operands for operator") | |
180 | else | |
181 | if kind == 1 then | |
182 | Ast.Prototype (name, args) | |
183 | else | |
184 | Ast.BinOpPrototype (name, args, binary_precedence) | |
185 | | [< >] -> | |
186 | raise (Stream.Error "expected function name in prototype") | |
187 | ||
188 | This is all fairly straightforward parsing code, and we have already | |
189 | seen a lot of similar code in the past. One interesting part about the | |
190 | code above is the couple lines that set up ``name`` for binary | |
191 | operators. This builds names like "binary@" for a newly defined "@" | |
192 | operator. This then takes advantage of the fact that symbol names in the | |
193 | LLVM symbol table are allowed to have any character in them, including | |
194 | embedded nul characters. | |
195 | ||
196 | The next interesting thing to add, is codegen support for these binary | |
197 | operators. Given our current structure, this is a simple addition of a | |
198 | default case for our existing binary operator node: | |
199 | ||
200 | .. code-block:: ocaml | |
201 | ||
202 | let codegen_expr = function | |
203 | ... | |
204 | | Ast.Binary (op, lhs, rhs) -> | |
205 | let lhs_val = codegen_expr lhs in | |
206 | let rhs_val = codegen_expr rhs in | |
207 | begin | |
208 | match op with | |
209 | | '+' -> build_add lhs_val rhs_val "addtmp" builder | |
210 | | '-' -> build_sub lhs_val rhs_val "subtmp" builder | |
211 | | '*' -> build_mul lhs_val rhs_val "multmp" builder | |
212 | | '<' -> | |
213 | (* Convert bool 0/1 to double 0.0 or 1.0 *) | |
214 | let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in | |
215 | build_uitofp i double_type "booltmp" builder | |
216 | | _ -> | |
217 | (* If it wasn't a builtin binary operator, it must be a user defined | |
218 | * one. Emit a call to it. *) | |
219 | let callee = "binary" ^ (String.make 1 op) in | |
220 | let callee = | |
221 | match lookup_function callee the_module with | |
222 | | Some callee -> callee | |
223 | | None -> raise (Error "binary operator not found!") | |
224 | in | |
225 | build_call callee [|lhs_val; rhs_val|] "binop" builder | |
226 | end | |
227 | ||
228 | As you can see above, the new code is actually really simple. It just | |
229 | does a lookup for the appropriate operator in the symbol table and | |
230 | generates a function call to it. Since user-defined operators are just | |
231 | built as normal functions (because the "prototype" boils down to a | |
232 | function with the right name) everything falls into place. | |
233 | ||
234 | The final piece of code we are missing, is a bit of top level magic: | |
235 | ||
236 | .. code-block:: ocaml | |
237 | ||
238 | let codegen_func the_fpm = function | |
239 | | Ast.Function (proto, body) -> | |
240 | Hashtbl.clear named_values; | |
241 | let the_function = codegen_proto proto in | |
242 | ||
243 | (* If this is an operator, install it. *) | |
244 | begin match proto with | |
245 | | Ast.BinOpPrototype (name, args, prec) -> | |
246 | let op = name.[String.length name - 1] in | |
247 | Hashtbl.add Parser.binop_precedence op prec; | |
248 | | _ -> () | |
249 | end; | |
250 | ||
251 | (* Create a new basic block to start insertion into. *) | |
252 | let bb = append_block context "entry" the_function in | |
253 | position_at_end bb builder; | |
254 | ... | |
255 | ||
256 | Basically, before codegening a function, if it is a user-defined | |
257 | operator, we register it in the precedence table. This allows the binary | |
258 | operator parsing logic we already have in place to handle it. Since we | |
259 | are working on a fully-general operator precedence parser, this is all | |
260 | we need to do to "extend the grammar". | |
261 | ||
262 | Now we have useful user-defined binary operators. This builds a lot on | |
263 | the previous framework we built for other operators. Adding unary | |
264 | operators is a bit more challenging, because we don't have any framework | |
265 | for it yet - lets see what it takes. | |
266 | ||
267 | User-defined Unary Operators | |
268 | ============================ | |
269 | ||
270 | Since we don't currently support unary operators in the Kaleidoscope | |
271 | language, we'll need to add everything to support them. Above, we added | |
272 | simple support for the 'unary' keyword to the lexer. In addition to | |
273 | that, we need an AST node: | |
274 | ||
275 | .. code-block:: ocaml | |
276 | ||
277 | type expr = | |
278 | ... | |
279 | (* variant for a unary operator. *) | |
280 | | Unary of char * expr | |
281 | ... | |
282 | ||
283 | This AST node is very simple and obvious by now. It directly mirrors the | |
284 | binary operator AST node, except that it only has one child. With this, | |
285 | we need to add the parsing logic. Parsing a unary operator is pretty | |
286 | simple: we'll add a new function to do it: | |
287 | ||
288 | .. code-block:: ocaml | |
289 | ||
290 | (* unary | |
291 | * ::= primary | |
292 | * ::= '!' unary *) | |
293 | and parse_unary = parser | |
294 | (* If this is a unary operator, read it. *) | |
295 | | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] -> | |
296 | Ast.Unary (op, operand) | |
297 | ||
298 | (* If the current token is not an operator, it must be a primary expr. *) | |
299 | | [< stream >] -> parse_primary stream | |
300 | ||
301 | The grammar we add is pretty straightforward here. If we see a unary | |
302 | operator when parsing a primary operator, we eat the operator as a | |
303 | prefix and parse the remaining piece as another unary operator. This | |
304 | allows us to handle multiple unary operators (e.g. "!!x"). Note that | |
305 | unary operators can't have ambiguous parses like binary operators can, | |
306 | so there is no need for precedence information. | |
307 | ||
308 | The problem with this function, is that we need to call ParseUnary from | |
309 | somewhere. To do this, we change previous callers of ParsePrimary to | |
310 | call ``parse_unary`` instead: | |
311 | ||
312 | .. code-block:: ocaml | |
313 | ||
314 | (* binoprhs | |
315 | * ::= ('+' primary)* *) | |
316 | and parse_bin_rhs expr_prec lhs stream = | |
317 | ... | |
318 | (* Parse the unary expression after the binary operator. *) | |
319 | let rhs = parse_unary stream in | |
320 | ... | |
321 | ||
322 | ... | |
323 | ||
324 | (* expression | |
325 | * ::= primary binoprhs *) | |
326 | and parse_expr = parser | |
327 | | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream | |
328 | ||
329 | With these two simple changes, we are now able to parse unary operators | |
330 | and build the AST for them. Next up, we need to add parser support for | |
331 | prototypes, to parse the unary operator prototype. We extend the binary | |
332 | operator code above with: | |
333 | ||
334 | .. code-block:: ocaml | |
335 | ||
336 | (* prototype | |
337 | * ::= id '(' id* ')' | |
338 | * ::= binary LETTER number? (id, id) | |
339 | * ::= unary LETTER number? (id) *) | |
340 | let parse_prototype = | |
341 | let rec parse_args accumulator = parser | |
342 | | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e | |
343 | | [< >] -> accumulator | |
344 | in | |
345 | let parse_operator = parser | |
346 | | [< 'Token.Unary >] -> "unary", 1 | |
347 | | [< 'Token.Binary >] -> "binary", 2 | |
348 | in | |
349 | let parse_binary_precedence = parser | |
350 | | [< 'Token.Number n >] -> int_of_float n | |
351 | | [< >] -> 30 | |
352 | in | |
353 | parser | |
354 | | [< 'Token.Ident id; | |
355 | 'Token.Kwd '(' ?? "expected '(' in prototype"; | |
356 | args=parse_args []; | |
357 | 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> | |
358 | (* success. *) | |
359 | Ast.Prototype (id, Array.of_list (List.rev args)) | |
360 | | [< (prefix, kind)=parse_operator; | |
361 | 'Token.Kwd op ?? "expected an operator"; | |
362 | (* Read the precedence if present. *) | |
363 | binary_precedence=parse_binary_precedence; | |
364 | 'Token.Kwd '(' ?? "expected '(' in prototype"; | |
365 | args=parse_args []; | |
366 | 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> | |
367 | let name = prefix ^ (String.make 1 op) in | |
368 | let args = Array.of_list (List.rev args) in | |
369 | ||
370 | (* Verify right number of arguments for operator. *) | |
371 | if Array.length args != kind | |
372 | then raise (Stream.Error "invalid number of operands for operator") | |
373 | else | |
374 | if kind == 1 then | |
375 | Ast.Prototype (name, args) | |
376 | else | |
377 | Ast.BinOpPrototype (name, args, binary_precedence) | |
378 | | [< >] -> | |
379 | raise (Stream.Error "expected function name in prototype") | |
380 | ||
381 | As with binary operators, we name unary operators with a name that | |
382 | includes the operator character. This assists us at code generation | |
383 | time. Speaking of, the final piece we need to add is codegen support for | |
384 | unary operators. It looks like this: | |
385 | ||
386 | .. code-block:: ocaml | |
387 | ||
388 | let rec codegen_expr = function | |
389 | ... | |
390 | | Ast.Unary (op, operand) -> | |
391 | let operand = codegen_expr operand in | |
392 | let callee = "unary" ^ (String.make 1 op) in | |
393 | let callee = | |
394 | match lookup_function callee the_module with | |
395 | | Some callee -> callee | |
396 | | None -> raise (Error "unknown unary operator") | |
397 | in | |
398 | build_call callee [|operand|] "unop" builder | |
399 | ||
400 | This code is similar to, but simpler than, the code for binary | |
401 | operators. It is simpler primarily because it doesn't need to handle any | |
402 | predefined operators. | |
403 | ||
404 | Kicking the Tires | |
405 | ================= | |
406 | ||
407 | It is somewhat hard to believe, but with a few simple extensions we've | |
408 | covered in the last chapters, we have grown a real-ish language. With | |
409 | this, we can do a lot of interesting things, including I/O, math, and a | |
410 | bunch of other things. For example, we can now add a nice sequencing | |
411 | operator (printd is defined to print out the specified value and a | |
412 | newline): | |
413 | ||
414 | :: | |
415 | ||
416 | ready> extern printd(x); | |
417 | Read extern: declare double @printd(double) | |
418 | ready> def binary : 1 (x y) 0; # Low-precedence operator that ignores operands. | |
419 | .. | |
420 | ready> printd(123) : printd(456) : printd(789); | |
421 | 123.000000 | |
422 | 456.000000 | |
423 | 789.000000 | |
424 | Evaluated to 0.000000 | |
425 | ||
426 | We can also define a bunch of other "primitive" operations, such as: | |
427 | ||
428 | :: | |
429 | ||
430 | # Logical unary not. | |
431 | def unary!(v) | |
432 | if v then | |
433 | 0 | |
434 | else | |
435 | 1; | |
436 | ||
437 | # Unary negate. | |
438 | def unary-(v) | |
439 | 0-v; | |
440 | ||
441 | # Define > with the same precedence as <. | |
442 | def binary> 10 (LHS RHS) | |
443 | RHS < LHS; | |
444 | ||
445 | # Binary logical or, which does not short circuit. | |
446 | def binary| 5 (LHS RHS) | |
447 | if LHS then | |
448 | 1 | |
449 | else if RHS then | |
450 | 1 | |
451 | else | |
452 | 0; | |
453 | ||
454 | # Binary logical and, which does not short circuit. | |
455 | def binary& 6 (LHS RHS) | |
456 | if !LHS then | |
457 | 0 | |
458 | else | |
459 | !!RHS; | |
460 | ||
461 | # Define = with slightly lower precedence than relationals. | |
462 | def binary = 9 (LHS RHS) | |
463 | !(LHS < RHS | LHS > RHS); | |
464 | ||
465 | Given the previous if/then/else support, we can also define interesting | |
466 | functions for I/O. For example, the following prints out a character | |
467 | whose "density" reflects the value passed in: the lower the value, the | |
468 | denser the character: | |
469 | ||
470 | :: | |
471 | ||
472 | ready> | |
473 | ||
474 | extern putchard(char) | |
475 | def printdensity(d) | |
476 | if d > 8 then | |
477 | putchard(32) # ' ' | |
478 | else if d > 4 then | |
479 | putchard(46) # '.' | |
480 | else if d > 2 then | |
481 | putchard(43) # '+' | |
482 | else | |
483 | putchard(42); # '*' | |
484 | ... | |
485 | ready> printdensity(1): printdensity(2): printdensity(3) : | |
486 | printdensity(4): printdensity(5): printdensity(9): putchard(10); | |
487 | *++.. | |
488 | Evaluated to 0.000000 | |
489 | ||
490 | Based on these simple primitive operations, we can start to define more | |
491 | interesting things. For example, here's a little function that solves | |
492 | for the number of iterations it takes a function in the complex plane to | |
493 | converge: | |
494 | ||
495 | :: | |
496 | ||
497 | # determine whether the specific location diverges. | |
498 | # Solve for z = z^2 + c in the complex plane. | |
499 | def mandleconverger(real imag iters creal cimag) | |
500 | if iters > 255 | (real*real + imag*imag > 4) then | |
501 | iters | |
502 | else | |
503 | mandleconverger(real*real - imag*imag + creal, | |
504 | 2*real*imag + cimag, | |
505 | iters+1, creal, cimag); | |
506 | ||
507 | # return the number of iterations required for the iteration to escape | |
508 | def mandleconverge(real imag) | |
509 | mandleconverger(real, imag, 0, real, imag); | |
510 | ||
511 | This "z = z\ :sup:`2`\ + c" function is a beautiful little creature | |
512 | that is the basis for computation of the `Mandelbrot | |
513 | Set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_. Our | |
514 | ``mandelconverge`` function returns the number of iterations that it | |
515 | takes for a complex orbit to escape, saturating to 255. This is not a | |
516 | very useful function by itself, but if you plot its value over a | |
517 | two-dimensional plane, you can see the Mandelbrot set. Given that we are | |
518 | limited to using putchard here, our amazing graphical output is limited, | |
519 | but we can whip together something using the density plotter above: | |
520 | ||
521 | :: | |
522 | ||
523 | # compute and plot the mandlebrot set with the specified 2 dimensional range | |
524 | # info. | |
525 | def mandelhelp(xmin xmax xstep ymin ymax ystep) | |
526 | for y = ymin, y < ymax, ystep in ( | |
527 | (for x = xmin, x < xmax, xstep in | |
528 | printdensity(mandleconverge(x,y))) | |
529 | : putchard(10) | |
530 | ) | |
531 | ||
532 | # mandel - This is a convenient helper function for plotting the mandelbrot set | |
533 | # from the specified position with the specified Magnification. | |
534 | def mandel(realstart imagstart realmag imagmag) | |
535 | mandelhelp(realstart, realstart+realmag*78, realmag, | |
536 | imagstart, imagstart+imagmag*40, imagmag); | |
537 | ||
538 | Given this, we can try plotting out the mandlebrot set! Lets try it out: | |
539 | ||
540 | :: | |
541 | ||
542 | ready> mandel(-2.3, -1.3, 0.05, 0.07); | |
543 | *******************************+++++++++++************************************* | |
544 | *************************+++++++++++++++++++++++******************************* | |
545 | **********************+++++++++++++++++++++++++++++**************************** | |
546 | *******************+++++++++++++++++++++.. ...++++++++************************* | |
547 | *****************++++++++++++++++++++++.... ...+++++++++*********************** | |
548 | ***************+++++++++++++++++++++++..... ...+++++++++********************* | |
549 | **************+++++++++++++++++++++++.... ....+++++++++******************** | |
550 | *************++++++++++++++++++++++...... .....++++++++******************* | |
551 | ************+++++++++++++++++++++....... .......+++++++****************** | |
552 | ***********+++++++++++++++++++.... ... .+++++++***************** | |
553 | **********+++++++++++++++++....... .+++++++**************** | |
554 | *********++++++++++++++........... ...+++++++*************** | |
555 | ********++++++++++++............ ...++++++++************** | |
556 | ********++++++++++... .......... .++++++++************** | |
557 | *******+++++++++..... .+++++++++************* | |
558 | *******++++++++...... ..+++++++++************* | |
559 | *******++++++....... ..+++++++++************* | |
560 | *******+++++...... ..+++++++++************* | |
561 | *******.... .... ...+++++++++************* | |
562 | *******.... . ...+++++++++************* | |
563 | *******+++++...... ...+++++++++************* | |
564 | *******++++++....... ..+++++++++************* | |
565 | *******++++++++...... .+++++++++************* | |
566 | *******+++++++++..... ..+++++++++************* | |
567 | ********++++++++++... .......... .++++++++************** | |
568 | ********++++++++++++............ ...++++++++************** | |
569 | *********++++++++++++++.......... ...+++++++*************** | |
570 | **********++++++++++++++++........ .+++++++**************** | |
571 | **********++++++++++++++++++++.... ... ..+++++++**************** | |
572 | ***********++++++++++++++++++++++....... .......++++++++***************** | |
573 | ************+++++++++++++++++++++++...... ......++++++++****************** | |
574 | **************+++++++++++++++++++++++.... ....++++++++******************** | |
575 | ***************+++++++++++++++++++++++..... ...+++++++++********************* | |
576 | *****************++++++++++++++++++++++.... ...++++++++*********************** | |
577 | *******************+++++++++++++++++++++......++++++++************************* | |
578 | *********************++++++++++++++++++++++.++++++++*************************** | |
579 | *************************+++++++++++++++++++++++******************************* | |
580 | ******************************+++++++++++++************************************ | |
581 | ******************************************************************************* | |
582 | ******************************************************************************* | |
583 | ******************************************************************************* | |
584 | Evaluated to 0.000000 | |
585 | ready> mandel(-2, -1, 0.02, 0.04); | |
586 | **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++ | |
587 | ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | |
588 | *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++. | |
589 | *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++... | |
590 | *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++..... | |
591 | ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........ | |
592 | **************++++++++++++++++++++++++++++++++++++++++++++++++++++++........... | |
593 | ************+++++++++++++++++++++++++++++++++++++++++++++++++++++.............. | |
594 | ***********++++++++++++++++++++++++++++++++++++++++++++++++++........ . | |
595 | **********++++++++++++++++++++++++++++++++++++++++++++++............. | |
596 | ********+++++++++++++++++++++++++++++++++++++++++++.................. | |
597 | *******+++++++++++++++++++++++++++++++++++++++....................... | |
598 | ******+++++++++++++++++++++++++++++++++++........................... | |
599 | *****++++++++++++++++++++++++++++++++............................ | |
600 | *****++++++++++++++++++++++++++++............................... | |
601 | ****++++++++++++++++++++++++++...... ......................... | |
602 | ***++++++++++++++++++++++++......... ...... ........... | |
603 | ***++++++++++++++++++++++............ | |
604 | **+++++++++++++++++++++.............. | |
605 | **+++++++++++++++++++................ | |
606 | *++++++++++++++++++................. | |
607 | *++++++++++++++++............ ... | |
608 | *++++++++++++++.............. | |
609 | *+++....++++................ | |
610 | *.......... ........... | |
611 | * | |
612 | *.......... ........... | |
613 | *+++....++++................ | |
614 | *++++++++++++++.............. | |
615 | *++++++++++++++++............ ... | |
616 | *++++++++++++++++++................. | |
617 | **+++++++++++++++++++................ | |
618 | **+++++++++++++++++++++.............. | |
619 | ***++++++++++++++++++++++............ | |
620 | ***++++++++++++++++++++++++......... ...... ........... | |
621 | ****++++++++++++++++++++++++++...... ......................... | |
622 | *****++++++++++++++++++++++++++++............................... | |
623 | *****++++++++++++++++++++++++++++++++............................ | |
624 | ******+++++++++++++++++++++++++++++++++++........................... | |
625 | *******+++++++++++++++++++++++++++++++++++++++....................... | |
626 | ********+++++++++++++++++++++++++++++++++++++++++++.................. | |
627 | Evaluated to 0.000000 | |
628 | ready> mandel(-0.9, -1.4, 0.02, 0.03); | |
629 | ******************************************************************************* | |
630 | ******************************************************************************* | |
631 | ******************************************************************************* | |
632 | **********+++++++++++++++++++++************************************************ | |
633 | *+++++++++++++++++++++++++++++++++++++++*************************************** | |
634 | +++++++++++++++++++++++++++++++++++++++++++++********************************** | |
635 | ++++++++++++++++++++++++++++++++++++++++++++++++++***************************** | |
636 | ++++++++++++++++++++++++++++++++++++++++++++++++++++++************************* | |
637 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++********************** | |
638 | +++++++++++++++++++++++++++++++++.........++++++++++++++++++******************* | |
639 | +++++++++++++++++++++++++++++++.... ......+++++++++++++++++++**************** | |
640 | +++++++++++++++++++++++++++++....... ........+++++++++++++++++++************** | |
641 | ++++++++++++++++++++++++++++........ ........++++++++++++++++++++************ | |
642 | +++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++********** | |
643 | ++++++++++++++++++++++++++........... ....++++++++++++++++++++++******** | |
644 | ++++++++++++++++++++++++............. .......++++++++++++++++++++++****** | |
645 | +++++++++++++++++++++++............. ........+++++++++++++++++++++++**** | |
646 | ++++++++++++++++++++++........... ..........++++++++++++++++++++++*** | |
647 | ++++++++++++++++++++........... .........++++++++++++++++++++++* | |
648 | ++++++++++++++++++............ ...........++++++++++++++++++++ | |
649 | ++++++++++++++++............... .............++++++++++++++++++ | |
650 | ++++++++++++++................. ...............++++++++++++++++ | |
651 | ++++++++++++.................. .................++++++++++++++ | |
652 | +++++++++.................. .................+++++++++++++ | |
653 | ++++++........ . ......... ..++++++++++++ | |
654 | ++............ ...... ....++++++++++ | |
655 | .............. ...++++++++++ | |
656 | .............. ....+++++++++ | |
657 | .............. .....++++++++ | |
658 | ............. ......++++++++ | |
659 | ........... .......++++++++ | |
660 | ......... ........+++++++ | |
661 | ......... ........+++++++ | |
662 | ......... ....+++++++ | |
663 | ........ ...+++++++ | |
664 | ....... ...+++++++ | |
665 | ....+++++++ | |
666 | .....+++++++ | |
667 | ....+++++++ | |
668 | ....+++++++ | |
669 | ....+++++++ | |
670 | Evaluated to 0.000000 | |
671 | ready> ^D | |
672 | ||
673 | At this point, you may be starting to realize that Kaleidoscope is a | |
674 | real and powerful language. It may not be self-similar :), but it can be | |
675 | used to plot things that are! | |
676 | ||
677 | With this, we conclude the "adding user-defined operators" chapter of | |
678 | the tutorial. We have successfully augmented our language, adding the | |
679 | ability to extend the language in the library, and we have shown how | |
680 | this can be used to build a simple but interesting end-user application | |
681 | in Kaleidoscope. At this point, Kaleidoscope can build a variety of | |
682 | applications that are functional and can call functions with | |
683 | side-effects, but it can't actually define and mutate a variable itself. | |
684 | ||
685 | Strikingly, variable mutation is an important feature of some languages, | |
686 | and it is not at all obvious how to `add support for mutable | |
687 | variables <OCamlLangImpl7.html>`_ without having to add an "SSA | |
688 | construction" phase to your front-end. In the next chapter, we will | |
689 | describe how you can add variable mutation without building SSA in your | |
690 | front-end. | |
691 | ||
692 | Full Code Listing | |
693 | ================= | |
694 | ||
695 | Here is the complete code listing for our running example, enhanced with | |
696 | the if/then/else and for expressions.. To build this example, use: | |
697 | ||
698 | .. code-block:: bash | |
699 | ||
700 | # Compile | |
701 | ocamlbuild toy.byte | |
702 | # Run | |
703 | ./toy.byte | |
704 | ||
705 | Here is the code: | |
706 | ||
707 | \_tags: | |
708 | :: | |
709 | ||
710 | <{lexer,parser}.ml>: use_camlp4, pp(camlp4of) | |
711 | <*.{byte,native}>: g++, use_llvm, use_llvm_analysis | |
712 | <*.{byte,native}>: use_llvm_executionengine, use_llvm_target | |
713 | <*.{byte,native}>: use_llvm_scalar_opts, use_bindings | |
714 | ||
715 | myocamlbuild.ml: | |
716 | .. code-block:: ocaml | |
717 | ||
718 | open Ocamlbuild_plugin;; | |
719 | ||
720 | ocaml_lib ~extern:true "llvm";; | |
721 | ocaml_lib ~extern:true "llvm_analysis";; | |
722 | ocaml_lib ~extern:true "llvm_executionengine";; | |
723 | ocaml_lib ~extern:true "llvm_target";; | |
724 | ocaml_lib ~extern:true "llvm_scalar_opts";; | |
725 | ||
726 | flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"; A"-cclib"; A"-rdynamic"]);; | |
727 | dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];; | |
728 | ||
729 | token.ml: | |
730 | .. code-block:: ocaml | |
731 | ||
732 | (*===----------------------------------------------------------------------=== | |
733 | * Lexer Tokens | |
734 | *===----------------------------------------------------------------------===*) | |
735 | ||
736 | (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of | |
737 | * these others for known things. *) | |
738 | type token = | |
739 | (* commands *) | |
740 | | Def | Extern | |
741 | ||
742 | (* primary *) | |
743 | | Ident of string | Number of float | |
744 | ||
745 | (* unknown *) | |
746 | | Kwd of char | |
747 | ||
748 | (* control *) | |
749 | | If | Then | Else | |
750 | | For | In | |
751 | ||
752 | (* operators *) | |
753 | | Binary | Unary | |
754 | ||
755 | lexer.ml: | |
756 | .. code-block:: ocaml | |
757 | ||
758 | (*===----------------------------------------------------------------------=== | |
759 | * Lexer | |
760 | *===----------------------------------------------------------------------===*) | |
761 | ||
762 | let rec lex = parser | |
763 | (* Skip any whitespace. *) | |
764 | | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream | |
765 | ||
766 | (* identifier: [a-zA-Z][a-zA-Z0-9] *) | |
767 | | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] -> | |
768 | let buffer = Buffer.create 1 in | |
769 | Buffer.add_char buffer c; | |
770 | lex_ident buffer stream | |
771 | ||
772 | (* number: [0-9.]+ *) | |
773 | | [< ' ('0' .. '9' as c); stream >] -> | |
774 | let buffer = Buffer.create 1 in | |
775 | Buffer.add_char buffer c; | |
776 | lex_number buffer stream | |
777 | ||
778 | (* Comment until end of line. *) | |
779 | | [< ' ('#'); stream >] -> | |
780 | lex_comment stream | |
781 | ||
782 | (* Otherwise, just return the character as its ascii value. *) | |
783 | | [< 'c; stream >] -> | |
784 | [< 'Token.Kwd c; lex stream >] | |
785 | ||
786 | (* end of stream. *) | |
787 | | [< >] -> [< >] | |
788 | ||
789 | and lex_number buffer = parser | |
790 | | [< ' ('0' .. '9' | '.' as c); stream >] -> | |
791 | Buffer.add_char buffer c; | |
792 | lex_number buffer stream | |
793 | | [< stream=lex >] -> | |
794 | [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >] | |
795 | ||
796 | and lex_ident buffer = parser | |
797 | | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] -> | |
798 | Buffer.add_char buffer c; | |
799 | lex_ident buffer stream | |
800 | | [< stream=lex >] -> | |
801 | match Buffer.contents buffer with | |
802 | | "def" -> [< 'Token.Def; stream >] | |
803 | | "extern" -> [< 'Token.Extern; stream >] | |
804 | | "if" -> [< 'Token.If; stream >] | |
805 | | "then" -> [< 'Token.Then; stream >] | |
806 | | "else" -> [< 'Token.Else; stream >] | |
807 | | "for" -> [< 'Token.For; stream >] | |
808 | | "in" -> [< 'Token.In; stream >] | |
809 | | "binary" -> [< 'Token.Binary; stream >] | |
810 | | "unary" -> [< 'Token.Unary; stream >] | |
811 | | id -> [< 'Token.Ident id; stream >] | |
812 | ||
813 | and lex_comment = parser | |
814 | | [< ' ('\n'); stream=lex >] -> stream | |
815 | | [< 'c; e=lex_comment >] -> e | |
816 | | [< >] -> [< >] | |
817 | ||
818 | ast.ml: | |
819 | .. code-block:: ocaml | |
820 | ||
821 | (*===----------------------------------------------------------------------=== | |
822 | * Abstract Syntax Tree (aka Parse Tree) | |
823 | *===----------------------------------------------------------------------===*) | |
824 | ||
825 | (* expr - Base type for all expression nodes. *) | |
826 | type expr = | |
827 | (* variant for numeric literals like "1.0". *) | |
828 | | Number of float | |
829 | ||
830 | (* variant for referencing a variable, like "a". *) | |
831 | | Variable of string | |
832 | ||
833 | (* variant for a unary operator. *) | |
834 | | Unary of char * expr | |
835 | ||
836 | (* variant for a binary operator. *) | |
837 | | Binary of char * expr * expr | |
838 | ||
839 | (* variant for function calls. *) | |
840 | | Call of string * expr array | |
841 | ||
842 | (* variant for if/then/else. *) | |
843 | | If of expr * expr * expr | |
844 | ||
845 | (* variant for for/in. *) | |
846 | | For of string * expr * expr * expr option * expr | |
847 | ||
848 | (* proto - This type represents the "prototype" for a function, which captures | |
849 | * its name, and its argument names (thus implicitly the number of arguments the | |
850 | * function takes). *) | |
851 | type proto = | |
852 | | Prototype of string * string array | |
853 | | BinOpPrototype of string * string array * int | |
854 | ||
855 | (* func - This type represents a function definition itself. *) | |
856 | type func = Function of proto * expr | |
857 | ||
858 | parser.ml: | |
859 | .. code-block:: ocaml | |
860 | ||
861 | (*===---------------------------------------------------------------------=== | |
862 | * Parser | |
863 | *===---------------------------------------------------------------------===*) | |
864 | ||
865 | (* binop_precedence - This holds the precedence for each binary operator that is | |
866 | * defined *) | |
867 | let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10 | |
868 | ||
869 | (* precedence - Get the precedence of the pending binary operator token. *) | |
870 | let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1 | |
871 | ||
872 | (* primary | |
873 | * ::= identifier | |
874 | * ::= numberexpr | |
875 | * ::= parenexpr | |
876 | * ::= ifexpr | |
877 | * ::= forexpr *) | |
878 | let rec parse_primary = parser | |
879 | (* numberexpr ::= number *) | |
880 | | [< 'Token.Number n >] -> Ast.Number n | |
881 | ||
882 | (* parenexpr ::= '(' expression ')' *) | |
883 | | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e | |
884 | ||
885 | (* identifierexpr | |
886 | * ::= identifier | |
887 | * ::= identifier '(' argumentexpr ')' *) | |
888 | | [< 'Token.Ident id; stream >] -> | |
889 | let rec parse_args accumulator = parser | |
890 | | [< e=parse_expr; stream >] -> | |
891 | begin parser | |
892 | | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e | |
893 | | [< >] -> e :: accumulator | |
894 | end stream | |
895 | | [< >] -> accumulator | |
896 | in | |
897 | let rec parse_ident id = parser | |
898 | (* Call. *) | |
899 | | [< 'Token.Kwd '('; | |
900 | args=parse_args []; | |
901 | 'Token.Kwd ')' ?? "expected ')'">] -> | |
902 | Ast.Call (id, Array.of_list (List.rev args)) | |
903 | ||
904 | (* Simple variable ref. *) | |
905 | | [< >] -> Ast.Variable id | |
906 | in | |
907 | parse_ident id stream | |
908 | ||
909 | (* ifexpr ::= 'if' expr 'then' expr 'else' expr *) | |
910 | | [< 'Token.If; c=parse_expr; | |
911 | 'Token.Then ?? "expected 'then'"; t=parse_expr; | |
912 | 'Token.Else ?? "expected 'else'"; e=parse_expr >] -> | |
913 | Ast.If (c, t, e) | |
914 | ||
915 | (* forexpr | |
916 | ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *) | |
917 | | [< 'Token.For; | |
918 | 'Token.Ident id ?? "expected identifier after for"; | |
919 | 'Token.Kwd '=' ?? "expected '=' after for"; | |
920 | stream >] -> | |
921 | begin parser | |
922 | | [< | |
923 | start=parse_expr; | |
924 | 'Token.Kwd ',' ?? "expected ',' after for"; | |
925 | end_=parse_expr; | |
926 | stream >] -> | |
927 | let step = | |
928 | begin parser | |
929 | | [< 'Token.Kwd ','; step=parse_expr >] -> Some step | |
930 | | [< >] -> None | |
931 | end stream | |
932 | in | |
933 | begin parser | |
934 | | [< 'Token.In; body=parse_expr >] -> | |
935 | Ast.For (id, start, end_, step, body) | |
936 | | [< >] -> | |
937 | raise (Stream.Error "expected 'in' after for") | |
938 | end stream | |
939 | | [< >] -> | |
940 | raise (Stream.Error "expected '=' after for") | |
941 | end stream | |
942 | ||
943 | | [< >] -> raise (Stream.Error "unknown token when expecting an expression.") | |
944 | ||
945 | (* unary | |
946 | * ::= primary | |
947 | * ::= '!' unary *) | |
948 | and parse_unary = parser | |
949 | (* If this is a unary operator, read it. *) | |
950 | | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] -> | |
951 | Ast.Unary (op, operand) | |
952 | ||
953 | (* If the current token is not an operator, it must be a primary expr. *) | |
954 | | [< stream >] -> parse_primary stream | |
955 | ||
956 | (* binoprhs | |
957 | * ::= ('+' primary)* *) | |
958 | and parse_bin_rhs expr_prec lhs stream = | |
959 | match Stream.peek stream with | |
960 | (* If this is a binop, find its precedence. *) | |
961 | | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c -> | |
962 | let token_prec = precedence c in | |
963 | ||
964 | (* If this is a binop that binds at least as tightly as the current binop, | |
965 | * consume it, otherwise we are done. *) | |
966 | if token_prec < expr_prec then lhs else begin | |
967 | (* Eat the binop. *) | |
968 | Stream.junk stream; | |
969 | ||
970 | (* Parse the unary expression after the binary operator. *) | |
971 | let rhs = parse_unary stream in | |
972 | ||
973 | (* Okay, we know this is a binop. *) | |
974 | let rhs = | |
975 | match Stream.peek stream with | |
976 | | Some (Token.Kwd c2) -> | |
977 | (* If BinOp binds less tightly with rhs than the operator after | |
978 | * rhs, let the pending operator take rhs as its lhs. *) | |
979 | let next_prec = precedence c2 in | |
980 | if token_prec < next_prec | |
981 | then parse_bin_rhs (token_prec + 1) rhs stream | |
982 | else rhs | |
983 | | _ -> rhs | |
984 | in | |
985 | ||
986 | (* Merge lhs/rhs. *) | |
987 | let lhs = Ast.Binary (c, lhs, rhs) in | |
988 | parse_bin_rhs expr_prec lhs stream | |
989 | end | |
990 | | _ -> lhs | |
991 | ||
992 | (* expression | |
993 | * ::= primary binoprhs *) | |
994 | and parse_expr = parser | |
995 | | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream | |
996 | ||
997 | (* prototype | |
998 | * ::= id '(' id* ')' | |
999 | * ::= binary LETTER number? (id, id) | |
1000 | * ::= unary LETTER number? (id) *) | |
1001 | let parse_prototype = | |
1002 | let rec parse_args accumulator = parser | |
1003 | | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e | |
1004 | | [< >] -> accumulator | |
1005 | in | |
1006 | let parse_operator = parser | |
1007 | | [< 'Token.Unary >] -> "unary", 1 | |
1008 | | [< 'Token.Binary >] -> "binary", 2 | |
1009 | in | |
1010 | let parse_binary_precedence = parser | |
1011 | | [< 'Token.Number n >] -> int_of_float n | |
1012 | | [< >] -> 30 | |
1013 | in | |
1014 | parser | |
1015 | | [< 'Token.Ident id; | |
1016 | 'Token.Kwd '(' ?? "expected '(' in prototype"; | |
1017 | args=parse_args []; | |
1018 | 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> | |
1019 | (* success. *) | |
1020 | Ast.Prototype (id, Array.of_list (List.rev args)) | |
1021 | | [< (prefix, kind)=parse_operator; | |
1022 | 'Token.Kwd op ?? "expected an operator"; | |
1023 | (* Read the precedence if present. *) | |
1024 | binary_precedence=parse_binary_precedence; | |
1025 | 'Token.Kwd '(' ?? "expected '(' in prototype"; | |
1026 | args=parse_args []; | |
1027 | 'Token.Kwd ')' ?? "expected ')' in prototype" >] -> | |
1028 | let name = prefix ^ (String.make 1 op) in | |
1029 | let args = Array.of_list (List.rev args) in | |
1030 | ||
1031 | (* Verify right number of arguments for operator. *) | |
1032 | if Array.length args != kind | |
1033 | then raise (Stream.Error "invalid number of operands for operator") | |
1034 | else | |
1035 | if kind == 1 then | |
1036 | Ast.Prototype (name, args) | |
1037 | else | |
1038 | Ast.BinOpPrototype (name, args, binary_precedence) | |
1039 | | [< >] -> | |
1040 | raise (Stream.Error "expected function name in prototype") | |
1041 | ||
1042 | (* definition ::= 'def' prototype expression *) | |
1043 | let parse_definition = parser | |
1044 | | [< 'Token.Def; p=parse_prototype; e=parse_expr >] -> | |
1045 | Ast.Function (p, e) | |
1046 | ||
1047 | (* toplevelexpr ::= expression *) | |
1048 | let parse_toplevel = parser | |
1049 | | [< e=parse_expr >] -> | |
1050 | (* Make an anonymous proto. *) | |
1051 | Ast.Function (Ast.Prototype ("", [||]), e) | |
1052 | ||
1053 | (* external ::= 'extern' prototype *) | |
1054 | let parse_extern = parser | |
1055 | | [< 'Token.Extern; e=parse_prototype >] -> e | |
1056 | ||
1057 | codegen.ml: | |
1058 | .. code-block:: ocaml | |
1059 | ||
1060 | (*===----------------------------------------------------------------------=== | |
1061 | * Code Generation | |
1062 | *===----------------------------------------------------------------------===*) | |
1063 | ||
1064 | open Llvm | |
1065 | ||
1066 | exception Error of string | |
1067 | ||
1068 | let context = global_context () | |
1069 | let the_module = create_module context "my cool jit" | |
1070 | let builder = builder context | |
1071 | let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10 | |
1072 | let double_type = double_type context | |
1073 | ||
1074 | let rec codegen_expr = function | |
1075 | | Ast.Number n -> const_float double_type n | |
1076 | | Ast.Variable name -> | |
1077 | (try Hashtbl.find named_values name with | |
1078 | | Not_found -> raise (Error "unknown variable name")) | |
1079 | | Ast.Unary (op, operand) -> | |
1080 | let operand = codegen_expr operand in | |
1081 | let callee = "unary" ^ (String.make 1 op) in | |
1082 | let callee = | |
1083 | match lookup_function callee the_module with | |
1084 | | Some callee -> callee | |
1085 | | None -> raise (Error "unknown unary operator") | |
1086 | in | |
1087 | build_call callee [|operand|] "unop" builder | |
1088 | | Ast.Binary (op, lhs, rhs) -> | |
1089 | let lhs_val = codegen_expr lhs in | |
1090 | let rhs_val = codegen_expr rhs in | |
1091 | begin | |
1092 | match op with | |
1093 | | '+' -> build_add lhs_val rhs_val "addtmp" builder | |
1094 | | '-' -> build_sub lhs_val rhs_val "subtmp" builder | |
1095 | | '*' -> build_mul lhs_val rhs_val "multmp" builder | |
1096 | | '<' -> | |
1097 | (* Convert bool 0/1 to double 0.0 or 1.0 *) | |
1098 | let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in | |
1099 | build_uitofp i double_type "booltmp" builder | |
1100 | | _ -> | |
1101 | (* If it wasn't a builtin binary operator, it must be a user defined | |
1102 | * one. Emit a call to it. *) | |
1103 | let callee = "binary" ^ (String.make 1 op) in | |
1104 | let callee = | |
1105 | match lookup_function callee the_module with | |
1106 | | Some callee -> callee | |
1107 | | None -> raise (Error "binary operator not found!") | |
1108 | in | |
1109 | build_call callee [|lhs_val; rhs_val|] "binop" builder | |
1110 | end | |
1111 | | Ast.Call (callee, args) -> | |
1112 | (* Look up the name in the module table. *) | |
1113 | let callee = | |
1114 | match lookup_function callee the_module with | |
1115 | | Some callee -> callee | |
1116 | | None -> raise (Error "unknown function referenced") | |
1117 | in | |
1118 | let params = params callee in | |
1119 | ||
1120 | (* If argument mismatch error. *) | |
1121 | if Array.length params == Array.length args then () else | |
1122 | raise (Error "incorrect # arguments passed"); | |
1123 | let args = Array.map codegen_expr args in | |
1124 | build_call callee args "calltmp" builder | |
1125 | | Ast.If (cond, then_, else_) -> | |
1126 | let cond = codegen_expr cond in | |
1127 | ||
1128 | (* Convert condition to a bool by comparing equal to 0.0 *) | |
1129 | let zero = const_float double_type 0.0 in | |
1130 | let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in | |
1131 | ||
1132 | (* Grab the first block so that we might later add the conditional branch | |
1133 | * to it at the end of the function. *) | |
1134 | let start_bb = insertion_block builder in | |
1135 | let the_function = block_parent start_bb in | |
1136 | ||
1137 | let then_bb = append_block context "then" the_function in | |
1138 | ||
1139 | (* Emit 'then' value. *) | |
1140 | position_at_end then_bb builder; | |
1141 | let then_val = codegen_expr then_ in | |
1142 | ||
1143 | (* Codegen of 'then' can change the current block, update then_bb for the | |
1144 | * phi. We create a new name because one is used for the phi node, and the | |
1145 | * other is used for the conditional branch. *) | |
1146 | let new_then_bb = insertion_block builder in | |
1147 | ||
1148 | (* Emit 'else' value. *) | |
1149 | let else_bb = append_block context "else" the_function in | |
1150 | position_at_end else_bb builder; | |
1151 | let else_val = codegen_expr else_ in | |
1152 | ||
1153 | (* Codegen of 'else' can change the current block, update else_bb for the | |
1154 | * phi. *) | |
1155 | let new_else_bb = insertion_block builder in | |
1156 | ||
1157 | (* Emit merge block. *) | |
1158 | let merge_bb = append_block context "ifcont" the_function in | |
1159 | position_at_end merge_bb builder; | |
1160 | let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in | |
1161 | let phi = build_phi incoming "iftmp" builder in | |
1162 | ||
1163 | (* Return to the start block to add the conditional branch. *) | |
1164 | position_at_end start_bb builder; | |
1165 | ignore (build_cond_br cond_val then_bb else_bb builder); | |
1166 | ||
1167 | (* Set a unconditional branch at the end of the 'then' block and the | |
1168 | * 'else' block to the 'merge' block. *) | |
1169 | position_at_end new_then_bb builder; ignore (build_br merge_bb builder); | |
1170 | position_at_end new_else_bb builder; ignore (build_br merge_bb builder); | |
1171 | ||
1172 | (* Finally, set the builder to the end of the merge block. *) | |
1173 | position_at_end merge_bb builder; | |
1174 | ||
1175 | phi | |
1176 | | Ast.For (var_name, start, end_, step, body) -> | |
1177 | (* Emit the start code first, without 'variable' in scope. *) | |
1178 | let start_val = codegen_expr start in | |
1179 | ||
1180 | (* Make the new basic block for the loop header, inserting after current | |
1181 | * block. *) | |
1182 | let preheader_bb = insertion_block builder in | |
1183 | let the_function = block_parent preheader_bb in | |
1184 | let loop_bb = append_block context "loop" the_function in | |
1185 | ||
1186 | (* Insert an explicit fall through from the current block to the | |
1187 | * loop_bb. *) | |
1188 | ignore (build_br loop_bb builder); | |
1189 | ||
1190 | (* Start insertion in loop_bb. *) | |
1191 | position_at_end loop_bb builder; | |
1192 | ||
1193 | (* Start the PHI node with an entry for start. *) | |
1194 | let variable = build_phi [(start_val, preheader_bb)] var_name builder in | |
1195 | ||
1196 | (* Within the loop, the variable is defined equal to the PHI node. If it | |
1197 | * shadows an existing variable, we have to restore it, so save it | |
1198 | * now. *) | |
1199 | let old_val = | |
1200 | try Some (Hashtbl.find named_values var_name) with Not_found -> None | |
1201 | in | |
1202 | Hashtbl.add named_values var_name variable; | |
1203 | ||
1204 | (* Emit the body of the loop. This, like any other expr, can change the | |
1205 | * current BB. Note that we ignore the value computed by the body, but | |
1206 | * don't allow an error *) | |
1207 | ignore (codegen_expr body); | |
1208 | ||
1209 | (* Emit the step value. *) | |
1210 | let step_val = | |
1211 | match step with | |
1212 | | Some step -> codegen_expr step | |
1213 | (* If not specified, use 1.0. *) | |
1214 | | None -> const_float double_type 1.0 | |
1215 | in | |
1216 | ||
1217 | let next_var = build_add variable step_val "nextvar" builder in | |
1218 | ||
1219 | (* Compute the end condition. *) | |
1220 | let end_cond = codegen_expr end_ in | |
1221 | ||
1222 | (* Convert condition to a bool by comparing equal to 0.0. *) | |
1223 | let zero = const_float double_type 0.0 in | |
1224 | let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in | |
1225 | ||
1226 | (* Create the "after loop" block and insert it. *) | |
1227 | let loop_end_bb = insertion_block builder in | |
1228 | let after_bb = append_block context "afterloop" the_function in | |
1229 | ||
1230 | (* Insert the conditional branch into the end of loop_end_bb. *) | |
1231 | ignore (build_cond_br end_cond loop_bb after_bb builder); | |
1232 | ||
1233 | (* Any new code will be inserted in after_bb. *) | |
1234 | position_at_end after_bb builder; | |
1235 | ||
1236 | (* Add a new entry to the PHI node for the backedge. *) | |
1237 | add_incoming (next_var, loop_end_bb) variable; | |
1238 | ||
1239 | (* Restore the unshadowed variable. *) | |
1240 | begin match old_val with | |
1241 | | Some old_val -> Hashtbl.add named_values var_name old_val | |
1242 | | None -> () | |
1243 | end; | |
1244 | ||
1245 | (* for expr always returns 0.0. *) | |
1246 | const_null double_type | |
1247 | ||
1248 | let codegen_proto = function | |
1249 | | Ast.Prototype (name, args) | Ast.BinOpPrototype (name, args, _) -> | |
1250 | (* Make the function type: double(double,double) etc. *) | |
1251 | let doubles = Array.make (Array.length args) double_type in | |
1252 | let ft = function_type double_type doubles in | |
1253 | let f = | |
1254 | match lookup_function name the_module with | |
1255 | | None -> declare_function name ft the_module | |
1256 | ||
1257 | (* If 'f' conflicted, there was already something named 'name'. If it | |
1258 | * has a body, don't allow redefinition or reextern. *) | |
1259 | | Some f -> | |
1260 | (* If 'f' already has a body, reject this. *) | |
1261 | if block_begin f <> At_end f then | |
1262 | raise (Error "redefinition of function"); | |
1263 | ||
1264 | (* If 'f' took a different number of arguments, reject. *) | |
1265 | if element_type (type_of f) <> ft then | |
1266 | raise (Error "redefinition of function with different # args"); | |
1267 | f | |
1268 | in | |
1269 | ||
1270 | (* Set names for all arguments. *) | |
1271 | Array.iteri (fun i a -> | |
1272 | let n = args.(i) in | |
1273 | set_value_name n a; | |
1274 | Hashtbl.add named_values n a; | |
1275 | ) (params f); | |
1276 | f | |
1277 | ||
1278 | let codegen_func the_fpm = function | |
1279 | | Ast.Function (proto, body) -> | |
1280 | Hashtbl.clear named_values; | |
1281 | let the_function = codegen_proto proto in | |
1282 | ||
1283 | (* If this is an operator, install it. *) | |
1284 | begin match proto with | |
1285 | | Ast.BinOpPrototype (name, args, prec) -> | |
1286 | let op = name.[String.length name - 1] in | |
1287 | Hashtbl.add Parser.binop_precedence op prec; | |
1288 | | _ -> () | |
1289 | end; | |
1290 | ||
1291 | (* Create a new basic block to start insertion into. *) | |
1292 | let bb = append_block context "entry" the_function in | |
1293 | position_at_end bb builder; | |
1294 | ||
1295 | try | |
1296 | let ret_val = codegen_expr body in | |
1297 | ||
1298 | (* Finish off the function. *) | |
1299 | let _ = build_ret ret_val builder in | |
1300 | ||
1301 | (* Validate the generated code, checking for consistency. *) | |
1302 | Llvm_analysis.assert_valid_function the_function; | |
1303 | ||
1304 | (* Optimize the function. *) | |
1305 | let _ = PassManager.run_function the_function the_fpm in | |
1306 | ||
1307 | the_function | |
1308 | with e -> | |
1309 | delete_function the_function; | |
1310 | raise e | |
1311 | ||
1312 | toplevel.ml: | |
1313 | .. code-block:: ocaml | |
1314 | ||
1315 | (*===----------------------------------------------------------------------=== | |
1316 | * Top-Level parsing and JIT Driver | |
1317 | *===----------------------------------------------------------------------===*) | |
1318 | ||
1319 | open Llvm | |
1320 | open Llvm_executionengine | |
1321 | ||
1322 | (* top ::= definition | external | expression | ';' *) | |
1323 | let rec main_loop the_fpm the_execution_engine stream = | |
1324 | match Stream.peek stream with | |
1325 | | None -> () | |
1326 | ||
1327 | (* ignore top-level semicolons. *) | |
1328 | | Some (Token.Kwd ';') -> | |
1329 | Stream.junk stream; | |
1330 | main_loop the_fpm the_execution_engine stream | |
1331 | ||
1332 | | Some token -> | |
1333 | begin | |
1334 | try match token with | |
1335 | | Token.Def -> | |
1336 | let e = Parser.parse_definition stream in | |
1337 | print_endline "parsed a function definition."; | |
1338 | dump_value (Codegen.codegen_func the_fpm e); | |
1339 | | Token.Extern -> | |
1340 | let e = Parser.parse_extern stream in | |
1341 | print_endline "parsed an extern."; | |
1342 | dump_value (Codegen.codegen_proto e); | |
1343 | | _ -> | |
1344 | (* Evaluate a top-level expression into an anonymous function. *) | |
1345 | let e = Parser.parse_toplevel stream in | |
1346 | print_endline "parsed a top-level expr"; | |
1347 | let the_function = Codegen.codegen_func the_fpm e in | |
1348 | dump_value the_function; | |
1349 | ||
1350 | (* JIT the function, returning a function pointer. *) | |
1351 | let result = ExecutionEngine.run_function the_function [||] | |
1352 | the_execution_engine in | |
1353 | ||
1354 | print_string "Evaluated to "; | |
1355 | print_float (GenericValue.as_float Codegen.double_type result); | |
1356 | print_newline (); | |
1357 | with Stream.Error s | Codegen.Error s -> | |
1358 | (* Skip token for error recovery. *) | |
1359 | Stream.junk stream; | |
1360 | print_endline s; | |
1361 | end; | |
1362 | print_string "ready> "; flush stdout; | |
1363 | main_loop the_fpm the_execution_engine stream | |
1364 | ||
1365 | toy.ml: | |
1366 | .. code-block:: ocaml | |
1367 | ||
1368 | (*===----------------------------------------------------------------------=== | |
1369 | * Main driver code. | |
1370 | *===----------------------------------------------------------------------===*) | |
1371 | ||
1372 | open Llvm | |
1373 | open Llvm_executionengine | |
1374 | open Llvm_target | |
1375 | open Llvm_scalar_opts | |
1376 | ||
1377 | let main () = | |
1378 | ignore (initialize_native_target ()); | |
1379 | ||
1380 | (* Install standard binary operators. | |
1381 | * 1 is the lowest precedence. *) | |
1382 | Hashtbl.add Parser.binop_precedence '<' 10; | |
1383 | Hashtbl.add Parser.binop_precedence '+' 20; | |
1384 | Hashtbl.add Parser.binop_precedence '-' 20; | |
1385 | Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *) | |
1386 | ||
1387 | (* Prime the first token. *) | |
1388 | print_string "ready> "; flush stdout; | |
1389 | let stream = Lexer.lex (Stream.of_channel stdin) in | |
1390 | ||
1391 | (* Create the JIT. *) | |
1392 | let the_execution_engine = ExecutionEngine.create Codegen.the_module in | |
1393 | let the_fpm = PassManager.create_function Codegen.the_module in | |
1394 | ||
1395 | (* Set up the optimizer pipeline. Start with registering info about how the | |
1396 | * target lays out data structures. *) | |
1397 | DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm; | |
1398 | ||
1399 | (* Do simple "peephole" optimizations and bit-twiddling optzn. *) | |
1400 | add_instruction_combination the_fpm; | |
1401 | ||
1402 | (* reassociate expressions. *) | |
1403 | add_reassociation the_fpm; | |
1404 | ||
1405 | (* Eliminate Common SubExpressions. *) | |
1406 | add_gvn the_fpm; | |
1407 | ||
1408 | (* Simplify the control flow graph (deleting unreachable blocks, etc). *) | |
1409 | add_cfg_simplification the_fpm; | |
1410 | ||
1411 | ignore (PassManager.initialize the_fpm); | |
1412 | ||
1413 | (* Run the main "interpreter loop" now. *) | |
1414 | Toplevel.main_loop the_fpm the_execution_engine stream; | |
1415 | ||
1416 | (* Print out all the generated code. *) | |
1417 | dump_module Codegen.the_module | |
1418 | ;; | |
1419 | ||
1420 | main () | |
1421 | ||
1422 | bindings.c | |
1423 | .. code-block:: c | |
1424 | ||
1425 | #include <stdio.h> | |
1426 | ||
1427 | /* putchard - putchar that takes a double and returns 0. */ | |
1428 | extern double putchard(double X) { | |
1429 | putchar((char)X); | |
1430 | return 0; | |
1431 | } | |
1432 | ||
1433 | /* printd - printf that takes a double prints it as "%f\n", returning 0. */ | |
1434 | extern double printd(double X) { | |
1435 | printf("%f\n", X); | |
1436 | return 0; | |
1437 | } | |
1438 | ||
1439 | `Next: Extending the language: mutable variables / SSA | |
1440 | construction <OCamlLangImpl7.html>`_ | |
1441 |