]>
Commit | Line | Data |
---|---|---|
878ddf1f | 1 | ======================================================================\r |
2 | \r | |
3 | CHANGES_SUMMARY.TXT\r | |
4 | \r | |
5 | A QUICK overview of changes from 1.33 in reverse order\r | |
6 | \r | |
7 | A summary of additions rather than bug fixes and minor code changes.\r | |
8 | \r | |
9 | Numbers refer to items in CHANGES_FROM_133*.TXT\r | |
10 | which may contain additional information.\r | |
11 | \r | |
12 | DISCLAIMER\r | |
13 | \r | |
14 | The software and these notes are provided "as is". They may include\r | |
15 | typographical or technical errors and their authors disclaims all\r | |
16 | liability of any kind or nature for damages due to error, fault,\r | |
17 | defect, or deficiency regardless of cause. All warranties of any\r | |
18 | kind, either express or implied, including, but not limited to, the\r | |
19 | implied warranties of merchantability and fitness for a particular\r | |
20 | purpose are disclaimed.\r | |
21 | \r | |
22 | ======================================================================\r | |
23 | \r | |
24 | #258. You can specify a user-defined base class for your parser\r | |
25 | \r | |
26 | The base class must constructor must have a signature similar to\r | |
27 | that of ANTLRParser.\r | |
28 | \r | |
29 | #253. Generation of block preamble (-preamble and -preamble_first)\r | |
30 | \r | |
31 | The antlr option -preamble causes antlr to insert the code\r | |
32 | BLOCK_PREAMBLE at the start of each rule and block.\r | |
33 | \r | |
34 | The antlr option -preamble_first is similar, but inserts the\r | |
35 | code BLOCK_PREAMBLE_FIRST(PreambleFirst_123) where the symbol\r | |
36 | PreambleFirst_123 is equivalent to the first set defined by\r | |
37 | the #FirstSetSymbol described in Item #248.\r | |
38 | \r | |
39 | #248. Generate symbol for first set of an alternative\r | |
40 | \r | |
41 | rr : #FirstSetSymbol(rr_FirstSet) ( Foo | Bar ) ;\r | |
42 | \r | |
43 | #216. Defer token fetch for C++ mode\r | |
44 | \r | |
45 | When the ANTLRParser class is built with the pre-processor option \r | |
46 | ZZDEFER_FETCH defined, the fetch of new tokens by consume() is deferred\r | |
47 | until LA(i) or LT(i) is called. \r | |
48 | \r | |
49 | #215. Use reset() to reset DLGLexerBase\r | |
50 | #188. Added pccts/h/DLG_stream_input.h\r | |
51 | #180. Added ANTLRParser::getEofToken()\r | |
52 | #173. -glms for Microsoft style filenames with -gl\r | |
53 | #170. Suppression for predicates with lookahead depth >1\r | |
54 | \r | |
55 | Consider the following grammar with -ck 2 and the predicate in rule\r | |
56 | "a" with depth 2:\r | |
57 | \r | |
58 | r1 : (ab)* "@"\r | |
59 | ;\r | |
60 | \r | |
61 | ab : a\r | |
62 | | b\r | |
63 | ;\r | |
64 | \r | |
65 | a : (A B)? => <<p(LATEXT(2))>>? A B C\r | |
66 | ;\r | |
67 | \r | |
68 | b : A B C\r | |
69 | ;\r | |
70 | \r | |
71 | Normally, the predicate would be hoisted into rule r1 in order to\r | |
72 | determine whether to call rule "ab". However it should *not* be\r | |
73 | hoisted because, even if p is false, there is a valid alternative\r | |
74 | in rule b. With "-mrhoistk on" the predicate will be suppressed.\r | |
75 | \r | |
76 | If "-info p" command line option is present the following information\r | |
77 | will appear in the generated code:\r | |
78 | \r | |
79 | while ( (LA(1)==A)\r | |
80 | #if 0\r | |
81 | \r | |
82 | Part (or all) of predicate with depth > 1 suppressed by alternative\r | |
83 | without predicate\r | |
84 | \r | |
85 | pred << p(LATEXT(2))>>?\r | |
86 | depth=k=2 ("=>" guard) rule a line 8 t1.g\r | |
87 | tree context:\r | |
88 | (root = A\r | |
89 | B\r | |
90 | )\r | |
91 | \r | |
92 | The token sequence which is suppressed: ( A B )\r | |
93 | The sequence of references which generate that sequence of tokens:\r | |
94 | \r | |
95 | 1 to ab r1/1 line 1 t1.g\r | |
96 | 2 ab ab/1 line 4 t1.g\r | |
97 | 3 to b ab/2 line 5 t1.g\r | |
98 | 4 b b/1 line 11 t1.g\r | |
99 | 5 #token A b/1 line 11 t1.g\r | |
100 | 6 #token B b/1 line 11 t1.g\r | |
101 | \r | |
102 | #endif\r | |
103 | \r | |
104 | A slightly more complicated example:\r | |
105 | \r | |
106 | r1 : (ab)* "@"\r | |
107 | ;\r | |
108 | \r | |
109 | ab : a\r | |
110 | | b\r | |
111 | ;\r | |
112 | \r | |
113 | a : (A B)? => <<p(LATEXT(2))>>? (A B | D E)\r | |
114 | ;\r | |
115 | \r | |
116 | b : <<q(LATEXT(2))>>? D E\r | |
117 | ;\r | |
118 | \r | |
119 | \r | |
120 | In this case, the sequence (D E) in rule "a" which lies behind\r | |
121 | the guard is used to suppress the predicate with context (D E)\r | |
122 | in rule b.\r | |
123 | \r | |
124 | while ( (LA(1)==A || LA(1)==D)\r | |
125 | #if 0\r | |
126 | \r | |
127 | Part (or all) of predicate with depth > 1 suppressed by alternative\r | |
128 | without predicate\r | |
129 | \r | |
130 | pred << q(LATEXT(2))>>?\r | |
131 | depth=k=2 rule b line 11 t2.g\r | |
132 | tree context:\r | |
133 | (root = D\r | |
134 | E\r | |
135 | )\r | |
136 | \r | |
137 | The token sequence which is suppressed: ( D E )\r | |
138 | The sequence of references which generate that sequence of tokens:\r | |
139 | \r | |
140 | 1 to ab r1/1 line 1 t2.g\r | |
141 | 2 ab ab/1 line 4 t2.g\r | |
142 | 3 to a ab/1 line 4 t2.g\r | |
143 | 4 a a/1 line 8 t2.g\r | |
144 | 5 #token D a/1 line 8 t2.g\r | |
145 | 6 #token E a/1 line 8 t2.g\r | |
146 | \r | |
147 | #endif\r | |
148 | &&\r | |
149 | #if 0\r | |
150 | \r | |
151 | pred << p(LATEXT(2))>>?\r | |
152 | depth=k=2 ("=>" guard) rule a line 8 t2.g\r | |
153 | tree context:\r | |
154 | (root = A\r | |
155 | B\r | |
156 | )\r | |
157 | \r | |
158 | #endif\r | |
159 | \r | |
160 | (! ( LA(1)==A && LA(2)==B ) || p(LATEXT(2)) ) {\r | |
161 | ab();\r | |
162 | ...\r | |
163 | \r | |
164 | #165. (Changed in MR13) option -newAST\r | |
165 | \r | |
166 | To create ASTs from an ANTLRTokenPtr antlr usually calls\r | |
167 | "new AST(ANTLRTokenPtr)". This option generates a call\r | |
168 | to "newAST(ANTLRTokenPtr)" instead. This allows a user\r | |
169 | to define a parser member function to create an AST object.\r | |
170 | \r | |
171 | #161. (Changed in MR13) Switch -gxt inhibits generation of tokens.h\r | |
172 | \r | |
173 | #158. (Changed in MR13) #header causes problem for pre-processors\r | |
174 | \r | |
175 | A user who runs the C pre-processor on antlr source suggested\r | |
176 | that another syntax be allowed. With MR13 such directives\r | |
177 | such as #header, #pragma, etc. may be written as "\#header",\r | |
178 | "\#pragma", etc. For escaping pre-processor directives inside\r | |
179 | a #header use something like the following:\r | |
180 | \r | |
181 | \#header\r | |
182 | <<\r | |
183 | \#include <stdio.h>\r | |
184 | >>\r | |
185 | \r | |
186 | #155. (Changed in MR13) Context behind predicates can suppress\r | |
187 | \r | |
188 | With -mrhoist enabled the context behind a guarded predicate can\r | |
189 | be used to suppress other predicates. Consider the following grammar:\r | |
190 | \r | |
191 | r0 : (r1)+;\r | |
192 | \r | |
193 | r1 : rp\r | |
194 | | rq\r | |
195 | ;\r | |
196 | rp : <<p LATEXT(1)>>? B ;\r | |
197 | rq : (A)? => <<q LATEXT(1)>>? (A|B);\r | |
198 | \r | |
199 | In earlier versions both predicates "p" and "q" would be hoisted into\r | |
200 | rule r0. With MR12c predicate p is suppressed because the context which\r | |
201 | follows predicate q includes "B" which can "cover" predicate "p". In\r | |
202 | other words, in trying to decide in r0 whether to call r1, it doesn't\r | |
203 | really matter whether p is false or true because, either way, there is\r | |
204 | a valid choice within r1.\r | |
205 | \r | |
206 | #154. (Changed in MR13) Making hoist suppression explicit using <<nohoist>>\r | |
207 | \r | |
208 | A common error, even among experienced pccts users, is to code\r | |
209 | an init-action to inhibit hoisting rather than a leading action.\r | |
210 | An init-action does not inhibit hoisting.\r | |
211 | \r | |
212 | This was coded:\r | |
213 | \r | |
214 | rule1 : <<;>> rule2\r | |
215 | \r | |
216 | This is what was meant:\r | |
217 | \r | |
218 | rule1 : <<;>> <<;>> rule2\r | |
219 | \r | |
220 | With MR13, the user can code:\r | |
221 | \r | |
222 | rule1 : <<;>> <<nohoist>> rule2\r | |
223 | \r | |
224 | The following will give an error message:\r | |
225 | \r | |
226 | rule1 : <<nohoist>> rule2\r | |
227 | \r | |
228 | If the <<nohoist>> appears as an init-action rather than a leading\r | |
229 | action an error message is issued. The meaning of an init-action\r | |
230 | containing "nohoist" is unclear: does it apply to just one\r | |
231 | alternative or to all alternatives ?\r | |
232 | \r | |
233 | #151a. Addition of ANTLRParser::getLexer(), ANTLRTokenStream::getLexer()\r | |
234 | \r | |
235 | You must manually cast the ANTLRTokenStream to your program's\r | |
236 | lexer class. Because the name of the lexer's class is not fixed.\r | |
237 | Thus it is impossible to incorporate it into the DLGLexerBase\r | |
238 | class.\r | |
239 | \r | |
240 | #151b.(Changed in MR12) ParserBlackBox member getLexer()\r | |
241 | \r | |
242 | #150. (Changed in MR12) syntaxErrCount and lexErrCount now public\r | |
243 | \r | |
244 | #149. (Changed in MR12) antlr option -info o (letter o for orphan)\r | |
245 | \r | |
246 | If there is more than one rule which is not referenced by any\r | |
247 | other rule then all such rules are listed. This is useful for\r | |
248 | alerting one to rules which are not used, but which can still\r | |
249 | contribute to ambiguity.\r | |
250 | \r | |
251 | #148. (Changed in MR11) #token names appearing in zztokens,token_tbl\r | |
252 | \r | |
253 | One can write:\r | |
254 | \r | |
255 | #token Plus ("+") "\+"\r | |
256 | #token RP ("(") "\("\r | |
257 | #token COM ("comment begin") "/\*"\r | |
258 | \r | |
259 | The string in parenthesis will be used in syntax error messages.\r | |
260 | \r | |
261 | #146. (Changed in MR11) Option -treport for locating "difficult" alts\r | |
262 | \r | |
263 | It can be difficult to determine which alternatives are causing\r | |
264 | pccts to work hard to resolve an ambiguity. In some cases the\r | |
265 | ambiguity is successfully resolved after much CPU time so there\r | |
266 | is no message at all.\r | |
267 | \r | |
268 | A rough measure of the amount of work being peformed which is\r | |
269 | independent of the CPU speed and system load is the number of\r | |
270 | tnodes created. Using "-info t" gives information about the\r | |
271 | total number of tnodes created and the peak number of tnodes.\r | |
272 | \r | |
273 | Tree Nodes: peak 1300k created 1416k lost 0\r | |
274 | \r | |
275 | It also puts in the generated C or C++ file the number of tnodes\r | |
276 | created for a rule (at the end of the rule). However this\r | |
277 | information is not sufficient to locate the alternatives within\r | |
278 | a rule which are causing the creation of tnodes.\r | |
279 | \r | |
280 | Using:\r | |
281 | \r | |
282 | antlr -treport 100000 ....\r | |
283 | \r | |
284 | causes antlr to list on stdout any alternatives which require the\r | |
285 | creation of more than 100,000 tnodes, along with the lookahead sets\r | |
286 | for those alternatives.\r | |
287 | \r | |
288 | The following is a trivial case from the ansi.g grammar which shows\r | |
289 | the format of the report. This report might be of more interest\r | |
290 | in cases where 1,000,000 tuples were created to resolve the ambiguity.\r | |
291 | \r | |
292 | -------------------------------------------------------------------------\r | |
293 | There were 0 tuples whose ambiguity could not be resolved\r | |
294 | by full lookahead\r | |
295 | There were 157 tnodes created to resolve ambiguity between:\r | |
296 | \r | |
297 | Choice 1: statement/2 line 475 file ansi.g\r | |
298 | Choice 2: statement/3 line 476 file ansi.g\r | |
299 | \r | |
300 | Intersection of lookahead[1] sets:\r | |
301 | \r | |
302 | IDENTIFIER\r | |
303 | \r | |
304 | Intersection of lookahead[2] sets:\r | |
305 | \r | |
306 | LPARENTHESIS COLON AMPERSAND MINUS\r | |
307 | STAR PLUSPLUS MINUSMINUS ONESCOMPLEMENT\r | |
308 | NOT SIZEOF OCTALINT DECIMALINT\r | |
309 | HEXADECIMALINT FLOATONE FLOATTWO IDENTIFIER\r | |
310 | STRING CHARACTER\r | |
311 | -------------------------------------------------------------------------\r | |
312 | \r | |
313 | #143. (Changed in MR11) Optional ";" at end of #token statement\r | |
314 | \r | |
315 | Fixes problem of:\r | |
316 | \r | |
317 | #token X "x"\r | |
318 | \r | |
319 | <<\r | |
320 | parser action\r | |
321 | >>\r | |
322 | \r | |
323 | Being confused with:\r | |
324 | \r | |
325 | #token X "x" <<lexical action>>\r | |
326 | \r | |
327 | #142. (Changed in MR11) class BufFileInput subclass of DLGInputStream\r | |
328 | \r | |
329 | Alexey Demakov (demakov@kazbek.ispras.ru) has supplied class\r | |
330 | BufFileInput derived from DLGInputStream which provides a\r | |
331 | function lookahead(char *string) to test characters in the\r | |
332 | input stream more than one character ahead.\r | |
333 | The class is located in pccts/h/BufFileInput.* of the kit.\r | |
334 | \r | |
335 | #140. #pred to define predicates\r | |
336 | \r | |
337 | +---------------------------------------------------+\r | |
338 | | Note: Assume "-prc on" for this entire discussion |\r | |
339 | +---------------------------------------------------+\r | |
340 | \r | |
341 | A problem with predicates is that each one is regarded as\r | |
342 | unique and capable of disambiguating cases where two\r | |
343 | alternatives have identical lookahead. For example:\r | |
344 | \r | |
345 | rule : <<pred(LATEXT(1))>>? A\r | |
346 | | <<pred(LATEXT(1))>>? A\r | |
347 | ;\r | |
348 | \r | |
349 | will not cause any error messages or warnings to be issued\r | |
350 | by earlier versions of pccts. To compare the text of the\r | |
351 | predicates is an incomplete solution.\r | |
352 | \r | |
353 | In 1.33MR11 I am introducing the #pred statement in order to\r | |
354 | solve some problems with predicates. The #pred statement allows\r | |
355 | one to give a symbolic name to a "predicate literal" or a\r | |
356 | "predicate expression" in order to refer to it in other predicate\r | |
357 | expressions or in the rules of the grammar.\r | |
358 | \r | |
359 | The predicate literal associated with a predicate symbol is C\r | |
360 | or C++ code which can be used to test the condition. A\r | |
361 | predicate expression defines a predicate symbol in terms of other\r | |
362 | predicate symbols using "!", "&&", and "||". A predicate symbol\r | |
363 | can be defined in terms of a predicate literal, a predicate\r | |
364 | expression, or *both*.\r | |
365 | \r | |
366 | When a predicate symbol is defined with both a predicate literal\r | |
367 | and a predicate expression, the predicate literal is used to generate\r | |
368 | code, but the predicate expression is used to check for two\r | |
369 | alternatives with identical predicates in both alternatives.\r | |
370 | \r | |
371 | Here are some examples of #pred statements:\r | |
372 | \r | |
373 | #pred IsLabel <<isLabel(LATEXT(1))>>?\r | |
374 | #pred IsLocalVar <<isLocalVar(LATEXT(1))>>?\r | |
375 | #pred IsGlobalVar <<isGlobalVar(LATEXT(1)>>?\r | |
376 | #pred IsVar <<isVar(LATEXT(1))>>? IsLocalVar || IsGlobalVar\r | |
377 | #pred IsScoped <<isScoped(LATEXT(1))>>? IsLabel || IsLocalVar\r | |
378 | \r | |
379 | I hope that the use of EBNF notation to describe the syntax of the\r | |
380 | #pred statement will not cause problems for my readers (joke).\r | |
381 | \r | |
382 | predStatement : "#pred"\r | |
383 | CapitalizedName\r | |
384 | (\r | |
385 | "<<predicate_literal>>?"\r | |
386 | | "<<predicate_literal>>?" predOrExpr\r | |
387 | | predOrExpr\r | |
388 | )\r | |
389 | ;\r | |
390 | \r | |
391 | predOrExpr : predAndExpr ( "||" predAndExpr ) * ;\r | |
392 | \r | |
393 | predAndExpr : predPrimary ( "&&" predPrimary ) * ;\r | |
394 | \r | |
395 | predPrimary : CapitalizedName\r | |
396 | | "!" predPrimary\r | |
397 | | "(" predOrExpr ")"\r | |
398 | ;\r | |
399 | \r | |
400 | What is the purpose of this nonsense ?\r | |
401 | \r | |
402 | To understand how predicate symbols help, you need to realize that\r | |
403 | predicate symbols are used in two different ways with two different\r | |
404 | goals.\r | |
405 | \r | |
406 | a. Allow simplification of predicates which have been combined\r | |
407 | during predicate hoisting.\r | |
408 | \r | |
409 | b. Allow recognition of identical predicates which can't disambiguate\r | |
410 | alternatives with common lookahead.\r | |
411 | \r | |
412 | First we will discuss goal (a). Consider the following rule:\r | |
413 | \r | |
414 | rule0: rule1\r | |
415 | | ID\r | |
416 | | ...\r | |
417 | ;\r | |
418 | \r | |
419 | rule1: rule2\r | |
420 | | rule3\r | |
421 | ;\r | |
422 | \r | |
423 | rule2: <<isX(LATEXT(1))>>? ID ;\r | |
424 | rule3: <<!isX(LATEXT(1)>>? ID ;\r | |
425 | \r | |
426 | When the predicates in rule2 and rule3 are combined by hoisting\r | |
427 | to create a prediction expression for rule1 the result is:\r | |
428 | \r | |
429 | if ( LA(1)==ID\r | |
430 | && ( isX(LATEXT(1) || !isX(LATEXT(1) ) ) { rule1(); ...\r | |
431 | \r | |
432 | This is inefficient, but more importantly, can lead to false\r | |
433 | assumptions that the predicate expression distinguishes the rule1\r | |
434 | alternative with some other alternative with lookahead ID. In\r | |
435 | MR11 one can write:\r | |
436 | \r | |
437 | #pred IsX <<isX(LATEXT(1))>>?\r | |
438 | \r | |
439 | ...\r | |
440 | \r | |
441 | rule2: <<IsX>>? ID ;\r | |
442 | rule3: <<!IsX>>? ID ;\r | |
443 | \r | |
444 | During hoisting MR11 recognizes this as a special case and\r | |
445 | eliminates the predicates. The result is a prediction\r | |
446 | expression like the following:\r | |
447 | \r | |
448 | if ( LA(1)==ID ) { rule1(); ...\r | |
449 | \r | |
450 | Please note that the following cases which appear to be equivalent\r | |
451 | *cannot* be simplified by MR11 during hoisting because the hoisting\r | |
452 | logic only checks for a "!" in the predicate action, not in the\r | |
453 | predicate expression for a predicate symbol.\r | |
454 | \r | |
455 | *Not* equivalent and is not simplified during hoisting:\r | |
456 | \r | |
457 | #pred IsX <<isX(LATEXT(1))>>?\r | |
458 | #pred NotX <<!isX(LATEXT(1))>>?\r | |
459 | ...\r | |
460 | rule2: <<IsX>>? ID ;\r | |
461 | rule3: <<NotX>>? ID ;\r | |
462 | \r | |
463 | *Not* equivalent and is not simplified during hoisting:\r | |
464 | \r | |
465 | #pred IsX <<isX(LATEXT(1))>>?\r | |
466 | #pred NotX !IsX\r | |
467 | ...\r | |
468 | rule2: <<IsX>>? ID ;\r | |
469 | rule3: <<NotX>>? ID ;\r | |
470 | \r | |
471 | Now we will discuss goal (b).\r | |
472 | \r | |
473 | When antlr discovers that there is a lookahead ambiguity between\r | |
474 | two alternatives it attempts to resolve the ambiguity by searching\r | |
475 | for predicates in both alternatives. In the past any predicate\r | |
476 | would do, even if the same one appeared in both alternatives:\r | |
477 | \r | |
478 | rule: <<p(LATEXT(1))>>? X\r | |
479 | | <<p(LATEXT(1))>>? X\r | |
480 | ;\r | |
481 | \r | |
482 | The #pred statement is a start towards solving this problem.\r | |
483 | During ambiguity resolution (*not* predicate hoisting) the\r | |
484 | predicates for the two alternatives are expanded and compared.\r | |
485 | Consider the following example:\r | |
486 | \r | |
487 | #pred Upper <<isUpper(LATEXT(1))>>?\r | |
488 | #pred Lower <<isLower(LATEXT(1))>>?\r | |
489 | #pred Alpha <<isAlpha(LATEXT(1))>>? Upper || Lower\r | |
490 | \r | |
491 | rule0: rule1\r | |
492 | | <<Alpha>>? ID\r | |
493 | ;\r | |
494 | \r | |
495 | rule1:\r | |
496 | | rule2\r | |
497 | | rule3\r | |
498 | ...\r | |
499 | ;\r | |
500 | \r | |
501 | rule2: <<Upper>>? ID;\r | |
502 | rule3: <<Lower>>? ID;\r | |
503 | \r | |
504 | The definition of #pred Alpha expresses:\r | |
505 | \r | |
506 | a. to test the predicate use the C code "isAlpha(LATEXT(1))"\r | |
507 | \r | |
508 | b. to analyze the predicate use the information that\r | |
509 | Alpha is equivalent to the union of Upper and Lower,\r | |
510 | \r | |
511 | During ambiguity resolution the definition of Alpha is expanded\r | |
512 | into "Upper || Lower" and compared with the predicate in the other\r | |
513 | alternative, which is also "Upper || Lower". Because they are\r | |
514 | identical MR11 will report a problem.\r | |
515 | \r | |
516 | -------------------------------------------------------------------------\r | |
517 | t10.g, line 5: warning: the predicates used to disambiguate rule rule0\r | |
518 | (file t10.g alt 1 line 5 and alt 2 line 6)\r | |
519 | are identical when compared without context and may have no\r | |
520 | resolving power for some lookahead sequences.\r | |
521 | -------------------------------------------------------------------------\r | |
522 | \r | |
523 | If you use the "-info p" option the output file will contain:\r | |
524 | \r | |
525 | +----------------------------------------------------------------------+\r | |
526 | |#if 0 |\r | |
527 | | |\r | |
528 | |The following predicates are identical when compared without |\r | |
529 | | lookahead context information. For some ambiguous lookahead |\r | |
530 | | sequences they may not have any power to resolve the ambiguity. |\r | |
531 | | |\r | |
532 | |Choice 1: rule0/1 alt 1 line 5 file t10.g |\r | |
533 | | |\r | |
534 | | The original predicate for choice 1 with available context |\r | |
535 | | information: |\r | |
536 | | |\r | |
537 | | OR expr |\r | |
538 | | |\r | |
539 | | pred << Upper>>? |\r | |
540 | | depth=k=1 rule rule2 line 14 t10.g |\r | |
541 | | set context: |\r | |
542 | | ID |\r | |
543 | | |\r | |
544 | | pred << Lower>>? |\r | |
545 | | depth=k=1 rule rule3 line 15 t10.g |\r | |
546 | | set context: |\r | |
547 | | ID |\r | |
548 | | |\r | |
549 | | The predicate for choice 1 after expansion (but without context |\r | |
550 | | information): |\r | |
551 | | |\r | |
552 | | OR expr |\r | |
553 | | |\r | |
554 | | pred << isUpper(LATEXT(1))>>? |\r | |
555 | | depth=k=1 rule line 1 t10.g |\r | |
556 | | |\r | |
557 | | pred << isLower(LATEXT(1))>>? |\r | |
558 | | depth=k=1 rule line 2 t10.g |\r | |
559 | | |\r | |
560 | | |\r | |
561 | |Choice 2: rule0/2 alt 2 line 6 file t10.g |\r | |
562 | | |\r | |
563 | | The original predicate for choice 2 with available context |\r | |
564 | | information: |\r | |
565 | | |\r | |
566 | | pred << Alpha>>? |\r | |
567 | | depth=k=1 rule rule0 line 6 t10.g |\r | |
568 | | set context: |\r | |
569 | | ID |\r | |
570 | | |\r | |
571 | | The predicate for choice 2 after expansion (but without context |\r | |
572 | | information): |\r | |
573 | | |\r | |
574 | | OR expr |\r | |
575 | | |\r | |
576 | | pred << isUpper(LATEXT(1))>>? |\r | |
577 | | depth=k=1 rule line 1 t10.g |\r | |
578 | | |\r | |
579 | | pred << isLower(LATEXT(1))>>? |\r | |
580 | | depth=k=1 rule line 2 t10.g |\r | |
581 | | |\r | |
582 | | |\r | |
583 | |#endif |\r | |
584 | +----------------------------------------------------------------------+\r | |
585 | \r | |
586 | The comparison of the predicates for the two alternatives takes\r | |
587 | place without context information, which means that in some cases\r | |
588 | the predicates will be considered identical even though they operate\r | |
589 | on disjoint lookahead sets. Consider:\r | |
590 | \r | |
591 | #pred Alpha\r | |
592 | \r | |
593 | rule1: <<Alpha>>? ID\r | |
594 | | <<Alpha>>? Label\r | |
595 | ;\r | |
596 | \r | |
597 | Because the comparison of predicates takes place without context\r | |
598 | these will be considered identical. The reason for comparing\r | |
599 | without context is that otherwise it would be necessary to re-evaluate\r | |
600 | the entire predicate expression for each possible lookahead sequence.\r | |
601 | This would require more code to be written and more CPU time during\r | |
602 | grammar analysis, and it is not yet clear whether anyone will even make\r | |
603 | use of the new #pred facility.\r | |
604 | \r | |
605 | A temporary workaround might be to use different #pred statements\r | |
606 | for predicates you know have different context. This would avoid\r | |
607 | extraneous warnings.\r | |
608 | \r | |
609 | The above example might be termed a "false positive". Comparison\r | |
610 | without context will also lead to "false negatives". Consider the\r | |
611 | following example:\r | |
612 | \r | |
613 | #pred Alpha\r | |
614 | #pred Beta\r | |
615 | \r | |
616 | rule1: <<Alpha>>? A\r | |
617 | | rule2\r | |
618 | ;\r | |
619 | \r | |
620 | rule2: <<Alpha>>? A\r | |
621 | | <<Beta>>? B\r | |
622 | ;\r | |
623 | \r | |
624 | The predicate used for alt 2 of rule1 is (Alpha || Beta). This\r | |
625 | appears to be different than the predicate Alpha used for alt1.\r | |
626 | However, the context of Beta is B. Thus when the lookahead is A\r | |
627 | Beta will have no resolving power and Alpha will be used for both\r | |
628 | alternatives. Using the same predicate for both alternatives isn't\r | |
629 | very helpful, but this will not be detected with 1.33MR11.\r | |
630 | \r | |
631 | To properly handle this the predicate expression would have to be\r | |
632 | evaluated for each distinct lookahead context.\r | |
633 | \r | |
634 | To determine whether two predicate expressions are identical is\r | |
635 | difficult. The routine may fail to identify identical predicates.\r | |
636 | \r | |
637 | The #pred feature also compares predicates to see if a choice between\r | |
638 | alternatives which is resolved by a predicate which makes the second\r | |
639 | choice unreachable. Consider the following example:\r | |
640 | \r | |
641 | #pred A <<A(LATEXT(1)>>?\r | |
642 | #pred B <<B(LATEXT(1)>>?\r | |
643 | #pred A_or_B A || B\r | |
644 | \r | |
645 | r : s\r | |
646 | | t\r | |
647 | ;\r | |
648 | s : <<A_or_B>>? ID\r | |
649 | ;\r | |
650 | t : <<A>>? ID\r | |
651 | ;\r | |
652 | \r | |
653 | ----------------------------------------------------------------------------\r | |
654 | t11.g, line 5: warning: the predicate used to disambiguate the\r | |
655 | first choice of rule r\r | |
656 | (file t11.g alt 1 line 5 and alt 2 line 6)\r | |
657 | appears to "cover" the second predicate when compared without context.\r | |
658 | The second predicate may have no resolving power for some lookahead\r | |
659 | sequences.\r | |
660 | ----------------------------------------------------------------------------\r | |
661 | \r | |
662 | #132. (Changed in 1.33MR11) Recognition of identical predicates in alts\r | |
663 | \r | |
664 | Prior to 1.33MR11, there would be no ambiguity warning when the\r | |
665 | very same predicate was used to disambiguate both alternatives:\r | |
666 | \r | |
667 | test: ref B\r | |
668 | | ref C\r | |
669 | ;\r | |
670 | \r | |
671 | ref : <<pred(LATEXT(1)>>? A\r | |
672 | \r | |
673 | In 1.33MR11 this will cause the warning:\r | |
674 | \r | |
675 | warning: the predicates used to disambiguate rule test\r | |
676 | (file v98.g alt 1 line 1 and alt 2 line 2)\r | |
677 | are identical and have no resolving power\r | |
678 | \r | |
679 | ----------------- Note -----------------\r | |
680 | \r | |
681 | This is different than the following case\r | |
682 | \r | |
683 | test: <<pred(LATEXT(1))>>? A B\r | |
684 | | <<pred(LATEXT(1)>>? A C\r | |
685 | ;\r | |
686 | \r | |
687 | In this case there are two distinct predicates\r | |
688 | which have exactly the same text. In the first\r | |
689 | example there are two references to the same\r | |
690 | predicate. The problem represented by this\r | |
691 | grammar will be addressed later.\r | |
692 | \r | |
693 | \r | |
694 | #127. (Changed in 1.33MR11)\r | |
695 | \r | |
696 | Count Syntax Errors Count DLG Errors\r | |
697 | ------------------- ----------------\r | |
698 | \r | |
699 | C++ mode ANTLRParser:: DLGLexerBase::\r | |
700 | syntaxErrCount lexErrCount\r | |
701 | C mode zzSyntaxErrCount zzLexErrCount\r | |
702 | \r | |
703 | The C mode variables are global and initialized to 0.\r | |
704 | They are *not* reset to 0 automatically when antlr is\r | |
705 | restarted.\r | |
706 | \r | |
707 | The C++ mode variables are public. They are initialized\r | |
708 | to 0 by the constructors. They are *not* reset to 0 by the\r | |
709 | ANTLRParser::init() method.\r | |
710 | \r | |
711 | Suggested by Reinier van den Born (reinier@vnet.ibm.com).\r | |
712 | \r | |
713 | #126. (Changed in 1.33MR11) Addition of #first <<...>>\r | |
714 | \r | |
715 | The #first <<...>> inserts the specified text in the output\r | |
716 | files before any other #include statements required by pccts.\r | |
717 | The only things before the #first text are comments and\r | |
718 | a #define ANTLR_VERSION.\r | |
719 | \r | |
720 | Requested by and Esa Pulkkinen (esap@cs.tut.fi) and Alexin\r | |
721 | Zoltan (alexin@inf.u-szeged.hu).\r | |
722 | \r | |
723 | #124. A Note on the New "&&" Style Guarded Predicates\r | |
724 | \r | |
725 | I've been asked several times, "What is the difference between\r | |
726 | the old "=>" style guard predicates and the new style "&&" guard\r | |
727 | predicates, and how do you choose one over the other" ?\r | |
728 | \r | |
729 | The main difference is that the "=>" does not apply the\r | |
730 | predicate if the context guard doesn't match, whereas\r | |
731 | the && form always does. What is the significance ?\r | |
732 | \r | |
733 | If you have a predicate which is not on the "leading edge"\r | |
734 | it is cannot be hoisted. Suppose you need a predicate that\r | |
735 | looks at LA(2). You must introduce it manually. The\r | |
736 | classic example is:\r | |
737 | \r | |
738 | castExpr :\r | |
739 | LP typeName RP\r | |
740 | | ....\r | |
741 | ;\r | |
742 | \r | |
743 | typeName : <<isTypeName(LATEXT(1))>>? ID\r | |
744 | | STRUCT ID\r | |
745 | ;\r | |
746 | \r | |
747 | The problem is that isTypeName() isn't on the leading edge\r | |
748 | of typeName, so it won't be hoisted into castExpr to help\r | |
749 | make a decision on which production to choose.\r | |
750 | \r | |
751 | The *first* attempt to fix it is this:\r | |
752 | \r | |
753 | castExpr :\r | |
754 | <<isTypeName(LATEXT(2))>>?\r | |
755 | LP typeName RP\r | |
756 | | ....\r | |
757 | ;\r | |
758 | \r | |
759 | Unfortunately, this won't work because it ignores\r | |
760 | the problem of STRUCT. The solution is to apply\r | |
761 | isTypeName() in castExpr if LA(2) is an ID and\r | |
762 | don't apply it when LA(2) is STRUCT:\r | |
763 | \r | |
764 | castExpr :\r | |
765 | (LP ID)? => <<isTypeName(LATEXT(2))>>?\r | |
766 | LP typeName RP\r | |
767 | | ....\r | |
768 | ;\r | |
769 | \r | |
770 | In conclusion, the "=>" style guarded predicate is\r | |
771 | useful when:\r | |
772 | \r | |
773 | a. the tokens required for the predicate\r | |
774 | are not on the leading edge\r | |
775 | b. there are alternatives in the expression\r | |
776 | selected by the predicate for which the\r | |
777 | predicate is inappropriate\r | |
778 | \r | |
779 | If (b) were false, then one could use a simple\r | |
780 | predicate (assuming "-prc on"):\r | |
781 | \r | |
782 | castExpr :\r | |
783 | <<isTypeName(LATEXT(2))>>?\r | |
784 | LP typeName RP\r | |
785 | | ....\r | |
786 | ;\r | |
787 | \r | |
788 | typeName : <<isTypeName(LATEXT(1))>>? ID\r | |
789 | ;\r | |
790 | \r | |
791 | So, when do you use the "&&" style guarded predicate ?\r | |
792 | \r | |
793 | The new-style "&&" predicate should always be used with\r | |
794 | predicate context. The context guard is in ADDITION to\r | |
795 | the automatically computed context. Thus it useful for\r | |
796 | predicates which depend on the token type for reasons\r | |
797 | other than context.\r | |
798 | \r | |
799 | The following example is contributed by Reinier van den Born\r | |
800 | (reinier@vnet.ibm.com).\r | |
801 | \r | |
802 | +-------------------------------------------------------------------------+\r | |
803 | | This grammar has two ways to call functions: |\r | |
804 | | |\r | |
805 | | - a "standard" call syntax with parens and comma separated args |\r | |
806 | | - a shell command like syntax (no parens and spacing separated args) |\r | |
807 | | |\r | |
808 | | The former also allows a variable to hold the name of the function, |\r | |
809 | | the latter can also be used to call external commands. |\r | |
810 | | |\r | |
811 | | The grammar (simplified) looks like this: |\r | |
812 | | |\r | |
813 | | fun_call : ID "(" { expr ("," expr)* } ")" |\r | |
814 | | /* ID is function name */ |\r | |
815 | | | "@" ID "(" { expr ("," expr)* } ")" |\r | |
816 | | /* ID is var containing fun name */ |\r | |
817 | | ; |\r | |
818 | | |\r | |
819 | | command : ID expr* /* ID is function name */ |\r | |
820 | | | path expr* /* path is external command name */ |\r | |
821 | | ; |\r | |
822 | | |\r | |
823 | | path : ID /* left out slashes and such */ |\r | |
824 | | | "@" ID /* ID is environment var */ |\r | |
825 | | ; |\r | |
826 | | |\r | |
827 | | expr : .... |\r | |
828 | | | "(" expr ")"; |\r | |
829 | | |\r | |
830 | | call : fun_call |\r | |
831 | | | command |\r | |
832 | | ; |\r | |
833 | | |\r | |
834 | | Obviously the call is wildly ambiguous. This is more or less how this |\r | |
835 | | is to be resolved: |\r | |
836 | | |\r | |
837 | | A call begins with an ID or an @ followed by an ID. |\r | |
838 | | |\r | |
839 | | If it is an ID and if it is an ext. command name -> command |\r | |
840 | | if followed by a paren -> fun_call |\r | |
841 | | otherwise -> command |\r | |
842 | | |\r | |
843 | | If it is an @ and if the ID is a var name -> fun_call |\r | |
844 | | otherwise -> command |\r | |
845 | | |\r | |
846 | | One can implement these rules quite neatly using && predicates: |\r | |
847 | | |\r | |
848 | | call : ("@" ID)? && <<isVarName(LT(2))>>? fun_call |\r | |
849 | | | (ID)? && <<isExtCmdName>>? command |\r | |
850 | | | (ID "(")? fun_call |\r | |
851 | | | command |\r | |
852 | | ; |\r | |
853 | | |\r | |
854 | | This can be done better, so it is not an ideal example, but it |\r | |
855 | | conveys the principle. |\r | |
856 | +-------------------------------------------------------------------------+\r | |
857 | \r | |
858 | #122. (Changed in 1.33MR11) Member functions to reset DLG in C++ mode\r | |
859 | \r | |
860 | void DLGFileReset(FILE *f) { input = f; found_eof = 0; }\r | |
861 | void DLGStringReset(DLGChar *s) { input = s; p = &input[0]; }\r | |
862 | \r | |
863 | Supplied by R.A. Nelson (cowboy@VNET.IBM.COM)\r | |
864 | \r | |
865 | #119. (Changed in 1.33MR11) Ambiguity aid for grammars\r | |
866 | \r | |
867 | The user can ask for additional information on ambiguities reported\r | |
868 | by antlr to stdout. At the moment, only one ambiguity report can\r | |
869 | be created in an antlr run.\r | |
870 | \r | |
871 | This feature is enabled using the "-aa" (Ambiguity Aid) option.\r | |
872 | \r | |
873 | The following options control the reporting of ambiguities:\r | |
874 | \r | |
875 | -aa ruleName Selects reporting by name of rule\r | |
876 | -aa lineNumber Selects reporting by line number\r | |
877 | (file name not compared)\r | |
878 | \r | |
879 | -aam Selects "multiple" reporting for a token\r | |
880 | in the intersection set of the\r | |
881 | alternatives.\r | |
882 | \r | |
883 | For instance, the token ID may appear dozens\r | |
884 | of times in various paths as the program\r | |
885 | explores the rules which are reachable from\r | |
886 | the point of an ambiguity. With option -aam\r | |
887 | every possible path the search program\r | |
888 | encounters is reported.\r | |
889 | \r | |
890 | Without -aam only the first encounter is\r | |
891 | reported. This may result in incomplete\r | |
892 | information, but the information may be\r | |
893 | sufficient and much shorter.\r | |
894 | \r | |
895 | -aad depth Selects the depth of the search.\r | |
896 | The default value is 1.\r | |
897 | \r | |
898 | The number of paths to be searched, and the\r | |
899 | size of the report can grow geometrically\r | |
900 | with the -ck value if a full search for all\r | |
901 | contributions to the source of the ambiguity\r | |
902 | is explored.\r | |
903 | \r | |
904 | The depth represents the number of tokens\r | |
905 | in the lookahead set which are matched against\r | |
906 | the set of ambiguous tokens. A depth of 1\r | |
907 | means that the search stops when a lookahead\r | |
908 | sequence of just one token is matched.\r | |
909 | \r | |
910 | A k=1 ck=6 grammar might generate 5,000 items\r | |
911 | in a report if a full depth 6 search is made\r | |
912 | with the Ambiguity Aid. The source of the\r | |
913 | problem may be in the first token and obscured\r | |
914 | by the volume of data - I hesitate to call\r | |
915 | it information.\r | |
916 | \r | |
917 | When the user selects a depth > 1, the search\r | |
918 | is first performed at depth=1 for both\r | |
919 | alternatives, then depth=2 for both alternatives,\r | |
920 | etc.\r | |
921 | \r | |
922 | Sample output for rule grammar in antlr.g itself:\r | |
923 | \r | |
924 | +---------------------------------------------------------------------+\r | |
925 | | Ambiguity Aid |\r | |
926 | | |\r | |
927 | | Choice 1: grammar/70 line 632 file a.g |\r | |
928 | | Choice 2: grammar/82 line 644 file a.g |\r | |
929 | | |\r | |
930 | | Intersection of lookahead[1] sets: |\r | |
931 | | |\r | |
932 | | "\}" "class" "#errclass" "#tokclass" |\r | |
933 | | |\r | |
934 | | Choice:1 Depth:1 Group:1 ("#errclass") |\r | |
935 | | 1 in (...)* block grammar/70 line 632 a.g |\r | |
936 | | 2 to error grammar/73 line 635 a.g |\r | |
937 | | 3 error error/1 line 894 a.g |\r | |
938 | | 4 #token "#errclass" error/2 line 895 a.g |\r | |
939 | | |\r | |
940 | | Choice:1 Depth:1 Group:2 ("#tokclass") |\r | |
941 | | 2 to tclass grammar/74 line 636 a.g |\r | |
942 | | 3 tclass tclass/1 line 937 a.g |\r | |
943 | | 4 #token "#tokclass" tclass/2 line 938 a.g |\r | |
944 | | |\r | |
945 | | Choice:1 Depth:1 Group:3 ("class") |\r | |
946 | | 2 to class_def grammar/75 line 637 a.g |\r | |
947 | | 3 class_def class_def/1 line 669 a.g |\r | |
948 | | 4 #token "class" class_def/3 line 671 a.g |\r | |
949 | | |\r | |
950 | | Choice:1 Depth:1 Group:4 ("\}") |\r | |
951 | | 2 #token "\}" grammar/76 line 638 a.g |\r | |
952 | | |\r | |
953 | | Choice:2 Depth:1 Group:5 ("#errclass") |\r | |
954 | | 1 in (...)* block grammar/83 line 645 a.g |\r | |
955 | | 2 to error grammar/93 line 655 a.g |\r | |
956 | | 3 error error/1 line 894 a.g |\r | |
957 | | 4 #token "#errclass" error/2 line 895 a.g |\r | |
958 | | |\r | |
959 | | Choice:2 Depth:1 Group:6 ("#tokclass") |\r | |
960 | | 2 to tclass grammar/94 line 656 a.g |\r | |
961 | | 3 tclass tclass/1 line 937 a.g |\r | |
962 | | 4 #token "#tokclass" tclass/2 line 938 a.g |\r | |
963 | | |\r | |
964 | | Choice:2 Depth:1 Group:7 ("class") |\r | |
965 | | 2 to class_def grammar/95 line 657 a.g |\r | |
966 | | 3 class_def class_def/1 line 669 a.g |\r | |
967 | | 4 #token "class" class_def/3 line 671 a.g |\r | |
968 | | |\r | |
969 | | Choice:2 Depth:1 Group:8 ("\}") |\r | |
970 | | 2 #token "\}" grammar/96 line 658 a.g |\r | |
971 | +---------------------------------------------------------------------+\r | |
972 | \r | |
973 | For a linear lookahead set ambiguity (where k=1 or for k>1 but\r | |
974 | when all lookahead sets [i] with i<k all have degree one) the\r | |
975 | reports appear in the following order:\r | |
976 | \r | |
977 | for (depth=1 ; depth <= "-aad depth" ; depth++) {\r | |
978 | for (alternative=1; alternative <=2 ; alternative++) {\r | |
979 | while (matches-are-found) {\r | |
980 | group++;\r | |
981 | print-report\r | |
982 | };\r | |
983 | };\r | |
984 | };\r | |
985 | \r | |
986 | For reporting a k-tuple ambiguity, the reports appear in the\r | |
987 | following order:\r | |
988 | \r | |
989 | for (depth=1 ; depth <= "-aad depth" ; depth++) {\r | |
990 | while (matches-are-found) {\r | |
991 | for (alternative=1; alternative <=2 ; alternative++) {\r | |
992 | group++;\r | |
993 | print-report\r | |
994 | };\r | |
995 | };\r | |
996 | };\r | |
997 | \r | |
998 | This is because matches are generated in different ways for\r | |
999 | linear lookahead and k-tuples.\r | |
1000 | \r | |
1001 | #117. (Changed in 1.33MR10) new EXPERIMENTAL predicate hoisting code\r | |
1002 | \r | |
1003 | The hoisting of predicates into rules to create prediction\r | |
1004 | expressions is a problem in antlr. Consider the following\r | |
1005 | example (k=1 with -prc on):\r | |
1006 | \r | |
1007 | start : (a)* "@" ;\r | |
1008 | a : b | c ;\r | |
1009 | b : <<isUpper(LATEXT(1))>>? A ;\r | |
1010 | c : A ;\r | |
1011 | \r | |
1012 | Prior to 1.33MR10 the code generated for "start" would resemble:\r | |
1013 | \r | |
1014 | while {\r | |
1015 | if (LA(1)==A &&\r | |
1016 | (!LA(1)==A || isUpper())) {\r | |
1017 | a();\r | |
1018 | }\r | |
1019 | };\r | |
1020 | \r | |
1021 | This code is wrong because it makes rule "c" unreachable from\r | |
1022 | "start". The essence of the problem is that antlr fails to\r | |
1023 | recognize that there can be a valid alternative within "a" even\r | |
1024 | when the predicate <<isUpper(LATEXT(1))>>? is false.\r | |
1025 | \r | |
1026 | In 1.33MR10 with -mrhoist the hoisting of the predicate into\r | |
1027 | "start" is suppressed because it recognizes that "c" can\r | |
1028 | cover all the cases where the predicate is false:\r | |
1029 | \r | |
1030 | while {\r | |
1031 | if (LA(1)==A) {\r | |
1032 | a();\r | |
1033 | }\r | |
1034 | };\r | |
1035 | \r | |
1036 | With the antlr "-info p" switch the user will receive information\r | |
1037 | about the predicate suppression in the generated file:\r | |
1038 | \r | |
1039 | --------------------------------------------------------------\r | |
1040 | #if 0\r | |
1041 | \r | |
1042 | Hoisting of predicate suppressed by alternative without predicate.\r | |
1043 | The alt without the predicate includes all cases where\r | |
1044 | the predicate is false.\r | |
1045 | \r | |
1046 | WITH predicate: line 7 v1.g\r | |
1047 | WITHOUT predicate: line 7 v1.g\r | |
1048 | \r | |
1049 | The context set for the predicate:\r | |
1050 | \r | |
1051 | A\r | |
1052 | \r | |
1053 | The lookahead set for the alt WITHOUT the semantic predicate:\r | |
1054 | \r | |
1055 | A\r | |
1056 | \r | |
1057 | The predicate:\r | |
1058 | \r | |
1059 | pred << isUpper(LATEXT(1))>>?\r | |
1060 | depth=k=1 rule b line 9 v1.g\r | |
1061 | set context:\r | |
1062 | A\r | |
1063 | tree context: null\r | |
1064 | \r | |
1065 | Chain of referenced rules:\r | |
1066 | \r | |
1067 | #0 in rule start (line 5 v1.g) to rule a\r | |
1068 | #1 in rule a (line 7 v1.g)\r | |
1069 | \r | |
1070 | #endif\r | |
1071 | --------------------------------------------------------------\r | |
1072 | \r | |
1073 | A predicate can be suppressed by a combination of alternatives\r | |
1074 | which, taken together, cover a predicate:\r | |
1075 | \r | |
1076 | start : (a)* "@" ;\r | |
1077 | \r | |
1078 | a : b | ca | cb | cc ;\r | |
1079 | \r | |
1080 | b : <<isUpper(LATEXT(1))>>? ( A | B | C ) ;\r | |
1081 | \r | |
1082 | ca : A ;\r | |
1083 | cb : B ;\r | |
1084 | cc : C ;\r | |
1085 | \r | |
1086 | Consider a more complex example in which "c" covers only part of\r | |
1087 | a predicate:\r | |
1088 | \r | |
1089 | start : (a)* "@" ;\r | |
1090 | \r | |
1091 | a : b\r | |
1092 | | c\r | |
1093 | ;\r | |
1094 | \r | |
1095 | b : <<isUpper(LATEXT(1))>>?\r | |
1096 | ( A\r | |
1097 | | X\r | |
1098 | );\r | |
1099 | \r | |
1100 | c : A\r | |
1101 | ;\r | |
1102 | \r | |
1103 | Prior to 1.33MR10 the code generated for "start" would resemble:\r | |
1104 | \r | |
1105 | while {\r | |
1106 | if ( (LA(1)==A || LA(1)==X) &&\r | |
1107 | (! (LA(1)==A || LA(1)==X) || isUpper()) {\r | |
1108 | a();\r | |
1109 | }\r | |
1110 | };\r | |
1111 | \r | |
1112 | With 1.33MR10 and -mrhoist the predicate context is restricted to\r | |
1113 | the non-covered lookahead. The code resembles:\r | |
1114 | \r | |
1115 | while {\r | |
1116 | if ( (LA(1)==A || LA(1)==X) &&\r | |
1117 | (! (LA(1)==X) || isUpper()) {\r | |
1118 | a();\r | |
1119 | }\r | |
1120 | };\r | |
1121 | \r | |
1122 | With the antlr "-info p" switch the user will receive information\r | |
1123 | about the predicate restriction in the generated file:\r | |
1124 | \r | |
1125 | --------------------------------------------------------------\r | |
1126 | #if 0\r | |
1127 | \r | |
1128 | Restricting the context of a predicate because of overlap\r | |
1129 | in the lookahead set between the alternative with the\r | |
1130 | semantic predicate and one without\r | |
1131 | Without this restriction the alternative without the predicate\r | |
1132 | could not be reached when input matched the context of the\r | |
1133 | predicate and the predicate was false.\r | |
1134 | \r | |
1135 | WITH predicate: line 11 v4.g\r | |
1136 | WITHOUT predicate: line 12 v4.g\r | |
1137 | \r | |
1138 | The original context set for the predicate:\r | |
1139 | \r | |
1140 | A X\r | |
1141 | \r | |
1142 | The lookahead set for the alt WITHOUT the semantic predicate:\r | |
1143 | \r | |
1144 | A\r | |
1145 | \r | |
1146 | The intersection of the two sets\r | |
1147 | \r | |
1148 | A\r | |
1149 | \r | |
1150 | The original predicate:\r | |
1151 | \r | |
1152 | pred << isUpper(LATEXT(1))>>?\r | |
1153 | depth=k=1 rule b line 15 v4.g\r | |
1154 | set context:\r | |
1155 | A X\r | |
1156 | tree context: null\r | |
1157 | \r | |
1158 | The new (modified) form of the predicate:\r | |
1159 | \r | |
1160 | pred << isUpper(LATEXT(1))>>?\r | |
1161 | depth=k=1 rule b line 15 v4.g\r | |
1162 | set context:\r | |
1163 | X\r | |
1164 | tree context: null\r | |
1165 | \r | |
1166 | #endif\r | |
1167 | --------------------------------------------------------------\r | |
1168 | \r | |
1169 | The bad news about -mrhoist:\r | |
1170 | \r | |
1171 | (a) -mrhoist does not analyze predicates with lookahead\r | |
1172 | depth > 1.\r | |
1173 | \r | |
1174 | (b) -mrhoist does not look past a guarded predicate to\r | |
1175 | find context which might cover other predicates.\r | |
1176 | \r | |
1177 | For these cases you might want to use syntactic predicates.\r | |
1178 | When a semantic predicate fails during guess mode the guess\r | |
1179 | fails and the next alternative is tried.\r | |
1180 | \r | |
1181 | Limitation (a) is illustrated by the following example:\r | |
1182 | \r | |
1183 | start : (stmt)* EOF ;\r | |
1184 | \r | |
1185 | stmt : cast\r | |
1186 | | expr\r | |
1187 | ;\r | |
1188 | cast : <<isTypename(LATEXT(2))>>? LP ID RP ;\r | |
1189 | \r | |
1190 | expr : LP ID RP ;\r | |
1191 | \r | |
1192 | This is not much different from the first example, except that\r | |
1193 | it requires two tokens of lookahead context to determine what\r | |
1194 | to do. This predicate is NOT suppressed because the current version\r | |
1195 | is unable to handle predicates with depth > 1.\r | |
1196 | \r | |
1197 | A predicate can be combined with other predicates during hoisting.\r | |
1198 | In those cases the depth=1 predicates are still handled. Thus,\r | |
1199 | in the following example the isUpper() predicate will be suppressed\r | |
1200 | by line #4 when hoisted from "bizarre" into "start", but will still\r | |
1201 | be present in "bizarre" in order to predict "stmt".\r | |
1202 | \r | |
1203 | start : (bizarre)* EOF ; // #1\r | |
1204 | // #2\r | |
1205 | bizarre : stmt // #3\r | |
1206 | | A // #4\r | |
1207 | ;\r | |
1208 | \r | |
1209 | stmt : cast\r | |
1210 | | expr\r | |
1211 | ;\r | |
1212 | \r | |
1213 | cast : <<isTypename(LATEXT(2))>>? LP ID RP ;\r | |
1214 | \r | |
1215 | expr : LP ID RP ;\r | |
1216 | | <<isUpper(LATEXT(1))>>? A\r | |
1217 | \r | |
1218 | Limitation (b) is illustrated by the following example of a\r | |
1219 | context guarded predicate:\r | |
1220 | \r | |
1221 | rule : (A)? <<p>>? // #1\r | |
1222 | (A // #2\r | |
1223 | |B // #3\r | |
1224 | ) // #4\r | |
1225 | | <<q>> B // #5\r | |
1226 | ;\r | |
1227 | \r | |
1228 | Recall that this means that when the lookahead is NOT A then\r | |
1229 | the predicate "p" is ignored and it attempts to match "A|B".\r | |
1230 | Ideally, the "B" at line #3 should suppress predicate "q".\r | |
1231 | However, the current version does not attempt to look past\r | |
1232 | the guard predicate to find context which might suppress other\r | |
1233 | predicates.\r | |
1234 | \r | |
1235 | In some cases -mrhoist will lead to the reporting of ambiguities\r | |
1236 | which were not visible before:\r | |
1237 | \r | |
1238 | start : (a)* "@";\r | |
1239 | a : bc | d;\r | |
1240 | bc : b | c ;\r | |
1241 | \r | |
1242 | b : <<isUpper(LATEXT(1))>>? A;\r | |
1243 | c : A ;\r | |
1244 | \r | |
1245 | d : A ;\r | |
1246 | \r | |
1247 | In this case there is a true ambiguity in "a" between "bc" and "d"\r | |
1248 | which can both match "A". Without -mrhoist the predicate in "b"\r | |
1249 | is hoisted into "a" and there is no ambiguity reported. However,\r | |
1250 | with -mrhoist, the predicate in "b" is suppressed by "c" (as it\r | |
1251 | should be) making the ambiguity in "a" apparent.\r | |
1252 | \r | |
1253 | The motivations for these changes were hoisting problems reported\r | |
1254 | by Reinier van den Born (reinier@vnet.ibm.com) and several others.\r | |
1255 | \r | |
1256 | #113. (Changed in 1.33MR10) new context guarded pred: (g)? && <<p>>? expr\r | |
1257 | \r | |
1258 | The existing context guarded predicate:\r | |
1259 | \r | |
1260 | rule : (guard)? => <<p>>? expr\r | |
1261 | | next_alternative\r | |
1262 | ;\r | |
1263 | \r | |
1264 | generates code which resembles:\r | |
1265 | \r | |
1266 | if (lookahead(expr) && (!guard || pred)) {\r | |
1267 | expr()\r | |
1268 | } else ....\r | |
1269 | \r | |
1270 | This is not suitable for some applications because it allows\r | |
1271 | expr() to be invoked when the predicate is false. This is\r | |
1272 | intentional because it is meant to mimic automatically computed\r | |
1273 | predicate context.\r | |
1274 | \r | |
1275 | The new context guarded predicate uses the guard information\r | |
1276 | differently because it has a different goal. Consider:\r | |
1277 | \r | |
1278 | rule : (guard)? && <<p>>? expr\r | |
1279 | | next_alternative\r | |
1280 | ;\r | |
1281 | \r | |
1282 | The new style of context guarded predicate is equivalent to:\r | |
1283 | \r | |
1284 | rule : <<guard==true && pred>>? expr\r | |
1285 | | next_alternative\r | |
1286 | ;\r | |
1287 | \r | |
1288 | It generates code which resembles:\r | |
1289 | \r | |
1290 | if (lookahead(expr) && guard && pred) {\r | |
1291 | expr();\r | |
1292 | } else ...\r | |
1293 | \r | |
1294 | Both forms of guarded predicates severely restrict the form of\r | |
1295 | the context guard: it can contain no rule references, no\r | |
1296 | (...)*, no (...)+, and no {...}. It may contain token and\r | |
1297 | token class references, and alternation ("|").\r | |
1298 | \r | |
1299 | Addition for 1.33MR11: in the token expression all tokens must\r | |
1300 | be at the same height of the token tree:\r | |
1301 | \r | |
1302 | (A ( B | C))? && ... is ok (all height 2)\r | |
1303 | (A ( B | ))? && ... is not ok (some 1, some 2)\r | |
1304 | (A B C D | E F G H)? && ... is ok (all height 4)\r | |
1305 | (A B C D | E )? && ... is not ok (some 4, some 1)\r | |
1306 | \r | |
1307 | This restriction is required in order to properly compute the lookahead\r | |
1308 | set for expressions like:\r | |
1309 | \r | |
1310 | rule1 : (A B C)? && <<pred>>? rule2 ;\r | |
1311 | rule2 : (A|X) (B|Y) (C|Z);\r | |
1312 | \r | |
1313 | This addition was suggested by Rienier van den Born (reinier@vnet.ibm.com)\r | |
1314 | \r | |
1315 | #109. (Changed in 1.33MR10) improved trace information\r | |
1316 | \r | |
1317 | The quality of the trace information provided by the "-gd"\r | |
1318 | switch has been improved significantly. Here is an example\r | |
1319 | of the output from a test program. It shows the rule name,\r | |
1320 | the first token of lookahead, the call depth, and the guess\r | |
1321 | status:\r | |
1322 | \r | |
1323 | exit rule gusxx {"?"} depth 2\r | |
1324 | enter rule gusxx {"?"} depth 2\r | |
1325 | enter rule gus1 {"o"} depth 3 guessing\r | |
1326 | guess done - returning to rule gus1 {"o"} at depth 3\r | |
1327 | (guess mode continues - an enclosing guess is still active)\r | |
1328 | guess done - returning to rule gus1 {"Z"} at depth 3\r | |
1329 | (guess mode continues - an enclosing guess is still active)\r | |
1330 | exit rule gus1 {"Z"} depth 3 guessing\r | |
1331 | guess done - returning to rule gusxx {"o"} at depth 2 (guess mode ends)\r | |
1332 | enter rule gus1 {"o"} depth 3\r | |
1333 | guess done - returning to rule gus1 {"o"} at depth 3 (guess mode ends)\r | |
1334 | guess done - returning to rule gus1 {"Z"} at depth 3 (guess mode ends)\r | |
1335 | exit rule gus1 {"Z"} depth 3\r | |
1336 | line 1: syntax error at "Z" missing SC\r | |
1337 | ...\r | |
1338 | \r | |
1339 | Rule trace reporting is controlled by the value of the integer\r | |
1340 | [zz]traceOptionValue: when it is positive tracing is enabled,\r | |
1341 | otherwise it is disabled. Tracing during guess mode is controlled\r | |
1342 | by the value of the integer [zz]traceGuessOptionValue. When\r | |
1343 | it is positive AND [zz]traceOptionValue is positive rule trace\r | |
1344 | is reported in guess mode.\r | |
1345 | \r | |
1346 | The values of [zz]traceOptionValue and [zz]traceGuessOptionValue\r | |
1347 | can be adjusted by subroutine calls listed below.\r | |
1348 | \r | |
1349 | Depending on the presence or absence of the antlr -gd switch\r | |
1350 | the variable [zz]traceOptionValueDefault is set to 0 or 1. When\r | |
1351 | the parser is initialized or [zz]traceReset() is called the\r | |
1352 | value of [zz]traceOptionValueDefault is copied to [zz]traceOptionValue.\r | |
1353 | The value of [zz]traceGuessOptionValue is always initialzed to 1,\r | |
1354 | but, as noted earlier, nothing will be reported unless\r | |
1355 | [zz]traceOptionValue is also positive.\r | |
1356 | \r | |
1357 | When the parser state is saved/restored the value of the trace\r | |
1358 | variables are also saved/restored. If a restore causes a change in\r | |
1359 | reporting behavior from on to off or vice versa this will be reported.\r | |
1360 | \r | |
1361 | When the -gd option is selected, the macro "#define zzTRACE_RULES"\r | |
1362 | is added to appropriate output files.\r | |
1363 | \r | |
1364 | C++ mode\r | |
1365 | --------\r | |
1366 | int traceOption(int delta)\r | |
1367 | int traceGuessOption(int delta)\r | |
1368 | void traceReset()\r | |
1369 | int traceOptionValueDefault\r | |
1370 | \r | |
1371 | C mode\r | |
1372 | --------\r | |
1373 | int zzTraceOption(int delta)\r | |
1374 | int zzTraceGuessOption(int delta)\r | |
1375 | void zzTraceReset()\r | |
1376 | int zzTraceOptionValueDefault\r | |
1377 | \r | |
1378 | The argument "delta" is added to the traceOptionValue. To\r | |
1379 | turn on trace when inside a particular rule one:\r | |
1380 | \r | |
1381 | rule : <<traceOption(+1);>>\r | |
1382 | (\r | |
1383 | rest-of-rule\r | |
1384 | )\r | |
1385 | <<traceOption(-1);>>\r | |
1386 | ; /* fail clause */ <<traceOption(-1);>>\r | |
1387 | \r | |
1388 | One can use the same idea to turn *off* tracing within a\r | |
1389 | rule by using a delta of (-1).\r | |
1390 | \r | |
1391 | An improvement in the rule trace was suggested by Sramji\r | |
1392 | Ramanathan (ps@kumaran.com).\r | |
1393 | \r | |
1394 | #108. A Note on Deallocation of Variables Allocated in Guess Mode\r | |
1395 | \r | |
1396 | NOTE\r | |
1397 | ------------------------------------------------------\r | |
1398 | This mechanism only works for heap allocated variables\r | |
1399 | ------------------------------------------------------\r | |
1400 | \r | |
1401 | The rewrite of the trace provides the machinery necessary\r | |
1402 | to properly free variables or undo actions following a\r | |
1403 | failed guess.\r | |
1404 | \r | |
1405 | The macro zzUSER_GUESS_HOOK(guessSeq,zzrv) is expanded\r | |
1406 | as part of the zzGUESS macro. When a guess is opened\r | |
1407 | the value of zzrv is 0. When a longjmp() is executed to\r | |
1408 | undo the guess, the value of zzrv will be 1.\r | |
1409 | \r | |
1410 | The macro zzUSER_GUESS_DONE_HOOK(guessSeq) is expanded\r | |
1411 | as part of the zzGUESS_DONE macro. This is executed\r | |
1412 | whether the guess succeeds or fails as part of closing\r | |
1413 | the guess.\r | |
1414 | \r | |
1415 | The guessSeq is a sequence number which is assigned to each\r | |
1416 | guess and is incremented by 1 for each guess which becomes\r | |
1417 | active. It is needed by the user to associate the start of\r | |
1418 | a guess with the failure and/or completion (closing) of a\r | |
1419 | guess.\r | |
1420 | \r | |
1421 | Guesses are nested. They must be closed in the reverse\r | |
1422 | of the order that they are opened.\r | |
1423 | \r | |
1424 | In order to free memory used by a variable during a guess\r | |
1425 | a user must write a routine which can be called to\r | |
1426 | register the variable along with the current guess sequence\r | |
1427 | number provided by the zzUSER_GUESS_HOOK macro. If the guess\r | |
1428 | fails, all variables tagged with the corresponding guess\r | |
1429 | sequence number should be released. This is ugly, but\r | |
1430 | it would require a major rewrite of antlr 1.33 to use\r | |
1431 | some mechanism other than setjmp()/longjmp().\r | |
1432 | \r | |
1433 | The order of calls for a *successful* guess would be:\r | |
1434 | \r | |
1435 | zzUSER_GUESS_HOOK(guessSeq,0);\r | |
1436 | zzUSER_GUESS_DONE_HOOK(guessSeq);\r | |
1437 | \r | |
1438 | The order of calls for a *failed* guess would be:\r | |
1439 | \r | |
1440 | zzUSER_GUESS_HOOK(guessSeq,0);\r | |
1441 | zzUSER_GUESS_HOOK(guessSeq,1);\r | |
1442 | zzUSER_GUESS_DONE_HOOK(guessSeq);\r | |
1443 | \r | |
1444 | The default definitions of these macros are empty strings.\r | |
1445 | \r | |
1446 | Here is an example in C++ mode. The zzUSER_GUESS_HOOK and\r | |
1447 | zzUSER_GUESS_DONE_HOOK macros and myGuessHook() routine\r | |
1448 | can be used without change in both C and C++ versions.\r | |
1449 | \r | |
1450 | ----------------------------------------------------------------------\r | |
1451 | <<\r | |
1452 | \r | |
1453 | #include "AToken.h"\r | |
1454 | \r | |
1455 | typedef ANTLRCommonToken ANTLRToken;\r | |
1456 | \r | |
1457 | #include "DLGLexer.h"\r | |
1458 | \r | |
1459 | int main() {\r | |
1460 | \r | |
1461 | {\r | |
1462 | DLGFileInput in(stdin);\r | |
1463 | DLGLexer lexer(&in,2000);\r | |
1464 | ANTLRTokenBuffer pipe(&lexer,1);\r | |
1465 | ANTLRCommonToken aToken;\r | |
1466 | P parser(&pipe);\r | |
1467 | \r | |
1468 | lexer.setToken(&aToken);\r | |
1469 | parser.init();\r | |
1470 | parser.start();\r | |
1471 | };\r | |
1472 | \r | |
1473 | fclose(stdin);\r | |
1474 | fclose(stdout);\r | |
1475 | return 0;\r | |
1476 | }\r | |
1477 | \r | |
1478 | >>\r | |
1479 | \r | |
1480 | <<\r | |
1481 | char *s=NULL;\r | |
1482 | \r | |
1483 | #undef zzUSER_GUESS_HOOK\r | |
1484 | #define zzUSER_GUESS_HOOK(guessSeq,zzrv) myGuessHook(guessSeq,zzrv);\r | |
1485 | #undef zzUSER_GUESS_DONE_HOOK\r | |
1486 | #define zzUSER_GUESS_DONE_HOOK(guessSeq) myGuessHook(guessSeq,2);\r | |
1487 | \r | |
1488 | void myGuessHook(int guessSeq,int zzrv) {\r | |
1489 | if (zzrv == 0) {\r | |
1490 | fprintf(stderr,"User hook: starting guess #%d\n",guessSeq);\r | |
1491 | } else if (zzrv == 1) {\r | |
1492 | free (s);\r | |
1493 | s=NULL;\r | |
1494 | fprintf(stderr,"User hook: failed guess #%d\n",guessSeq);\r | |
1495 | } else if (zzrv == 2) {\r | |
1496 | free (s);\r | |
1497 | s=NULL;\r | |
1498 | fprintf(stderr,"User hook: ending guess #%d\n",guessSeq);\r | |
1499 | };\r | |
1500 | }\r | |
1501 | \r | |
1502 | >>\r | |
1503 | \r | |
1504 | #token A "a"\r | |
1505 | #token "[\t \ \n]" <<skip();>>\r | |
1506 | \r | |
1507 | class P {\r | |
1508 | \r | |
1509 | start : (top)+\r | |
1510 | ;\r | |
1511 | \r | |
1512 | top : (which) ? <<fprintf(stderr,"%s is a which\n",s); free(s); s=NULL; >>\r | |
1513 | | other <<fprintf(stderr,"%s is an other\n",s); free(s); s=NULL; >>\r | |
1514 | ; <<if (s != NULL) free(s); s=NULL; >>\r | |
1515 | \r | |
1516 | which : which2\r | |
1517 | ;\r | |
1518 | \r | |
1519 | which2 : which3\r | |
1520 | ;\r | |
1521 | which3\r | |
1522 | : (label)? <<fprintf(stderr,"%s is a label\n",s);>>\r | |
1523 | | (global)? <<fprintf(stderr,"%s is a global\n",s);>>\r | |
1524 | | (exclamation)? <<fprintf(stderr,"%s is an exclamation\n",s);>>\r | |
1525 | ;\r | |
1526 | \r | |
1527 | label : <<s=strdup(LT(1)->getText());>> A ":" ;\r | |
1528 | \r | |
1529 | global : <<s=strdup(LT(1)->getText());>> A "::" ;\r | |
1530 | \r | |
1531 | exclamation : <<s=strdup(LT(1)->getText());>> A "!" ;\r | |
1532 | \r | |
1533 | other : <<s=strdup(LT(1)->getText());>> "other" ;\r | |
1534 | \r | |
1535 | }\r | |
1536 | ----------------------------------------------------------------------\r | |
1537 | \r | |
1538 | This is a silly example, but illustrates the idea. For the input\r | |
1539 | "a ::" with tracing enabled the output begins:\r | |
1540 | \r | |
1541 | ----------------------------------------------------------------------\r | |
1542 | enter rule "start" depth 1\r | |
1543 | enter rule "top" depth 2\r | |
1544 | User hook: starting guess #1\r | |
1545 | enter rule "which" depth 3 guessing\r | |
1546 | enter rule "which2" depth 4 guessing\r | |
1547 | enter rule "which3" depth 5 guessing\r | |
1548 | User hook: starting guess #2\r | |
1549 | enter rule "label" depth 6 guessing\r | |
1550 | guess failed\r | |
1551 | User hook: failed guess #2\r | |
1552 | guess done - returning to rule "which3" at depth 5 (guess mode continues\r | |
1553 | - an enclosing guess is still active)\r | |
1554 | User hook: ending guess #2\r | |
1555 | User hook: starting guess #3\r | |
1556 | enter rule "global" depth 6 guessing\r | |
1557 | exit rule "global" depth 6 guessing\r | |
1558 | guess done - returning to rule "which3" at depth 5 (guess mode continues\r | |
1559 | - an enclosing guess is still active)\r | |
1560 | User hook: ending guess #3\r | |
1561 | enter rule "global" depth 6 guessing\r | |
1562 | exit rule "global" depth 6 guessing\r | |
1563 | exit rule "which3" depth 5 guessing\r | |
1564 | exit rule "which2" depth 4 guessing\r | |
1565 | exit rule "which" depth 3 guessing\r | |
1566 | guess done - returning to rule "top" at depth 2 (guess mode ends)\r | |
1567 | User hook: ending guess #1\r | |
1568 | enter rule "which" depth 3\r | |
1569 | .....\r | |
1570 | ----------------------------------------------------------------------\r | |
1571 | \r | |
1572 | Remember:\r | |
1573 | \r | |
1574 | (a) Only init-actions are executed during guess mode.\r | |
1575 | (b) A rule can be invoked multiple times during guess mode.\r | |
1576 | (c) If the guess succeeds the rule will be called once more\r | |
1577 | without guess mode so that normal actions will be executed.\r | |
1578 | This means that the init-action might need to distinguish\r | |
1579 | between guess mode and non-guess mode using the variable\r | |
1580 | [zz]guessing.\r | |
1581 | \r | |
1582 | #101. (Changed in 1.33MR10) antlr -info command line switch\r | |
1583 | \r | |
1584 | -info\r | |
1585 | \r | |
1586 | p - extra predicate information in generated file\r | |
1587 | \r | |
1588 | t - information about tnode use:\r | |
1589 | at the end of each rule in generated file\r | |
1590 | summary on stderr at end of program\r | |
1591 | \r | |
1592 | m - monitor progress\r | |
1593 | prints name of each rule as it is started\r | |
1594 | flushes output at start of each rule\r | |
1595 | \r | |
1596 | f - first/follow set information to stdout\r | |
1597 | \r | |
1598 | 0 - no operation (added in 1.33MR11)\r | |
1599 | \r | |
1600 | The options may be combined and may appear in any order.\r | |
1601 | For example:\r | |
1602 | \r | |
1603 | antlr -info ptm -CC -gt -mrhoist on mygrammar.g\r | |
1604 | \r | |
1605 | #100a. (Changed in 1.33MR10) Predicate tree simplification\r | |
1606 | \r | |
1607 | When the same predicates can be referenced in more than one\r | |
1608 | alternative of a block large predicate trees can be formed.\r | |
1609 | \r | |
1610 | The difference that these optimizations make is so dramatic\r | |
1611 | that I have decided to use it even when -mrhoist is not selected.\r | |
1612 | \r | |
1613 | Consider the following grammar:\r | |
1614 | \r | |
1615 | start : ( all )* ;\r | |
1616 | \r | |
1617 | all : a\r | |
1618 | | d\r | |
1619 | | e\r | |
1620 | | f\r | |
1621 | ;\r | |
1622 | \r | |
1623 | a : c A B\r | |
1624 | | c A C\r | |
1625 | ;\r | |
1626 | \r | |
1627 | c : <<AAA(LATEXT(2))>>?\r | |
1628 | ;\r | |
1629 | \r | |
1630 | d : <<BBB(LATEXT(2))>>? B C\r | |
1631 | ;\r | |
1632 | \r | |
1633 | e : <<CCC(LATEXT(2))>>? B C\r | |
1634 | ;\r | |
1635 | \r | |
1636 | f : e X Y\r | |
1637 | ;\r | |
1638 | \r | |
1639 | In rule "a" there is a reference to rule "c" in both alternatives.\r | |
1640 | The length of the predicate AAA is k=2 and it can be followed in\r | |
1641 | alternative 1 only by (A B) while in alternative 2 it can be\r | |
1642 | followed only by (A C). Thus they do not have identical context.\r | |
1643 | \r | |
1644 | In rule "all" the alternatives which refer to rules "e" and "f" allow\r | |
1645 | elimination of the duplicate reference to predicate CCC.\r | |
1646 | \r | |
1647 | The table below summarized the kind of simplification performed by\r | |
1648 | 1.33MR10. In the table, X and Y stand for single predicates\r | |
1649 | (not trees).\r | |
1650 | \r | |
1651 | (OR X (OR Y (OR Z))) => (OR X Y Z)\r | |
1652 | (AND X (AND Y (AND Z))) => (AND X Y Z)\r | |
1653 | \r | |
1654 | (OR X (... (OR X Y) ... )) => (OR X (... Y ... ))\r | |
1655 | (AND X (... (AND X Y) ... )) => (AND X (... Y ... ))\r | |
1656 | (OR X (... (AND X Y) ... )) => (OR X (... ... ))\r | |
1657 | (AND X (... (OR X Y) ... )) => (AND X (... ... ))\r | |
1658 | \r | |
1659 | (AND X) => X\r | |
1660 | (OR X) => X\r | |
1661 | \r | |
1662 | In a test with a complex grammar for a real application, a predicate\r | |
1663 | tree with six OR nodes and 12 leaves was reduced to "(OR X Y Z)".\r | |
1664 | \r | |
1665 | In 1.33MR10 there is a greater effort to release memory used\r | |
1666 | by predicates once they are no longer in use.\r | |
1667 | \r | |
1668 | #100b. (Changed in 1.33MR10) Suppression of extra predicate tests\r | |
1669 | \r | |
1670 | The following optimizations require that -mrhoist be selected.\r | |
1671 | \r | |
1672 | It is relatively easy to optimize the code generated for predicate\r | |
1673 | gates when they are of the form:\r | |
1674 | \r | |
1675 | (AND X Y Z ...)\r | |
1676 | or (OR X Y Z ...)\r | |
1677 | \r | |
1678 | where X, Y, Z, and "..." represent individual predicates (leaves) not\r | |
1679 | predicate trees.\r | |
1680 | \r | |
1681 | If the predicate is an AND the contexts of the X, Y, Z, etc. are\r | |
1682 | ANDed together to create a single Tree context for the group and\r | |
1683 | context tests for the individual predicates are suppressed:\r | |
1684 | \r | |
1685 | --------------------------------------------------\r | |
1686 | Note: This was incorrect. The contexts should be\r | |
1687 | ORed together. This has been fixed. A more \r | |
1688 | complete description is available in item #152.\r | |
1689 | ---------------------------------------------------\r | |
1690 | \r | |
1691 | Optimization 1: (AND X Y Z ...)\r | |
1692 | \r | |
1693 | Suppose the context for Xtest is LA(1)==LP and the context for\r | |
1694 | Ytest is LA(1)==LP && LA(2)==ID.\r | |
1695 | \r | |
1696 | Without the optimization the code would resemble:\r | |
1697 | \r | |
1698 | if (lookaheadContext &&\r | |
1699 | !(LA(1)==LP && LA(1)==LP && LA(2)==ID) ||\r | |
1700 | ( (! LA(1)==LP || Xtest) &&\r | |
1701 | (! (LA(1)==LP || LA(2)==ID) || Xtest)\r | |
1702 | )) {...\r | |
1703 | \r | |
1704 | With the -mrhoist optimization the code would resemble:\r | |
1705 | \r | |
1706 | if (lookaheadContext &&\r | |
1707 | ! (LA(1)==LP && LA(2)==ID) || (Xtest && Ytest) {...\r | |
1708 | \r | |
1709 | Optimization 2: (OR X Y Z ...) with identical contexts\r | |
1710 | \r | |
1711 | Suppose the context for Xtest is LA(1)==ID and for Ytest\r | |
1712 | the context is also LA(1)==ID.\r | |
1713 | \r | |
1714 | Without the optimization the code would resemble:\r | |
1715 | \r | |
1716 | if (lookaheadContext &&\r | |
1717 | ! (LA(1)==ID || LA(1)==ID) ||\r | |
1718 | (LA(1)==ID && Xtest) ||\r | |
1719 | (LA(1)==ID && Ytest) {...\r | |
1720 | \r | |
1721 | With the -mrhoist optimization the code would resemble:\r | |
1722 | \r | |
1723 | if (lookaheadContext &&\r | |
1724 | (! LA(1)==ID) || (Xtest || Ytest) {...\r | |
1725 | \r | |
1726 | Optimization 3: (OR X Y Z ...) with distinct contexts\r | |
1727 | \r | |
1728 | Suppose the context for Xtest is LA(1)==ID and for Ytest\r | |
1729 | the context is LA(1)==LP.\r | |
1730 | \r | |
1731 | Without the optimization the code would resemble:\r | |
1732 | \r | |
1733 | if (lookaheadContext &&\r | |
1734 | ! (LA(1)==ID || LA(1)==LP) ||\r | |
1735 | (LA(1)==ID && Xtest) ||\r | |
1736 | (LA(1)==LP && Ytest) {...\r | |
1737 | \r | |
1738 | With the -mrhoist optimization the code would resemble:\r | |
1739 | \r | |
1740 | if (lookaheadContext &&\r | |
1741 | (zzpf=0,\r | |
1742 | (LA(1)==ID && (zzpf=1) && Xtest) ||\r | |
1743 | (LA(1)==LP && (zzpf=1) && Ytest) ||\r | |
1744 | !zzpf) {\r | |
1745 | \r | |
1746 | These may appear to be of similar complexity at first,\r | |
1747 | but the non-optimized version contains two tests of each\r | |
1748 | context while the optimized version contains only one\r | |
1749 | such test, as well as eliminating some of the inverted\r | |
1750 | logic (" !(...) || ").\r | |
1751 | \r | |
1752 | Optimization 4: Computation of predicate gate trees\r | |
1753 | \r | |
1754 | When generating code for the gates of predicate expressions\r | |
1755 | antlr 1.33 vanilla uses a recursive procedure to generate\r | |
1756 | "&&" and "||" expressions for testing the lookahead. As each\r | |
1757 | layer of the predicate tree is exposed a new set of "&&" and\r | |
1758 | "||" expressions on the lookahead are generated. In many\r | |
1759 | cases the lookahead being tested has already been tested.\r | |
1760 | \r | |
1761 | With -mrhoist a lookahead tree is computed for the entire\r | |
1762 | lookahead expression. This means that predicates with identical\r | |
1763 | context or context which is a subset of another predicate's\r | |
1764 | context disappear.\r | |
1765 | \r | |
1766 | This is especially important for predicates formed by rules\r | |
1767 | like the following:\r | |
1768 | \r | |
1769 | uppperCaseVowel : <<isUpperCase(LATEXT(1))>>? vowel;\r | |
1770 | vowel: : <<isVowel(LATEXT(1))>>? LETTERS;\r | |
1771 | \r | |
1772 | These predicates are combined using AND since both must be\r | |
1773 | satisfied for rule upperCaseVowel. They have identical\r | |
1774 | context which makes this optimization very effective.\r | |
1775 | \r | |
1776 | The affect of Items #100a and #100b together can be dramatic. In\r | |
1777 | a very large (but real world) grammar one particular predicate\r | |
1778 | expression was reduced from an (unreadable) 50 predicate leaves,\r | |
1779 | 195 LA(1) terms, and 5500 characters to an (easily comprehensible)\r | |
1780 | 3 predicate leaves (all different) and a *single* LA(1) term.\r | |
1781 | \r | |
1782 | #98. (Changed in 1.33MR10) Option "-info p"\r | |
1783 | \r | |
1784 | When the user selects option "-info p" the program will generate\r | |
1785 | detailed information about predicates. If the user selects\r | |
1786 | "-mrhoist on" additional detail will be provided explaining\r | |
1787 | the promotion and suppression of predicates. The output is part\r | |
1788 | of the generated file and sandwiched between #if 0/#endif statements.\r | |
1789 | \r | |
1790 | Consider the following k=1 grammar:\r | |
1791 | \r | |
1792 | start : ( all ) * ;\r | |
1793 | \r | |
1794 | all : ( a\r | |
1795 | | b\r | |
1796 | )\r | |
1797 | ;\r | |
1798 | \r | |
1799 | a : c B\r | |
1800 | ;\r | |
1801 | \r | |
1802 | c : <<LATEXT(1)>>?\r | |
1803 | | B\r | |
1804 | ;\r | |
1805 | \r | |
1806 | b : <<LATEXT(1)>>? X\r | |
1807 | ;\r | |
1808 | \r | |
1809 | Below is an excerpt of the output for rule "start" for the three\r | |
1810 | predicate options (off, on, and maintenance release style hoisting).\r | |
1811 | \r | |
1812 | For those who do not wish to use the "-mrhoist on" option for code\r | |
1813 | generation the option can be used in a "diagnostic" mode to provide\r | |
1814 | valuable information:\r | |
1815 | \r | |
1816 | a. where one should insert null actions to inhibit hoisting\r | |
1817 | b. a chain of rule references which shows where predicates are\r | |
1818 | being hoisted\r | |
1819 | \r | |
1820 | ======================================================================\r | |
1821 | Example of "-info p" with "-mrhoist on"\r | |
1822 | ======================================================================\r | |
1823 | #if 0\r | |
1824 | \r | |
1825 | Hoisting of predicate suppressed by alternative without predicate.\r | |
1826 | The alt without the predicate includes all cases where the\r | |
1827 | predicate is false.\r | |
1828 | \r | |
1829 | WITH predicate: line 11 v36.g\r | |
1830 | WITHOUT predicate: line 12 v36.g\r | |
1831 | \r | |
1832 | The context set for the predicate:\r | |
1833 | \r | |
1834 | B\r | |
1835 | \r | |
1836 | The lookahead set for alt WITHOUT the semantic predicate:\r | |
1837 | \r | |
1838 | B\r | |
1839 | \r | |
1840 | The predicate:\r | |
1841 | \r | |
1842 | pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g\r | |
1843 | \r | |
1844 | set context:\r | |
1845 | B\r | |
1846 | tree context: null\r | |
1847 | \r | |
1848 | Chain of referenced rules:\r | |
1849 | \r | |
1850 | #0 in rule start (line 1 v36.g) to rule all\r | |
1851 | #1 in rule all (line 3 v36.g) to rule a\r | |
1852 | #2 in rule a (line 8 v36.g) to rule c\r | |
1853 | #3 in rule c (line 11 v36.g)\r | |
1854 | \r | |
1855 | #endif\r | |
1856 | &&\r | |
1857 | #if 0\r | |
1858 | \r | |
1859 | pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g\r | |
1860 | \r | |
1861 | set context:\r | |
1862 | X\r | |
1863 | tree context: null\r | |
1864 | \r | |
1865 | #endif\r | |
1866 | ======================================================================\r | |
1867 | Example of "-info p" with the default -prc setting ( "-prc off")\r | |
1868 | ======================================================================\r | |
1869 | #if 0\r | |
1870 | \r | |
1871 | OR\r | |
1872 | pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g\r | |
1873 | \r | |
1874 | set context:\r | |
1875 | nil\r | |
1876 | tree context: null\r | |
1877 | \r | |
1878 | pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g\r | |
1879 | \r | |
1880 | set context:\r | |
1881 | nil\r | |
1882 | tree context: null\r | |
1883 | \r | |
1884 | #endif\r | |
1885 | ======================================================================\r | |
1886 | Example of "-info p" with "-prc on" and "-mrhoist off"\r | |
1887 | ======================================================================\r | |
1888 | #if 0\r | |
1889 | \r | |
1890 | OR\r | |
1891 | pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g\r | |
1892 | \r | |
1893 | set context:\r | |
1894 | B\r | |
1895 | tree context: null\r | |
1896 | \r | |
1897 | pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g\r | |
1898 | \r | |
1899 | set context:\r | |
1900 | X\r | |
1901 | tree context: null\r | |
1902 | \r | |
1903 | #endif\r | |
1904 | ======================================================================\r | |
1905 | \r | |
1906 | #60. (Changed in 1.33MR7) Major changes to exception handling\r | |
1907 | \r | |
1908 | There were significant problems in the handling of exceptions\r | |
1909 | in 1.33 vanilla. The general problem is that it can only\r | |
1910 | process one level of exception handler. For example, a named\r | |
1911 | exception handler, an exception handler for an alternative, or\r | |
1912 | an exception for a subrule always went to the rule's exception\r | |
1913 | handler if there was no "catch" which matched the exception.\r | |
1914 | \r | |
1915 | In 1.33MR7 the exception handlers properly "nest". If an\r | |
1916 | exception handler does not have a matching "catch" then the\r | |
1917 | nextmost outer exception handler is checked for an appropriate\r | |
1918 | "catch" clause, and so on until an exception handler with an\r | |
1919 | appropriate "catch" is found.\r | |
1920 | \r | |
1921 | There are still undesirable features in the way exception\r | |
1922 | handlers are implemented, but I do not have time to fix them\r | |
1923 | at the moment:\r | |
1924 | \r | |
1925 | The exception handlers for alternatives are outside the\r | |
1926 | block containing the alternative. This makes it impossible\r | |
1927 | to access variables declared in a block or to resume the\r | |
1928 | parse by "falling through". The parse can still be easily\r | |
1929 | resumed in other ways, but not in the most natural fashion.\r | |
1930 | \r | |
1931 | This results in an inconsistentcy between named exception\r | |
1932 | handlers and exception handlers for alternatives. When\r | |
1933 | an exception handler for an alternative "falls through"\r | |
1934 | it goes to the nextmost outer handler - not the "normal\r | |
1935 | action".\r | |
1936 | \r | |
1937 | A major difference between 1.33MR7 and 1.33 vanilla is\r | |
1938 | the default action after an exception is caught:\r | |
1939 | \r | |
1940 | 1.33 Vanilla\r | |
1941 | ------------\r | |
1942 | In 1.33 vanilla the signal value is set to zero ("NoSignal")\r | |
1943 | and the code drops through to the code following the exception.\r | |
1944 | For named exception handlers this is the "normal action".\r | |
1945 | For alternative exception handlers this is the rule's handler.\r | |
1946 | \r | |
1947 | 1.33MR7\r | |
1948 | -------\r | |
1949 | In 1.33MR7 the signal value is NOT automatically set to zero.\r | |
1950 | \r | |
1951 | There are two cases:\r | |
1952 | \r | |
1953 | For named exception handlers: if the signal value has been\r | |
1954 | set to zero the code drops through to the "normal action".\r | |
1955 | \r | |
1956 | For all other cases the code branches to the nextmost outer\r | |
1957 | exception handler until it reaches the handler for the rule.\r | |
1958 | \r | |
1959 | The following macros have been defined for convenience:\r | |
1960 | \r | |
1961 | C/C++ Mode Name\r | |
1962 | --------------------\r | |
1963 | (zz)suppressSignal\r | |
1964 | set signal & return signal arg to 0 ("NoSignal")\r | |
1965 | (zz)setSignal(intValue)\r | |
1966 | set signal & return signal arg to some value\r | |
1967 | (zz)exportSignal\r | |
1968 | copy the signal value to the return signal arg\r | |
1969 | \r | |
1970 | I'm not sure why PCCTS make a distinction between the local\r | |
1971 | signal value and the return signal argument, but I'm loathe\r | |
1972 | to change the code. The burden of copying the local signal\r | |
1973 | value to the return signal argument can be given to the\r | |
1974 | default signal handler, I suppose.\r | |
1975 | \r | |
1976 | #53. (Explanation for 1.33MR6) What happens after an exception is caught ?\r | |
1977 | \r | |
1978 | The Book is silent about what happens after an exception\r | |
1979 | is caught.\r | |
1980 | \r | |
1981 | The following code fragment prints "Error Action" followed\r | |
1982 | by "Normal Action".\r | |
1983 | \r | |
1984 | test : Word ex:Number <<printf("Normal Action\n");>>\r | |
1985 | exception[ex]\r | |
1986 | catch NoViableAlt:\r | |
1987 | <<printf("Error Action\n");>>\r | |
1988 | ;\r | |
1989 | \r | |
1990 | The reason for "Normal Action" is that the normal flow of the\r | |
1991 | program after a user-written exception handler is to "drop through".\r | |
1992 | In the case of an exception handler for a rule this results in\r | |
1993 | the exection of a "return" statement. In the case of an\r | |
1994 | exception handler attached to an alternative, rule, or token\r | |
1995 | this is the code that would have executed had there been no\r | |
1996 | exception.\r | |
1997 | \r | |
1998 | The user can achieve the desired result by using a "return"\r | |
1999 | statement.\r | |
2000 | \r | |
2001 | test : Word ex:Number <<printf("Normal Action\n");>>\r | |
2002 | exception[ex]\r | |
2003 | catch NoViableAlt:\r | |
2004 | <<printf("Error Action\n"); return;>>\r | |
2005 | ;\r | |
2006 | \r | |
2007 | The most powerful mechanism for recovery from parse errors\r | |
2008 | in pccts is syntactic predicates because they provide\r | |
2009 | backtracking. Exceptions allow "return", "break",\r | |
2010 | "consumeUntil(...)", "goto _handler", "goto _fail", and\r | |
2011 | changing the _signal value.\r | |
2012 | \r | |
2013 | #41. (Added in 1.33MR6) antlr -stdout\r | |
2014 | \r | |
2015 | Using "antlr -stdout ..." forces the text that would\r | |
2016 | normally go to the grammar.c or grammar.cpp file to\r | |
2017 | stdout.\r | |
2018 | \r | |
2019 | #40. (Added in 1.33MR6) antlr -tab to change tab stops\r | |
2020 | \r | |
2021 | Using "antlr -tab number ..." changes the tab stops\r | |
2022 | for the grammar.c or grammar.cpp file. The number\r | |
2023 | must be between 0 and 8. Using 0 gives tab characters,\r | |
2024 | values between 1 and 8 give the appropriate number of\r | |
2025 | space characters.\r | |
2026 | \r | |
2027 | #34. (Added to 1.33MR1) Add public DLGLexerBase::set_line(int newValue)\r | |
2028 | \r | |
2029 | Previously there was no public function for changing the line\r | |
2030 | number maintained by the lexer.\r | |
2031 | \r | |
2032 | #28. (Added to 1.33MR1) More control over DLG header\r | |
2033 | \r | |
2034 | Version 1.33MR1 adds the following directives to PCCTS\r | |
2035 | for C++ mode:\r | |
2036 | \r | |
2037 | #lexprefix <<source code>>\r | |
2038 | \r | |
2039 | Adds source code to the DLGLexer.h file\r | |
2040 | after the #include "DLexerBase.h" but\r | |
2041 | before the start of the class definition.\r | |
2042 | \r | |
2043 | #lexmember <<source code>>\r | |
2044 | \r | |
2045 | Adds source code to the DLGLexer.h file\r | |
2046 | as part of the DLGLexer class body. It\r | |
2047 | appears immediately after the start of\r | |
2048 | the class and a "public: statement.\r | |
2049 | \r |