]> git.proxmox.com Git - mirror_edk2.git/blame - EdkCompatibilityPkg/Other/Maintained/Tools/Pccts/CHANGES_SUMMARY.txt
Cleanup to remover CommonHeader.h files
[mirror_edk2.git] / EdkCompatibilityPkg / Other / Maintained / Tools / Pccts / CHANGES_SUMMARY.txt
CommitLineData
3eb9473e 1======================================================================\r
2\r
3 CHANGES_SUMMARY.TXT\r
4\r
5 A QUICK overview of changes from 1.33 in reverse order\r
6\r
7 A summary of additions rather than bug fixes and minor code changes.\r
8\r
9 Numbers refer to items in CHANGES_FROM_133*.TXT\r
10 which may contain additional information.\r
11\r
12 DISCLAIMER\r
13\r
14 The software and these notes are provided "as is". They may include\r
15 typographical or technical errors and their authors disclaims all\r
16 liability of any kind or nature for damages due to error, fault,\r
17 defect, or deficiency regardless of cause. All warranties of any\r
18 kind, either express or implied, including, but not limited to, the\r
19 implied warranties of merchantability and fitness for a particular\r
20 purpose are disclaimed.\r
21\r
22======================================================================\r
23\r
24#258. You can specify a user-defined base class for your parser\r
25\r
26 The base class must constructor must have a signature similar to\r
27 that of ANTLRParser.\r
28\r
29#253. Generation of block preamble (-preamble and -preamble_first)\r
30\r
31 The antlr option -preamble causes antlr to insert the code\r
32 BLOCK_PREAMBLE at the start of each rule and block.\r
33\r
34 The antlr option -preamble_first is similar, but inserts the\r
35 code BLOCK_PREAMBLE_FIRST(PreambleFirst_123) where the symbol\r
36 PreambleFirst_123 is equivalent to the first set defined by\r
37 the #FirstSetSymbol described in Item #248.\r
38\r
39#248. Generate symbol for first set of an alternative\r
40\r
41 rr : #FirstSetSymbol(rr_FirstSet) ( Foo | Bar ) ;\r
42\r
43#216. Defer token fetch for C++ mode\r
44\r
45 When the ANTLRParser class is built with the pre-processor option \r
46 ZZDEFER_FETCH defined, the fetch of new tokens by consume() is deferred\r
47 until LA(i) or LT(i) is called. \r
48\r
49#215. Use reset() to reset DLGLexerBase\r
50#188. Added pccts/h/DLG_stream_input.h\r
51#180. Added ANTLRParser::getEofToken()\r
52#173. -glms for Microsoft style filenames with -gl\r
53#170. Suppression for predicates with lookahead depth >1\r
54\r
55 Consider the following grammar with -ck 2 and the predicate in rule\r
56 "a" with depth 2:\r
57\r
58 r1 : (ab)* "@"\r
59 ;\r
60\r
61 ab : a\r
62 | b\r
63 ;\r
64\r
65 a : (A B)? => <<p(LATEXT(2))>>? A B C\r
66 ;\r
67\r
68 b : A B C\r
69 ;\r
70\r
71 Normally, the predicate would be hoisted into rule r1 in order to\r
72 determine whether to call rule "ab". However it should *not* be\r
73 hoisted because, even if p is false, there is a valid alternative\r
74 in rule b. With "-mrhoistk on" the predicate will be suppressed.\r
75\r
76 If "-info p" command line option is present the following information\r
77 will appear in the generated code:\r
78\r
79 while ( (LA(1)==A)\r
80 #if 0\r
81\r
82 Part (or all) of predicate with depth > 1 suppressed by alternative\r
83 without predicate\r
84\r
85 pred << p(LATEXT(2))>>?\r
86 depth=k=2 ("=>" guard) rule a line 8 t1.g\r
87 tree context:\r
88 (root = A\r
89 B\r
90 )\r
91\r
92 The token sequence which is suppressed: ( A B )\r
93 The sequence of references which generate that sequence of tokens:\r
94\r
95 1 to ab r1/1 line 1 t1.g\r
96 2 ab ab/1 line 4 t1.g\r
97 3 to b ab/2 line 5 t1.g\r
98 4 b b/1 line 11 t1.g\r
99 5 #token A b/1 line 11 t1.g\r
100 6 #token B b/1 line 11 t1.g\r
101\r
102 #endif\r
103\r
104 A slightly more complicated example:\r
105\r
106 r1 : (ab)* "@"\r
107 ;\r
108\r
109 ab : a\r
110 | b\r
111 ;\r
112\r
113 a : (A B)? => <<p(LATEXT(2))>>? (A B | D E)\r
114 ;\r
115\r
116 b : <<q(LATEXT(2))>>? D E\r
117 ;\r
118\r
119\r
120 In this case, the sequence (D E) in rule "a" which lies behind\r
121 the guard is used to suppress the predicate with context (D E)\r
122 in rule b.\r
123\r
124 while ( (LA(1)==A || LA(1)==D)\r
125 #if 0\r
126\r
127 Part (or all) of predicate with depth > 1 suppressed by alternative\r
128 without predicate\r
129\r
130 pred << q(LATEXT(2))>>?\r
131 depth=k=2 rule b line 11 t2.g\r
132 tree context:\r
133 (root = D\r
134 E\r
135 )\r
136\r
137 The token sequence which is suppressed: ( D E )\r
138 The sequence of references which generate that sequence of tokens:\r
139\r
140 1 to ab r1/1 line 1 t2.g\r
141 2 ab ab/1 line 4 t2.g\r
142 3 to a ab/1 line 4 t2.g\r
143 4 a a/1 line 8 t2.g\r
144 5 #token D a/1 line 8 t2.g\r
145 6 #token E a/1 line 8 t2.g\r
146\r
147 #endif\r
148 &&\r
149 #if 0\r
150\r
151 pred << p(LATEXT(2))>>?\r
152 depth=k=2 ("=>" guard) rule a line 8 t2.g\r
153 tree context:\r
154 (root = A\r
155 B\r
156 )\r
157\r
158 #endif\r
159\r
160 (! ( LA(1)==A && LA(2)==B ) || p(LATEXT(2)) ) {\r
161 ab();\r
162 ...\r
163\r
164#165. (Changed in MR13) option -newAST\r
165\r
166 To create ASTs from an ANTLRTokenPtr antlr usually calls\r
167 "new AST(ANTLRTokenPtr)". This option generates a call\r
168 to "newAST(ANTLRTokenPtr)" instead. This allows a user\r
169 to define a parser member function to create an AST object.\r
170\r
171#161. (Changed in MR13) Switch -gxt inhibits generation of tokens.h\r
172\r
173#158. (Changed in MR13) #header causes problem for pre-processors\r
174\r
175 A user who runs the C pre-processor on antlr source suggested\r
176 that another syntax be allowed. With MR13 such directives\r
177 such as #header, #pragma, etc. may be written as "\#header",\r
178 "\#pragma", etc. For escaping pre-processor directives inside\r
179 a #header use something like the following:\r
180\r
181 \#header\r
182 <<\r
183 \#include <stdio.h>\r
184 >>\r
185\r
186#155. (Changed in MR13) Context behind predicates can suppress\r
187\r
188 With -mrhoist enabled the context behind a guarded predicate can\r
189 be used to suppress other predicates. Consider the following grammar:\r
190\r
191 r0 : (r1)+;\r
192\r
193 r1 : rp\r
194 | rq\r
195 ;\r
196 rp : <<p LATEXT(1)>>? B ;\r
197 rq : (A)? => <<q LATEXT(1)>>? (A|B);\r
198\r
199 In earlier versions both predicates "p" and "q" would be hoisted into\r
200 rule r0. With MR12c predicate p is suppressed because the context which\r
201 follows predicate q includes "B" which can "cover" predicate "p". In\r
202 other words, in trying to decide in r0 whether to call r1, it doesn't\r
203 really matter whether p is false or true because, either way, there is\r
204 a valid choice within r1.\r
205\r
206#154. (Changed in MR13) Making hoist suppression explicit using <<nohoist>>\r
207\r
208 A common error, even among experienced pccts users, is to code\r
209 an init-action to inhibit hoisting rather than a leading action.\r
210 An init-action does not inhibit hoisting.\r
211\r
212 This was coded:\r
213\r
214 rule1 : <<;>> rule2\r
215\r
216 This is what was meant:\r
217\r
218 rule1 : <<;>> <<;>> rule2\r
219\r
220 With MR13, the user can code:\r
221\r
222 rule1 : <<;>> <<nohoist>> rule2\r
223\r
224 The following will give an error message:\r
225\r
226 rule1 : <<nohoist>> rule2\r
227\r
228 If the <<nohoist>> appears as an init-action rather than a leading\r
229 action an error message is issued. The meaning of an init-action\r
230 containing "nohoist" is unclear: does it apply to just one\r
231 alternative or to all alternatives ?\r
232\r
233#151a. Addition of ANTLRParser::getLexer(), ANTLRTokenStream::getLexer()\r
234\r
235 You must manually cast the ANTLRTokenStream to your program's\r
236 lexer class. Because the name of the lexer's class is not fixed.\r
237 Thus it is impossible to incorporate it into the DLGLexerBase\r
238 class.\r
239\r
240#151b.(Changed in MR12) ParserBlackBox member getLexer()\r
241\r
242#150. (Changed in MR12) syntaxErrCount and lexErrCount now public\r
243\r
244#149. (Changed in MR12) antlr option -info o (letter o for orphan)\r
245\r
246 If there is more than one rule which is not referenced by any\r
247 other rule then all such rules are listed. This is useful for\r
248 alerting one to rules which are not used, but which can still\r
249 contribute to ambiguity.\r
250\r
251#148. (Changed in MR11) #token names appearing in zztokens,token_tbl\r
252\r
253 One can write:\r
254\r
255 #token Plus ("+") "\+"\r
256 #token RP ("(") "\("\r
257 #token COM ("comment begin") "/\*"\r
258\r
259 The string in parenthesis will be used in syntax error messages.\r
260\r
261#146. (Changed in MR11) Option -treport for locating "difficult" alts\r
262\r
263 It can be difficult to determine which alternatives are causing\r
264 pccts to work hard to resolve an ambiguity. In some cases the\r
265 ambiguity is successfully resolved after much CPU time so there\r
266 is no message at all.\r
267\r
268 A rough measure of the amount of work being peformed which is\r
269 independent of the CPU speed and system load is the number of\r
270 tnodes created. Using "-info t" gives information about the\r
271 total number of tnodes created and the peak number of tnodes.\r
272\r
273 Tree Nodes: peak 1300k created 1416k lost 0\r
274\r
275 It also puts in the generated C or C++ file the number of tnodes\r
276 created for a rule (at the end of the rule). However this\r
277 information is not sufficient to locate the alternatives within\r
278 a rule which are causing the creation of tnodes.\r
279\r
280 Using:\r
281\r
282 antlr -treport 100000 ....\r
283\r
284 causes antlr to list on stdout any alternatives which require the\r
285 creation of more than 100,000 tnodes, along with the lookahead sets\r
286 for those alternatives.\r
287\r
288 The following is a trivial case from the ansi.g grammar which shows\r
289 the format of the report. This report might be of more interest\r
290 in cases where 1,000,000 tuples were created to resolve the ambiguity.\r
291\r
292 -------------------------------------------------------------------------\r
293 There were 0 tuples whose ambiguity could not be resolved\r
294 by full lookahead\r
295 There were 157 tnodes created to resolve ambiguity between:\r
296\r
297 Choice 1: statement/2 line 475 file ansi.g\r
298 Choice 2: statement/3 line 476 file ansi.g\r
299\r
300 Intersection of lookahead[1] sets:\r
301\r
302 IDENTIFIER\r
303\r
304 Intersection of lookahead[2] sets:\r
305\r
306 LPARENTHESIS COLON AMPERSAND MINUS\r
307 STAR PLUSPLUS MINUSMINUS ONESCOMPLEMENT\r
308 NOT SIZEOF OCTALINT DECIMALINT\r
309 HEXADECIMALINT FLOATONE FLOATTWO IDENTIFIER\r
310 STRING CHARACTER\r
311 -------------------------------------------------------------------------\r
312\r
313#143. (Changed in MR11) Optional ";" at end of #token statement\r
314\r
315 Fixes problem of:\r
316\r
317 #token X "x"\r
318\r
319 <<\r
320 parser action\r
321 >>\r
322\r
323 Being confused with:\r
324\r
325 #token X "x" <<lexical action>>\r
326\r
327#142. (Changed in MR11) class BufFileInput subclass of DLGInputStream\r
328\r
329 Alexey Demakov (demakov@kazbek.ispras.ru) has supplied class\r
330 BufFileInput derived from DLGInputStream which provides a\r
331 function lookahead(char *string) to test characters in the\r
332 input stream more than one character ahead.\r
333 The class is located in pccts/h/BufFileInput.* of the kit.\r
334\r
335#140. #pred to define predicates\r
336\r
337 +---------------------------------------------------+\r
338 | Note: Assume "-prc on" for this entire discussion |\r
339 +---------------------------------------------------+\r
340\r
341 A problem with predicates is that each one is regarded as\r
342 unique and capable of disambiguating cases where two\r
343 alternatives have identical lookahead. For example:\r
344\r
345 rule : <<pred(LATEXT(1))>>? A\r
346 | <<pred(LATEXT(1))>>? A\r
347 ;\r
348\r
349 will not cause any error messages or warnings to be issued\r
350 by earlier versions of pccts. To compare the text of the\r
351 predicates is an incomplete solution.\r
352\r
353 In 1.33MR11 I am introducing the #pred statement in order to\r
354 solve some problems with predicates. The #pred statement allows\r
355 one to give a symbolic name to a "predicate literal" or a\r
356 "predicate expression" in order to refer to it in other predicate\r
357 expressions or in the rules of the grammar.\r
358\r
359 The predicate literal associated with a predicate symbol is C\r
360 or C++ code which can be used to test the condition. A\r
361 predicate expression defines a predicate symbol in terms of other\r
362 predicate symbols using "!", "&&", and "||". A predicate symbol\r
363 can be defined in terms of a predicate literal, a predicate\r
364 expression, or *both*.\r
365\r
366 When a predicate symbol is defined with both a predicate literal\r
367 and a predicate expression, the predicate literal is used to generate\r
368 code, but the predicate expression is used to check for two\r
369 alternatives with identical predicates in both alternatives.\r
370\r
371 Here are some examples of #pred statements:\r
372\r
373 #pred IsLabel <<isLabel(LATEXT(1))>>?\r
374 #pred IsLocalVar <<isLocalVar(LATEXT(1))>>?\r
375 #pred IsGlobalVar <<isGlobalVar(LATEXT(1)>>?\r
376 #pred IsVar <<isVar(LATEXT(1))>>? IsLocalVar || IsGlobalVar\r
377 #pred IsScoped <<isScoped(LATEXT(1))>>? IsLabel || IsLocalVar\r
378\r
379 I hope that the use of EBNF notation to describe the syntax of the\r
380 #pred statement will not cause problems for my readers (joke).\r
381\r
382 predStatement : "#pred"\r
383 CapitalizedName\r
384 (\r
385 "<<predicate_literal>>?"\r
386 | "<<predicate_literal>>?" predOrExpr\r
387 | predOrExpr\r
388 )\r
389 ;\r
390\r
391 predOrExpr : predAndExpr ( "||" predAndExpr ) * ;\r
392\r
393 predAndExpr : predPrimary ( "&&" predPrimary ) * ;\r
394\r
395 predPrimary : CapitalizedName\r
396 | "!" predPrimary\r
397 | "(" predOrExpr ")"\r
398 ;\r
399\r
400 What is the purpose of this nonsense ?\r
401\r
402 To understand how predicate symbols help, you need to realize that\r
403 predicate symbols are used in two different ways with two different\r
404 goals.\r
405\r
406 a. Allow simplification of predicates which have been combined\r
407 during predicate hoisting.\r
408\r
409 b. Allow recognition of identical predicates which can't disambiguate\r
410 alternatives with common lookahead.\r
411\r
412 First we will discuss goal (a). Consider the following rule:\r
413\r
414 rule0: rule1\r
415 | ID\r
416 | ...\r
417 ;\r
418\r
419 rule1: rule2\r
420 | rule3\r
421 ;\r
422\r
423 rule2: <<isX(LATEXT(1))>>? ID ;\r
424 rule3: <<!isX(LATEXT(1)>>? ID ;\r
425\r
426 When the predicates in rule2 and rule3 are combined by hoisting\r
427 to create a prediction expression for rule1 the result is:\r
428\r
429 if ( LA(1)==ID\r
430 && ( isX(LATEXT(1) || !isX(LATEXT(1) ) ) { rule1(); ...\r
431\r
432 This is inefficient, but more importantly, can lead to false\r
433 assumptions that the predicate expression distinguishes the rule1\r
434 alternative with some other alternative with lookahead ID. In\r
435 MR11 one can write:\r
436\r
437 #pred IsX <<isX(LATEXT(1))>>?\r
438\r
439 ...\r
440\r
441 rule2: <<IsX>>? ID ;\r
442 rule3: <<!IsX>>? ID ;\r
443\r
444 During hoisting MR11 recognizes this as a special case and\r
445 eliminates the predicates. The result is a prediction\r
446 expression like the following:\r
447\r
448 if ( LA(1)==ID ) { rule1(); ...\r
449\r
450 Please note that the following cases which appear to be equivalent\r
451 *cannot* be simplified by MR11 during hoisting because the hoisting\r
452 logic only checks for a "!" in the predicate action, not in the\r
453 predicate expression for a predicate symbol.\r
454\r
455 *Not* equivalent and is not simplified during hoisting:\r
456\r
457 #pred IsX <<isX(LATEXT(1))>>?\r
458 #pred NotX <<!isX(LATEXT(1))>>?\r
459 ...\r
460 rule2: <<IsX>>? ID ;\r
461 rule3: <<NotX>>? ID ;\r
462\r
463 *Not* equivalent and is not simplified during hoisting:\r
464\r
465 #pred IsX <<isX(LATEXT(1))>>?\r
466 #pred NotX !IsX\r
467 ...\r
468 rule2: <<IsX>>? ID ;\r
469 rule3: <<NotX>>? ID ;\r
470\r
471 Now we will discuss goal (b).\r
472\r
473 When antlr discovers that there is a lookahead ambiguity between\r
474 two alternatives it attempts to resolve the ambiguity by searching\r
475 for predicates in both alternatives. In the past any predicate\r
476 would do, even if the same one appeared in both alternatives:\r
477\r
478 rule: <<p(LATEXT(1))>>? X\r
479 | <<p(LATEXT(1))>>? X\r
480 ;\r
481\r
482 The #pred statement is a start towards solving this problem.\r
483 During ambiguity resolution (*not* predicate hoisting) the\r
484 predicates for the two alternatives are expanded and compared.\r
485 Consider the following example:\r
486\r
487 #pred Upper <<isUpper(LATEXT(1))>>?\r
488 #pred Lower <<isLower(LATEXT(1))>>?\r
489 #pred Alpha <<isAlpha(LATEXT(1))>>? Upper || Lower\r
490\r
491 rule0: rule1\r
492 | <<Alpha>>? ID\r
493 ;\r
494\r
495 rule1:\r
496 | rule2\r
497 | rule3\r
498 ...\r
499 ;\r
500\r
501 rule2: <<Upper>>? ID;\r
502 rule3: <<Lower>>? ID;\r
503\r
504 The definition of #pred Alpha expresses:\r
505\r
506 a. to test the predicate use the C code "isAlpha(LATEXT(1))"\r
507\r
508 b. to analyze the predicate use the information that\r
509 Alpha is equivalent to the union of Upper and Lower,\r
510\r
511 During ambiguity resolution the definition of Alpha is expanded\r
512 into "Upper || Lower" and compared with the predicate in the other\r
513 alternative, which is also "Upper || Lower". Because they are\r
514 identical MR11 will report a problem.\r
515\r
516 -------------------------------------------------------------------------\r
517 t10.g, line 5: warning: the predicates used to disambiguate rule rule0\r
518 (file t10.g alt 1 line 5 and alt 2 line 6)\r
519 are identical when compared without context and may have no\r
520 resolving power for some lookahead sequences.\r
521 -------------------------------------------------------------------------\r
522\r
523 If you use the "-info p" option the output file will contain:\r
524\r
525 +----------------------------------------------------------------------+\r
526 |#if 0 |\r
527 | |\r
528 |The following predicates are identical when compared without |\r
529 | lookahead context information. For some ambiguous lookahead |\r
530 | sequences they may not have any power to resolve the ambiguity. |\r
531 | |\r
532 |Choice 1: rule0/1 alt 1 line 5 file t10.g |\r
533 | |\r
534 | The original predicate for choice 1 with available context |\r
535 | information: |\r
536 | |\r
537 | OR expr |\r
538 | |\r
539 | pred << Upper>>? |\r
540 | depth=k=1 rule rule2 line 14 t10.g |\r
541 | set context: |\r
542 | ID |\r
543 | |\r
544 | pred << Lower>>? |\r
545 | depth=k=1 rule rule3 line 15 t10.g |\r
546 | set context: |\r
547 | ID |\r
548 | |\r
549 | The predicate for choice 1 after expansion (but without context |\r
550 | information): |\r
551 | |\r
552 | OR expr |\r
553 | |\r
554 | pred << isUpper(LATEXT(1))>>? |\r
555 | depth=k=1 rule line 1 t10.g |\r
556 | |\r
557 | pred << isLower(LATEXT(1))>>? |\r
558 | depth=k=1 rule line 2 t10.g |\r
559 | |\r
560 | |\r
561 |Choice 2: rule0/2 alt 2 line 6 file t10.g |\r
562 | |\r
563 | The original predicate for choice 2 with available context |\r
564 | information: |\r
565 | |\r
566 | pred << Alpha>>? |\r
567 | depth=k=1 rule rule0 line 6 t10.g |\r
568 | set context: |\r
569 | ID |\r
570 | |\r
571 | The predicate for choice 2 after expansion (but without context |\r
572 | information): |\r
573 | |\r
574 | OR expr |\r
575 | |\r
576 | pred << isUpper(LATEXT(1))>>? |\r
577 | depth=k=1 rule line 1 t10.g |\r
578 | |\r
579 | pred << isLower(LATEXT(1))>>? |\r
580 | depth=k=1 rule line 2 t10.g |\r
581 | |\r
582 | |\r
583 |#endif |\r
584 +----------------------------------------------------------------------+\r
585\r
586 The comparison of the predicates for the two alternatives takes\r
587 place without context information, which means that in some cases\r
588 the predicates will be considered identical even though they operate\r
589 on disjoint lookahead sets. Consider:\r
590\r
591 #pred Alpha\r
592\r
593 rule1: <<Alpha>>? ID\r
594 | <<Alpha>>? Label\r
595 ;\r
596\r
597 Because the comparison of predicates takes place without context\r
598 these will be considered identical. The reason for comparing\r
599 without context is that otherwise it would be necessary to re-evaluate\r
600 the entire predicate expression for each possible lookahead sequence.\r
601 This would require more code to be written and more CPU time during\r
602 grammar analysis, and it is not yet clear whether anyone will even make\r
603 use of the new #pred facility.\r
604\r
605 A temporary workaround might be to use different #pred statements\r
606 for predicates you know have different context. This would avoid\r
607 extraneous warnings.\r
608\r
609 The above example might be termed a "false positive". Comparison\r
610 without context will also lead to "false negatives". Consider the\r
611 following example:\r
612\r
613 #pred Alpha\r
614 #pred Beta\r
615\r
616 rule1: <<Alpha>>? A\r
617 | rule2\r
618 ;\r
619\r
620 rule2: <<Alpha>>? A\r
621 | <<Beta>>? B\r
622 ;\r
623\r
624 The predicate used for alt 2 of rule1 is (Alpha || Beta). This\r
625 appears to be different than the predicate Alpha used for alt1.\r
626 However, the context of Beta is B. Thus when the lookahead is A\r
627 Beta will have no resolving power and Alpha will be used for both\r
628 alternatives. Using the same predicate for both alternatives isn't\r
629 very helpful, but this will not be detected with 1.33MR11.\r
630\r
631 To properly handle this the predicate expression would have to be\r
632 evaluated for each distinct lookahead context.\r
633\r
634 To determine whether two predicate expressions are identical is\r
635 difficult. The routine may fail to identify identical predicates.\r
636\r
637 The #pred feature also compares predicates to see if a choice between\r
638 alternatives which is resolved by a predicate which makes the second\r
639 choice unreachable. Consider the following example:\r
640\r
641 #pred A <<A(LATEXT(1)>>?\r
642 #pred B <<B(LATEXT(1)>>?\r
643 #pred A_or_B A || B\r
644\r
645 r : s\r
646 | t\r
647 ;\r
648 s : <<A_or_B>>? ID\r
649 ;\r
650 t : <<A>>? ID\r
651 ;\r
652\r
653 ----------------------------------------------------------------------------\r
654 t11.g, line 5: warning: the predicate used to disambiguate the\r
655 first choice of rule r\r
656 (file t11.g alt 1 line 5 and alt 2 line 6)\r
657 appears to "cover" the second predicate when compared without context.\r
658 The second predicate may have no resolving power for some lookahead\r
659 sequences.\r
660 ----------------------------------------------------------------------------\r
661\r
662#132. (Changed in 1.33MR11) Recognition of identical predicates in alts\r
663\r
664 Prior to 1.33MR11, there would be no ambiguity warning when the\r
665 very same predicate was used to disambiguate both alternatives:\r
666\r
667 test: ref B\r
668 | ref C\r
669 ;\r
670\r
671 ref : <<pred(LATEXT(1)>>? A\r
672\r
673 In 1.33MR11 this will cause the warning:\r
674\r
675 warning: the predicates used to disambiguate rule test\r
676 (file v98.g alt 1 line 1 and alt 2 line 2)\r
677 are identical and have no resolving power\r
678\r
679 ----------------- Note -----------------\r
680\r
681 This is different than the following case\r
682\r
683 test: <<pred(LATEXT(1))>>? A B\r
684 | <<pred(LATEXT(1)>>? A C\r
685 ;\r
686\r
687 In this case there are two distinct predicates\r
688 which have exactly the same text. In the first\r
689 example there are two references to the same\r
690 predicate. The problem represented by this\r
691 grammar will be addressed later.\r
692\r
693\r
694#127. (Changed in 1.33MR11)\r
695\r
696 Count Syntax Errors Count DLG Errors\r
697 ------------------- ----------------\r
698\r
699 C++ mode ANTLRParser:: DLGLexerBase::\r
700 syntaxErrCount lexErrCount\r
701 C mode zzSyntaxErrCount zzLexErrCount\r
702\r
703 The C mode variables are global and initialized to 0.\r
704 They are *not* reset to 0 automatically when antlr is\r
705 restarted.\r
706\r
707 The C++ mode variables are public. They are initialized\r
708 to 0 by the constructors. They are *not* reset to 0 by the\r
709 ANTLRParser::init() method.\r
710\r
711 Suggested by Reinier van den Born (reinier@vnet.ibm.com).\r
712\r
713#126. (Changed in 1.33MR11) Addition of #first <<...>>\r
714\r
715 The #first <<...>> inserts the specified text in the output\r
716 files before any other #include statements required by pccts.\r
717 The only things before the #first text are comments and\r
718 a #define ANTLR_VERSION.\r
719\r
720 Requested by and Esa Pulkkinen (esap@cs.tut.fi) and Alexin\r
721 Zoltan (alexin@inf.u-szeged.hu).\r
722\r
723#124. A Note on the New "&&" Style Guarded Predicates\r
724\r
725 I've been asked several times, "What is the difference between\r
726 the old "=>" style guard predicates and the new style "&&" guard\r
727 predicates, and how do you choose one over the other" ?\r
728\r
729 The main difference is that the "=>" does not apply the\r
730 predicate if the context guard doesn't match, whereas\r
731 the && form always does. What is the significance ?\r
732\r
733 If you have a predicate which is not on the "leading edge"\r
734 it is cannot be hoisted. Suppose you need a predicate that\r
735 looks at LA(2). You must introduce it manually. The\r
736 classic example is:\r
737\r
738 castExpr :\r
739 LP typeName RP\r
740 | ....\r
741 ;\r
742\r
743 typeName : <<isTypeName(LATEXT(1))>>? ID\r
744 | STRUCT ID\r
745 ;\r
746\r
747 The problem is that isTypeName() isn't on the leading edge\r
748 of typeName, so it won't be hoisted into castExpr to help\r
749 make a decision on which production to choose.\r
750\r
751 The *first* attempt to fix it is this:\r
752\r
753 castExpr :\r
754 <<isTypeName(LATEXT(2))>>?\r
755 LP typeName RP\r
756 | ....\r
757 ;\r
758\r
759 Unfortunately, this won't work because it ignores\r
760 the problem of STRUCT. The solution is to apply\r
761 isTypeName() in castExpr if LA(2) is an ID and\r
762 don't apply it when LA(2) is STRUCT:\r
763\r
764 castExpr :\r
765 (LP ID)? => <<isTypeName(LATEXT(2))>>?\r
766 LP typeName RP\r
767 | ....\r
768 ;\r
769\r
770 In conclusion, the "=>" style guarded predicate is\r
771 useful when:\r
772\r
773 a. the tokens required for the predicate\r
774 are not on the leading edge\r
775 b. there are alternatives in the expression\r
776 selected by the predicate for which the\r
777 predicate is inappropriate\r
778\r
779 If (b) were false, then one could use a simple\r
780 predicate (assuming "-prc on"):\r
781\r
782 castExpr :\r
783 <<isTypeName(LATEXT(2))>>?\r
784 LP typeName RP\r
785 | ....\r
786 ;\r
787\r
788 typeName : <<isTypeName(LATEXT(1))>>? ID\r
789 ;\r
790\r
791 So, when do you use the "&&" style guarded predicate ?\r
792\r
793 The new-style "&&" predicate should always be used with\r
794 predicate context. The context guard is in ADDITION to\r
795 the automatically computed context. Thus it useful for\r
796 predicates which depend on the token type for reasons\r
797 other than context.\r
798\r
799 The following example is contributed by Reinier van den Born\r
800 (reinier@vnet.ibm.com).\r
801\r
802 +-------------------------------------------------------------------------+\r
803 | This grammar has two ways to call functions: |\r
804 | |\r
805 | - a "standard" call syntax with parens and comma separated args |\r
806 | - a shell command like syntax (no parens and spacing separated args) |\r
807 | |\r
808 | The former also allows a variable to hold the name of the function, |\r
809 | the latter can also be used to call external commands. |\r
810 | |\r
811 | The grammar (simplified) looks like this: |\r
812 | |\r
813 | fun_call : ID "(" { expr ("," expr)* } ")" |\r
814 | /* ID is function name */ |\r
815 | | "@" ID "(" { expr ("," expr)* } ")" |\r
816 | /* ID is var containing fun name */ |\r
817 | ; |\r
818 | |\r
819 | command : ID expr* /* ID is function name */ |\r
820 | | path expr* /* path is external command name */ |\r
821 | ; |\r
822 | |\r
823 | path : ID /* left out slashes and such */ |\r
824 | | "@" ID /* ID is environment var */ |\r
825 | ; |\r
826 | |\r
827 | expr : .... |\r
828 | | "(" expr ")"; |\r
829 | |\r
830 | call : fun_call |\r
831 | | command |\r
832 | ; |\r
833 | |\r
834 | Obviously the call is wildly ambiguous. This is more or less how this |\r
835 | is to be resolved: |\r
836 | |\r
837 | A call begins with an ID or an @ followed by an ID. |\r
838 | |\r
839 | If it is an ID and if it is an ext. command name -> command |\r
840 | if followed by a paren -> fun_call |\r
841 | otherwise -> command |\r
842 | |\r
843 | If it is an @ and if the ID is a var name -> fun_call |\r
844 | otherwise -> command |\r
845 | |\r
846 | One can implement these rules quite neatly using && predicates: |\r
847 | |\r
848 | call : ("@" ID)? && <<isVarName(LT(2))>>? fun_call |\r
849 | | (ID)? && <<isExtCmdName>>? command |\r
850 | | (ID "(")? fun_call |\r
851 | | command |\r
852 | ; |\r
853 | |\r
854 | This can be done better, so it is not an ideal example, but it |\r
855 | conveys the principle. |\r
856 +-------------------------------------------------------------------------+\r
857\r
858#122. (Changed in 1.33MR11) Member functions to reset DLG in C++ mode\r
859\r
860 void DLGFileReset(FILE *f) { input = f; found_eof = 0; }\r
861 void DLGStringReset(DLGChar *s) { input = s; p = &input[0]; }\r
862\r
863 Supplied by R.A. Nelson (cowboy@VNET.IBM.COM)\r
864\r
865#119. (Changed in 1.33MR11) Ambiguity aid for grammars\r
866\r
867 The user can ask for additional information on ambiguities reported\r
868 by antlr to stdout. At the moment, only one ambiguity report can\r
869 be created in an antlr run.\r
870\r
871 This feature is enabled using the "-aa" (Ambiguity Aid) option.\r
872\r
873 The following options control the reporting of ambiguities:\r
874\r
875 -aa ruleName Selects reporting by name of rule\r
876 -aa lineNumber Selects reporting by line number\r
877 (file name not compared)\r
878\r
879 -aam Selects "multiple" reporting for a token\r
880 in the intersection set of the\r
881 alternatives.\r
882\r
883 For instance, the token ID may appear dozens\r
884 of times in various paths as the program\r
885 explores the rules which are reachable from\r
886 the point of an ambiguity. With option -aam\r
887 every possible path the search program\r
888 encounters is reported.\r
889\r
890 Without -aam only the first encounter is\r
891 reported. This may result in incomplete\r
892 information, but the information may be\r
893 sufficient and much shorter.\r
894\r
895 -aad depth Selects the depth of the search.\r
896 The default value is 1.\r
897\r
898 The number of paths to be searched, and the\r
899 size of the report can grow geometrically\r
900 with the -ck value if a full search for all\r
901 contributions to the source of the ambiguity\r
902 is explored.\r
903\r
904 The depth represents the number of tokens\r
905 in the lookahead set which are matched against\r
906 the set of ambiguous tokens. A depth of 1\r
907 means that the search stops when a lookahead\r
908 sequence of just one token is matched.\r
909\r
910 A k=1 ck=6 grammar might generate 5,000 items\r
911 in a report if a full depth 6 search is made\r
912 with the Ambiguity Aid. The source of the\r
913 problem may be in the first token and obscured\r
914 by the volume of data - I hesitate to call\r
915 it information.\r
916\r
917 When the user selects a depth > 1, the search\r
918 is first performed at depth=1 for both\r
919 alternatives, then depth=2 for both alternatives,\r
920 etc.\r
921\r
922 Sample output for rule grammar in antlr.g itself:\r
923\r
924 +---------------------------------------------------------------------+\r
925 | Ambiguity Aid |\r
926 | |\r
927 | Choice 1: grammar/70 line 632 file a.g |\r
928 | Choice 2: grammar/82 line 644 file a.g |\r
929 | |\r
930 | Intersection of lookahead[1] sets: |\r
931 | |\r
932 | "\}" "class" "#errclass" "#tokclass" |\r
933 | |\r
934 | Choice:1 Depth:1 Group:1 ("#errclass") |\r
935 | 1 in (...)* block grammar/70 line 632 a.g |\r
936 | 2 to error grammar/73 line 635 a.g |\r
937 | 3 error error/1 line 894 a.g |\r
938 | 4 #token "#errclass" error/2 line 895 a.g |\r
939 | |\r
940 | Choice:1 Depth:1 Group:2 ("#tokclass") |\r
941 | 2 to tclass grammar/74 line 636 a.g |\r
942 | 3 tclass tclass/1 line 937 a.g |\r
943 | 4 #token "#tokclass" tclass/2 line 938 a.g |\r
944 | |\r
945 | Choice:1 Depth:1 Group:3 ("class") |\r
946 | 2 to class_def grammar/75 line 637 a.g |\r
947 | 3 class_def class_def/1 line 669 a.g |\r
948 | 4 #token "class" class_def/3 line 671 a.g |\r
949 | |\r
950 | Choice:1 Depth:1 Group:4 ("\}") |\r
951 | 2 #token "\}" grammar/76 line 638 a.g |\r
952 | |\r
953 | Choice:2 Depth:1 Group:5 ("#errclass") |\r
954 | 1 in (...)* block grammar/83 line 645 a.g |\r
955 | 2 to error grammar/93 line 655 a.g |\r
956 | 3 error error/1 line 894 a.g |\r
957 | 4 #token "#errclass" error/2 line 895 a.g |\r
958 | |\r
959 | Choice:2 Depth:1 Group:6 ("#tokclass") |\r
960 | 2 to tclass grammar/94 line 656 a.g |\r
961 | 3 tclass tclass/1 line 937 a.g |\r
962 | 4 #token "#tokclass" tclass/2 line 938 a.g |\r
963 | |\r
964 | Choice:2 Depth:1 Group:7 ("class") |\r
965 | 2 to class_def grammar/95 line 657 a.g |\r
966 | 3 class_def class_def/1 line 669 a.g |\r
967 | 4 #token "class" class_def/3 line 671 a.g |\r
968 | |\r
969 | Choice:2 Depth:1 Group:8 ("\}") |\r
970 | 2 #token "\}" grammar/96 line 658 a.g |\r
971 +---------------------------------------------------------------------+\r
972\r
973 For a linear lookahead set ambiguity (where k=1 or for k>1 but\r
974 when all lookahead sets [i] with i<k all have degree one) the\r
975 reports appear in the following order:\r
976\r
977 for (depth=1 ; depth <= "-aad depth" ; depth++) {\r
978 for (alternative=1; alternative <=2 ; alternative++) {\r
979 while (matches-are-found) {\r
980 group++;\r
981 print-report\r
982 };\r
983 };\r
984 };\r
985\r
986 For reporting a k-tuple ambiguity, the reports appear in the\r
987 following order:\r
988\r
989 for (depth=1 ; depth <= "-aad depth" ; depth++) {\r
990 while (matches-are-found) {\r
991 for (alternative=1; alternative <=2 ; alternative++) {\r
992 group++;\r
993 print-report\r
994 };\r
995 };\r
996 };\r
997\r
998 This is because matches are generated in different ways for\r
999 linear lookahead and k-tuples.\r
1000\r
1001#117. (Changed in 1.33MR10) new EXPERIMENTAL predicate hoisting code\r
1002\r
1003 The hoisting of predicates into rules to create prediction\r
1004 expressions is a problem in antlr. Consider the following\r
1005 example (k=1 with -prc on):\r
1006\r
1007 start : (a)* "@" ;\r
1008 a : b | c ;\r
1009 b : <<isUpper(LATEXT(1))>>? A ;\r
1010 c : A ;\r
1011\r
1012 Prior to 1.33MR10 the code generated for "start" would resemble:\r
1013\r
1014 while {\r
1015 if (LA(1)==A &&\r
1016 (!LA(1)==A || isUpper())) {\r
1017 a();\r
1018 }\r
1019 };\r
1020\r
1021 This code is wrong because it makes rule "c" unreachable from\r
1022 "start". The essence of the problem is that antlr fails to\r
1023 recognize that there can be a valid alternative within "a" even\r
1024 when the predicate <<isUpper(LATEXT(1))>>? is false.\r
1025\r
1026 In 1.33MR10 with -mrhoist the hoisting of the predicate into\r
1027 "start" is suppressed because it recognizes that "c" can\r
1028 cover all the cases where the predicate is false:\r
1029\r
1030 while {\r
1031 if (LA(1)==A) {\r
1032 a();\r
1033 }\r
1034 };\r
1035\r
1036 With the antlr "-info p" switch the user will receive information\r
1037 about the predicate suppression in the generated file:\r
1038\r
1039 --------------------------------------------------------------\r
1040 #if 0\r
1041\r
1042 Hoisting of predicate suppressed by alternative without predicate.\r
1043 The alt without the predicate includes all cases where\r
1044 the predicate is false.\r
1045\r
1046 WITH predicate: line 7 v1.g\r
1047 WITHOUT predicate: line 7 v1.g\r
1048\r
1049 The context set for the predicate:\r
1050\r
1051 A\r
1052\r
1053 The lookahead set for the alt WITHOUT the semantic predicate:\r
1054\r
1055 A\r
1056\r
1057 The predicate:\r
1058\r
1059 pred << isUpper(LATEXT(1))>>?\r
1060 depth=k=1 rule b line 9 v1.g\r
1061 set context:\r
1062 A\r
1063 tree context: null\r
1064\r
1065 Chain of referenced rules:\r
1066\r
1067 #0 in rule start (line 5 v1.g) to rule a\r
1068 #1 in rule a (line 7 v1.g)\r
1069\r
1070 #endif\r
1071 --------------------------------------------------------------\r
1072\r
1073 A predicate can be suppressed by a combination of alternatives\r
1074 which, taken together, cover a predicate:\r
1075\r
1076 start : (a)* "@" ;\r
1077\r
1078 a : b | ca | cb | cc ;\r
1079\r
1080 b : <<isUpper(LATEXT(1))>>? ( A | B | C ) ;\r
1081\r
1082 ca : A ;\r
1083 cb : B ;\r
1084 cc : C ;\r
1085\r
1086 Consider a more complex example in which "c" covers only part of\r
1087 a predicate:\r
1088\r
1089 start : (a)* "@" ;\r
1090\r
1091 a : b\r
1092 | c\r
1093 ;\r
1094\r
1095 b : <<isUpper(LATEXT(1))>>?\r
1096 ( A\r
1097 | X\r
1098 );\r
1099\r
1100 c : A\r
1101 ;\r
1102\r
1103 Prior to 1.33MR10 the code generated for "start" would resemble:\r
1104\r
1105 while {\r
1106 if ( (LA(1)==A || LA(1)==X) &&\r
1107 (! (LA(1)==A || LA(1)==X) || isUpper()) {\r
1108 a();\r
1109 }\r
1110 };\r
1111\r
1112 With 1.33MR10 and -mrhoist the predicate context is restricted to\r
1113 the non-covered lookahead. The code resembles:\r
1114\r
1115 while {\r
1116 if ( (LA(1)==A || LA(1)==X) &&\r
1117 (! (LA(1)==X) || isUpper()) {\r
1118 a();\r
1119 }\r
1120 };\r
1121\r
1122 With the antlr "-info p" switch the user will receive information\r
1123 about the predicate restriction in the generated file:\r
1124\r
1125 --------------------------------------------------------------\r
1126 #if 0\r
1127\r
1128 Restricting the context of a predicate because of overlap\r
1129 in the lookahead set between the alternative with the\r
1130 semantic predicate and one without\r
1131 Without this restriction the alternative without the predicate\r
1132 could not be reached when input matched the context of the\r
1133 predicate and the predicate was false.\r
1134\r
1135 WITH predicate: line 11 v4.g\r
1136 WITHOUT predicate: line 12 v4.g\r
1137\r
1138 The original context set for the predicate:\r
1139\r
1140 A X\r
1141\r
1142 The lookahead set for the alt WITHOUT the semantic predicate:\r
1143\r
1144 A\r
1145\r
1146 The intersection of the two sets\r
1147\r
1148 A\r
1149\r
1150 The original predicate:\r
1151\r
1152 pred << isUpper(LATEXT(1))>>?\r
1153 depth=k=1 rule b line 15 v4.g\r
1154 set context:\r
1155 A X\r
1156 tree context: null\r
1157\r
1158 The new (modified) form of the predicate:\r
1159\r
1160 pred << isUpper(LATEXT(1))>>?\r
1161 depth=k=1 rule b line 15 v4.g\r
1162 set context:\r
1163 X\r
1164 tree context: null\r
1165\r
1166 #endif\r
1167 --------------------------------------------------------------\r
1168\r
1169 The bad news about -mrhoist:\r
1170\r
1171 (a) -mrhoist does not analyze predicates with lookahead\r
1172 depth > 1.\r
1173\r
1174 (b) -mrhoist does not look past a guarded predicate to\r
1175 find context which might cover other predicates.\r
1176\r
1177 For these cases you might want to use syntactic predicates.\r
1178 When a semantic predicate fails during guess mode the guess\r
1179 fails and the next alternative is tried.\r
1180\r
1181 Limitation (a) is illustrated by the following example:\r
1182\r
1183 start : (stmt)* EOF ;\r
1184\r
1185 stmt : cast\r
1186 | expr\r
1187 ;\r
1188 cast : <<isTypename(LATEXT(2))>>? LP ID RP ;\r
1189\r
1190 expr : LP ID RP ;\r
1191\r
1192 This is not much different from the first example, except that\r
1193 it requires two tokens of lookahead context to determine what\r
1194 to do. This predicate is NOT suppressed because the current version\r
1195 is unable to handle predicates with depth > 1.\r
1196\r
1197 A predicate can be combined with other predicates during hoisting.\r
1198 In those cases the depth=1 predicates are still handled. Thus,\r
1199 in the following example the isUpper() predicate will be suppressed\r
1200 by line #4 when hoisted from "bizarre" into "start", but will still\r
1201 be present in "bizarre" in order to predict "stmt".\r
1202\r
1203 start : (bizarre)* EOF ; // #1\r
1204 // #2\r
1205 bizarre : stmt // #3\r
1206 | A // #4\r
1207 ;\r
1208\r
1209 stmt : cast\r
1210 | expr\r
1211 ;\r
1212\r
1213 cast : <<isTypename(LATEXT(2))>>? LP ID RP ;\r
1214\r
1215 expr : LP ID RP ;\r
1216 | <<isUpper(LATEXT(1))>>? A\r
1217\r
1218 Limitation (b) is illustrated by the following example of a\r
1219 context guarded predicate:\r
1220\r
1221 rule : (A)? <<p>>? // #1\r
1222 (A // #2\r
1223 |B // #3\r
1224 ) // #4\r
1225 | <<q>> B // #5\r
1226 ;\r
1227\r
1228 Recall that this means that when the lookahead is NOT A then\r
1229 the predicate "p" is ignored and it attempts to match "A|B".\r
1230 Ideally, the "B" at line #3 should suppress predicate "q".\r
1231 However, the current version does not attempt to look past\r
1232 the guard predicate to find context which might suppress other\r
1233 predicates.\r
1234\r
1235 In some cases -mrhoist will lead to the reporting of ambiguities\r
1236 which were not visible before:\r
1237\r
1238 start : (a)* "@";\r
1239 a : bc | d;\r
1240 bc : b | c ;\r
1241\r
1242 b : <<isUpper(LATEXT(1))>>? A;\r
1243 c : A ;\r
1244\r
1245 d : A ;\r
1246\r
1247 In this case there is a true ambiguity in "a" between "bc" and "d"\r
1248 which can both match "A". Without -mrhoist the predicate in "b"\r
1249 is hoisted into "a" and there is no ambiguity reported. However,\r
1250 with -mrhoist, the predicate in "b" is suppressed by "c" (as it\r
1251 should be) making the ambiguity in "a" apparent.\r
1252\r
1253 The motivations for these changes were hoisting problems reported\r
1254 by Reinier van den Born (reinier@vnet.ibm.com) and several others.\r
1255\r
1256#113. (Changed in 1.33MR10) new context guarded pred: (g)? && <<p>>? expr\r
1257\r
1258 The existing context guarded predicate:\r
1259\r
1260 rule : (guard)? => <<p>>? expr\r
1261 | next_alternative\r
1262 ;\r
1263\r
1264 generates code which resembles:\r
1265\r
1266 if (lookahead(expr) && (!guard || pred)) {\r
1267 expr()\r
1268 } else ....\r
1269\r
1270 This is not suitable for some applications because it allows\r
1271 expr() to be invoked when the predicate is false. This is\r
1272 intentional because it is meant to mimic automatically computed\r
1273 predicate context.\r
1274\r
1275 The new context guarded predicate uses the guard information\r
1276 differently because it has a different goal. Consider:\r
1277\r
1278 rule : (guard)? && <<p>>? expr\r
1279 | next_alternative\r
1280 ;\r
1281\r
1282 The new style of context guarded predicate is equivalent to:\r
1283\r
1284 rule : <<guard==true && pred>>? expr\r
1285 | next_alternative\r
1286 ;\r
1287\r
1288 It generates code which resembles:\r
1289\r
1290 if (lookahead(expr) && guard && pred) {\r
1291 expr();\r
1292 } else ...\r
1293\r
1294 Both forms of guarded predicates severely restrict the form of\r
1295 the context guard: it can contain no rule references, no\r
1296 (...)*, no (...)+, and no {...}. It may contain token and\r
1297 token class references, and alternation ("|").\r
1298\r
1299 Addition for 1.33MR11: in the token expression all tokens must\r
1300 be at the same height of the token tree:\r
1301\r
1302 (A ( B | C))? && ... is ok (all height 2)\r
1303 (A ( B | ))? && ... is not ok (some 1, some 2)\r
1304 (A B C D | E F G H)? && ... is ok (all height 4)\r
1305 (A B C D | E )? && ... is not ok (some 4, some 1)\r
1306\r
1307 This restriction is required in order to properly compute the lookahead\r
1308 set for expressions like:\r
1309\r
1310 rule1 : (A B C)? && <<pred>>? rule2 ;\r
1311 rule2 : (A|X) (B|Y) (C|Z);\r
1312\r
1313 This addition was suggested by Rienier van den Born (reinier@vnet.ibm.com)\r
1314\r
1315#109. (Changed in 1.33MR10) improved trace information\r
1316\r
1317 The quality of the trace information provided by the "-gd"\r
1318 switch has been improved significantly. Here is an example\r
1319 of the output from a test program. It shows the rule name,\r
1320 the first token of lookahead, the call depth, and the guess\r
1321 status:\r
1322\r
1323 exit rule gusxx {"?"} depth 2\r
1324 enter rule gusxx {"?"} depth 2\r
1325 enter rule gus1 {"o"} depth 3 guessing\r
1326 guess done - returning to rule gus1 {"o"} at depth 3\r
1327 (guess mode continues - an enclosing guess is still active)\r
1328 guess done - returning to rule gus1 {"Z"} at depth 3\r
1329 (guess mode continues - an enclosing guess is still active)\r
1330 exit rule gus1 {"Z"} depth 3 guessing\r
1331 guess done - returning to rule gusxx {"o"} at depth 2 (guess mode ends)\r
1332 enter rule gus1 {"o"} depth 3\r
1333 guess done - returning to rule gus1 {"o"} at depth 3 (guess mode ends)\r
1334 guess done - returning to rule gus1 {"Z"} at depth 3 (guess mode ends)\r
1335 exit rule gus1 {"Z"} depth 3\r
1336 line 1: syntax error at "Z" missing SC\r
1337 ...\r
1338\r
1339 Rule trace reporting is controlled by the value of the integer\r
1340 [zz]traceOptionValue: when it is positive tracing is enabled,\r
1341 otherwise it is disabled. Tracing during guess mode is controlled\r
1342 by the value of the integer [zz]traceGuessOptionValue. When\r
1343 it is positive AND [zz]traceOptionValue is positive rule trace\r
1344 is reported in guess mode.\r
1345\r
1346 The values of [zz]traceOptionValue and [zz]traceGuessOptionValue\r
1347 can be adjusted by subroutine calls listed below.\r
1348\r
1349 Depending on the presence or absence of the antlr -gd switch\r
1350 the variable [zz]traceOptionValueDefault is set to 0 or 1. When\r
1351 the parser is initialized or [zz]traceReset() is called the\r
1352 value of [zz]traceOptionValueDefault is copied to [zz]traceOptionValue.\r
1353 The value of [zz]traceGuessOptionValue is always initialzed to 1,\r
1354 but, as noted earlier, nothing will be reported unless\r
1355 [zz]traceOptionValue is also positive.\r
1356\r
1357 When the parser state is saved/restored the value of the trace\r
1358 variables are also saved/restored. If a restore causes a change in\r
1359 reporting behavior from on to off or vice versa this will be reported.\r
1360\r
1361 When the -gd option is selected, the macro "#define zzTRACE_RULES"\r
1362 is added to appropriate output files.\r
1363\r
1364 C++ mode\r
1365 --------\r
1366 int traceOption(int delta)\r
1367 int traceGuessOption(int delta)\r
1368 void traceReset()\r
1369 int traceOptionValueDefault\r
1370\r
1371 C mode\r
1372 --------\r
1373 int zzTraceOption(int delta)\r
1374 int zzTraceGuessOption(int delta)\r
1375 void zzTraceReset()\r
1376 int zzTraceOptionValueDefault\r
1377\r
1378 The argument "delta" is added to the traceOptionValue. To\r
1379 turn on trace when inside a particular rule one:\r
1380\r
1381 rule : <<traceOption(+1);>>\r
1382 (\r
1383 rest-of-rule\r
1384 )\r
1385 <<traceOption(-1);>>\r
1386 ; /* fail clause */ <<traceOption(-1);>>\r
1387\r
1388 One can use the same idea to turn *off* tracing within a\r
1389 rule by using a delta of (-1).\r
1390\r
1391 An improvement in the rule trace was suggested by Sramji\r
1392 Ramanathan (ps@kumaran.com).\r
1393\r
1394#108. A Note on Deallocation of Variables Allocated in Guess Mode\r
1395\r
1396 NOTE\r
1397 ------------------------------------------------------\r
1398 This mechanism only works for heap allocated variables\r
1399 ------------------------------------------------------\r
1400\r
1401 The rewrite of the trace provides the machinery necessary\r
1402 to properly free variables or undo actions following a\r
1403 failed guess.\r
1404\r
1405 The macro zzUSER_GUESS_HOOK(guessSeq,zzrv) is expanded\r
1406 as part of the zzGUESS macro. When a guess is opened\r
1407 the value of zzrv is 0. When a longjmp() is executed to\r
1408 undo the guess, the value of zzrv will be 1.\r
1409\r
1410 The macro zzUSER_GUESS_DONE_HOOK(guessSeq) is expanded\r
1411 as part of the zzGUESS_DONE macro. This is executed\r
1412 whether the guess succeeds or fails as part of closing\r
1413 the guess.\r
1414\r
1415 The guessSeq is a sequence number which is assigned to each\r
1416 guess and is incremented by 1 for each guess which becomes\r
1417 active. It is needed by the user to associate the start of\r
1418 a guess with the failure and/or completion (closing) of a\r
1419 guess.\r
1420\r
1421 Guesses are nested. They must be closed in the reverse\r
1422 of the order that they are opened.\r
1423\r
1424 In order to free memory used by a variable during a guess\r
1425 a user must write a routine which can be called to\r
1426 register the variable along with the current guess sequence\r
1427 number provided by the zzUSER_GUESS_HOOK macro. If the guess\r
1428 fails, all variables tagged with the corresponding guess\r
1429 sequence number should be released. This is ugly, but\r
1430 it would require a major rewrite of antlr 1.33 to use\r
1431 some mechanism other than setjmp()/longjmp().\r
1432\r
1433 The order of calls for a *successful* guess would be:\r
1434\r
1435 zzUSER_GUESS_HOOK(guessSeq,0);\r
1436 zzUSER_GUESS_DONE_HOOK(guessSeq);\r
1437\r
1438 The order of calls for a *failed* guess would be:\r
1439\r
1440 zzUSER_GUESS_HOOK(guessSeq,0);\r
1441 zzUSER_GUESS_HOOK(guessSeq,1);\r
1442 zzUSER_GUESS_DONE_HOOK(guessSeq);\r
1443\r
1444 The default definitions of these macros are empty strings.\r
1445\r
1446 Here is an example in C++ mode. The zzUSER_GUESS_HOOK and\r
1447 zzUSER_GUESS_DONE_HOOK macros and myGuessHook() routine\r
1448 can be used without change in both C and C++ versions.\r
1449\r
1450 ----------------------------------------------------------------------\r
1451 <<\r
1452\r
1453 #include "AToken.h"\r
1454\r
1455 typedef ANTLRCommonToken ANTLRToken;\r
1456\r
1457 #include "DLGLexer.h"\r
1458\r
1459 int main() {\r
1460\r
1461 {\r
1462 DLGFileInput in(stdin);\r
1463 DLGLexer lexer(&in,2000);\r
1464 ANTLRTokenBuffer pipe(&lexer,1);\r
1465 ANTLRCommonToken aToken;\r
1466 P parser(&pipe);\r
1467\r
1468 lexer.setToken(&aToken);\r
1469 parser.init();\r
1470 parser.start();\r
1471 };\r
1472\r
1473 fclose(stdin);\r
1474 fclose(stdout);\r
1475 return 0;\r
1476 }\r
1477\r
1478 >>\r
1479\r
1480 <<\r
1481 char *s=NULL;\r
1482\r
1483 #undef zzUSER_GUESS_HOOK\r
1484 #define zzUSER_GUESS_HOOK(guessSeq,zzrv) myGuessHook(guessSeq,zzrv);\r
1485 #undef zzUSER_GUESS_DONE_HOOK\r
1486 #define zzUSER_GUESS_DONE_HOOK(guessSeq) myGuessHook(guessSeq,2);\r
1487\r
1488 void myGuessHook(int guessSeq,int zzrv) {\r
1489 if (zzrv == 0) {\r
1490 fprintf(stderr,"User hook: starting guess #%d\n",guessSeq);\r
1491 } else if (zzrv == 1) {\r
1492 free (s);\r
1493 s=NULL;\r
1494 fprintf(stderr,"User hook: failed guess #%d\n",guessSeq);\r
1495 } else if (zzrv == 2) {\r
1496 free (s);\r
1497 s=NULL;\r
1498 fprintf(stderr,"User hook: ending guess #%d\n",guessSeq);\r
1499 };\r
1500 }\r
1501\r
1502 >>\r
1503\r
1504 #token A "a"\r
1505 #token "[\t \ \n]" <<skip();>>\r
1506\r
1507 class P {\r
1508\r
1509 start : (top)+\r
1510 ;\r
1511\r
1512 top : (which) ? <<fprintf(stderr,"%s is a which\n",s); free(s); s=NULL; >>\r
1513 | other <<fprintf(stderr,"%s is an other\n",s); free(s); s=NULL; >>\r
1514 ; <<if (s != NULL) free(s); s=NULL; >>\r
1515\r
1516 which : which2\r
1517 ;\r
1518\r
1519 which2 : which3\r
1520 ;\r
1521 which3\r
1522 : (label)? <<fprintf(stderr,"%s is a label\n",s);>>\r
1523 | (global)? <<fprintf(stderr,"%s is a global\n",s);>>\r
1524 | (exclamation)? <<fprintf(stderr,"%s is an exclamation\n",s);>>\r
1525 ;\r
1526\r
1527 label : <<s=strdup(LT(1)->getText());>> A ":" ;\r
1528\r
1529 global : <<s=strdup(LT(1)->getText());>> A "::" ;\r
1530\r
1531 exclamation : <<s=strdup(LT(1)->getText());>> A "!" ;\r
1532\r
1533 other : <<s=strdup(LT(1)->getText());>> "other" ;\r
1534\r
1535 }\r
1536 ----------------------------------------------------------------------\r
1537\r
1538 This is a silly example, but illustrates the idea. For the input\r
1539 "a ::" with tracing enabled the output begins:\r
1540\r
1541 ----------------------------------------------------------------------\r
1542 enter rule "start" depth 1\r
1543 enter rule "top" depth 2\r
1544 User hook: starting guess #1\r
1545 enter rule "which" depth 3 guessing\r
1546 enter rule "which2" depth 4 guessing\r
1547 enter rule "which3" depth 5 guessing\r
1548 User hook: starting guess #2\r
1549 enter rule "label" depth 6 guessing\r
1550 guess failed\r
1551 User hook: failed guess #2\r
1552 guess done - returning to rule "which3" at depth 5 (guess mode continues\r
1553 - an enclosing guess is still active)\r
1554 User hook: ending guess #2\r
1555 User hook: starting guess #3\r
1556 enter rule "global" depth 6 guessing\r
1557 exit rule "global" depth 6 guessing\r
1558 guess done - returning to rule "which3" at depth 5 (guess mode continues\r
1559 - an enclosing guess is still active)\r
1560 User hook: ending guess #3\r
1561 enter rule "global" depth 6 guessing\r
1562 exit rule "global" depth 6 guessing\r
1563 exit rule "which3" depth 5 guessing\r
1564 exit rule "which2" depth 4 guessing\r
1565 exit rule "which" depth 3 guessing\r
1566 guess done - returning to rule "top" at depth 2 (guess mode ends)\r
1567 User hook: ending guess #1\r
1568 enter rule "which" depth 3\r
1569 .....\r
1570 ----------------------------------------------------------------------\r
1571\r
1572 Remember:\r
1573\r
1574 (a) Only init-actions are executed during guess mode.\r
1575 (b) A rule can be invoked multiple times during guess mode.\r
1576 (c) If the guess succeeds the rule will be called once more\r
1577 without guess mode so that normal actions will be executed.\r
1578 This means that the init-action might need to distinguish\r
1579 between guess mode and non-guess mode using the variable\r
1580 [zz]guessing.\r
1581\r
1582#101. (Changed in 1.33MR10) antlr -info command line switch\r
1583\r
1584 -info\r
1585\r
1586 p - extra predicate information in generated file\r
1587\r
1588 t - information about tnode use:\r
1589 at the end of each rule in generated file\r
1590 summary on stderr at end of program\r
1591\r
1592 m - monitor progress\r
1593 prints name of each rule as it is started\r
1594 flushes output at start of each rule\r
1595\r
1596 f - first/follow set information to stdout\r
1597\r
1598 0 - no operation (added in 1.33MR11)\r
1599\r
1600 The options may be combined and may appear in any order.\r
1601 For example:\r
1602\r
1603 antlr -info ptm -CC -gt -mrhoist on mygrammar.g\r
1604\r
1605#100a. (Changed in 1.33MR10) Predicate tree simplification\r
1606\r
1607 When the same predicates can be referenced in more than one\r
1608 alternative of a block large predicate trees can be formed.\r
1609\r
1610 The difference that these optimizations make is so dramatic\r
1611 that I have decided to use it even when -mrhoist is not selected.\r
1612\r
1613 Consider the following grammar:\r
1614\r
1615 start : ( all )* ;\r
1616\r
1617 all : a\r
1618 | d\r
1619 | e\r
1620 | f\r
1621 ;\r
1622\r
1623 a : c A B\r
1624 | c A C\r
1625 ;\r
1626\r
1627 c : <<AAA(LATEXT(2))>>?\r
1628 ;\r
1629\r
1630 d : <<BBB(LATEXT(2))>>? B C\r
1631 ;\r
1632\r
1633 e : <<CCC(LATEXT(2))>>? B C\r
1634 ;\r
1635\r
1636 f : e X Y\r
1637 ;\r
1638\r
1639 In rule "a" there is a reference to rule "c" in both alternatives.\r
1640 The length of the predicate AAA is k=2 and it can be followed in\r
1641 alternative 1 only by (A B) while in alternative 2 it can be\r
1642 followed only by (A C). Thus they do not have identical context.\r
1643\r
1644 In rule "all" the alternatives which refer to rules "e" and "f" allow\r
1645 elimination of the duplicate reference to predicate CCC.\r
1646\r
1647 The table below summarized the kind of simplification performed by\r
1648 1.33MR10. In the table, X and Y stand for single predicates\r
1649 (not trees).\r
1650\r
1651 (OR X (OR Y (OR Z))) => (OR X Y Z)\r
1652 (AND X (AND Y (AND Z))) => (AND X Y Z)\r
1653\r
1654 (OR X (... (OR X Y) ... )) => (OR X (... Y ... ))\r
1655 (AND X (... (AND X Y) ... )) => (AND X (... Y ... ))\r
1656 (OR X (... (AND X Y) ... )) => (OR X (... ... ))\r
1657 (AND X (... (OR X Y) ... )) => (AND X (... ... ))\r
1658\r
1659 (AND X) => X\r
1660 (OR X) => X\r
1661\r
1662 In a test with a complex grammar for a real application, a predicate\r
1663 tree with six OR nodes and 12 leaves was reduced to "(OR X Y Z)".\r
1664\r
1665 In 1.33MR10 there is a greater effort to release memory used\r
1666 by predicates once they are no longer in use.\r
1667\r
1668#100b. (Changed in 1.33MR10) Suppression of extra predicate tests\r
1669\r
1670 The following optimizations require that -mrhoist be selected.\r
1671\r
1672 It is relatively easy to optimize the code generated for predicate\r
1673 gates when they are of the form:\r
1674\r
1675 (AND X Y Z ...)\r
1676 or (OR X Y Z ...)\r
1677\r
1678 where X, Y, Z, and "..." represent individual predicates (leaves) not\r
1679 predicate trees.\r
1680\r
1681 If the predicate is an AND the contexts of the X, Y, Z, etc. are\r
1682 ANDed together to create a single Tree context for the group and\r
1683 context tests for the individual predicates are suppressed:\r
1684\r
1685 --------------------------------------------------\r
1686 Note: This was incorrect. The contexts should be\r
1687 ORed together. This has been fixed. A more \r
1688 complete description is available in item #152.\r
1689 ---------------------------------------------------\r
1690\r
1691 Optimization 1: (AND X Y Z ...)\r
1692\r
1693 Suppose the context for Xtest is LA(1)==LP and the context for\r
1694 Ytest is LA(1)==LP && LA(2)==ID.\r
1695\r
1696 Without the optimization the code would resemble:\r
1697\r
1698 if (lookaheadContext &&\r
1699 !(LA(1)==LP && LA(1)==LP && LA(2)==ID) ||\r
1700 ( (! LA(1)==LP || Xtest) &&\r
1701 (! (LA(1)==LP || LA(2)==ID) || Xtest)\r
1702 )) {...\r
1703\r
1704 With the -mrhoist optimization the code would resemble:\r
1705\r
1706 if (lookaheadContext &&\r
1707 ! (LA(1)==LP && LA(2)==ID) || (Xtest && Ytest) {...\r
1708\r
1709 Optimization 2: (OR X Y Z ...) with identical contexts\r
1710\r
1711 Suppose the context for Xtest is LA(1)==ID and for Ytest\r
1712 the context is also LA(1)==ID.\r
1713\r
1714 Without the optimization the code would resemble:\r
1715\r
1716 if (lookaheadContext &&\r
1717 ! (LA(1)==ID || LA(1)==ID) ||\r
1718 (LA(1)==ID && Xtest) ||\r
1719 (LA(1)==ID && Ytest) {...\r
1720\r
1721 With the -mrhoist optimization the code would resemble:\r
1722\r
1723 if (lookaheadContext &&\r
1724 (! LA(1)==ID) || (Xtest || Ytest) {...\r
1725\r
1726 Optimization 3: (OR X Y Z ...) with distinct contexts\r
1727\r
1728 Suppose the context for Xtest is LA(1)==ID and for Ytest\r
1729 the context is LA(1)==LP.\r
1730\r
1731 Without the optimization the code would resemble:\r
1732\r
1733 if (lookaheadContext &&\r
1734 ! (LA(1)==ID || LA(1)==LP) ||\r
1735 (LA(1)==ID && Xtest) ||\r
1736 (LA(1)==LP && Ytest) {...\r
1737\r
1738 With the -mrhoist optimization the code would resemble:\r
1739\r
1740 if (lookaheadContext &&\r
1741 (zzpf=0,\r
1742 (LA(1)==ID && (zzpf=1) && Xtest) ||\r
1743 (LA(1)==LP && (zzpf=1) && Ytest) ||\r
1744 !zzpf) {\r
1745\r
1746 These may appear to be of similar complexity at first,\r
1747 but the non-optimized version contains two tests of each\r
1748 context while the optimized version contains only one\r
1749 such test, as well as eliminating some of the inverted\r
1750 logic (" !(...) || ").\r
1751\r
1752 Optimization 4: Computation of predicate gate trees\r
1753\r
1754 When generating code for the gates of predicate expressions\r
1755 antlr 1.33 vanilla uses a recursive procedure to generate\r
1756 "&&" and "||" expressions for testing the lookahead. As each\r
1757 layer of the predicate tree is exposed a new set of "&&" and\r
1758 "||" expressions on the lookahead are generated. In many\r
1759 cases the lookahead being tested has already been tested.\r
1760\r
1761 With -mrhoist a lookahead tree is computed for the entire\r
1762 lookahead expression. This means that predicates with identical\r
1763 context or context which is a subset of another predicate's\r
1764 context disappear.\r
1765\r
1766 This is especially important for predicates formed by rules\r
1767 like the following:\r
1768\r
1769 uppperCaseVowel : <<isUpperCase(LATEXT(1))>>? vowel;\r
1770 vowel: : <<isVowel(LATEXT(1))>>? LETTERS;\r
1771\r
1772 These predicates are combined using AND since both must be\r
1773 satisfied for rule upperCaseVowel. They have identical\r
1774 context which makes this optimization very effective.\r
1775\r
1776 The affect of Items #100a and #100b together can be dramatic. In\r
1777 a very large (but real world) grammar one particular predicate\r
1778 expression was reduced from an (unreadable) 50 predicate leaves,\r
1779 195 LA(1) terms, and 5500 characters to an (easily comprehensible)\r
1780 3 predicate leaves (all different) and a *single* LA(1) term.\r
1781\r
1782#98. (Changed in 1.33MR10) Option "-info p"\r
1783\r
1784 When the user selects option "-info p" the program will generate\r
1785 detailed information about predicates. If the user selects\r
1786 "-mrhoist on" additional detail will be provided explaining\r
1787 the promotion and suppression of predicates. The output is part\r
1788 of the generated file and sandwiched between #if 0/#endif statements.\r
1789\r
1790 Consider the following k=1 grammar:\r
1791\r
1792 start : ( all ) * ;\r
1793\r
1794 all : ( a\r
1795 | b\r
1796 )\r
1797 ;\r
1798\r
1799 a : c B\r
1800 ;\r
1801\r
1802 c : <<LATEXT(1)>>?\r
1803 | B\r
1804 ;\r
1805\r
1806 b : <<LATEXT(1)>>? X\r
1807 ;\r
1808\r
1809 Below is an excerpt of the output for rule "start" for the three\r
1810 predicate options (off, on, and maintenance release style hoisting).\r
1811\r
1812 For those who do not wish to use the "-mrhoist on" option for code\r
1813 generation the option can be used in a "diagnostic" mode to provide\r
1814 valuable information:\r
1815\r
1816 a. where one should insert null actions to inhibit hoisting\r
1817 b. a chain of rule references which shows where predicates are\r
1818 being hoisted\r
1819\r
1820 ======================================================================\r
1821 Example of "-info p" with "-mrhoist on"\r
1822 ======================================================================\r
1823 #if 0\r
1824\r
1825 Hoisting of predicate suppressed by alternative without predicate.\r
1826 The alt without the predicate includes all cases where the\r
1827 predicate is false.\r
1828\r
1829 WITH predicate: line 11 v36.g\r
1830 WITHOUT predicate: line 12 v36.g\r
1831\r
1832 The context set for the predicate:\r
1833\r
1834 B\r
1835\r
1836 The lookahead set for alt WITHOUT the semantic predicate:\r
1837\r
1838 B\r
1839\r
1840 The predicate:\r
1841\r
1842 pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g\r
1843\r
1844 set context:\r
1845 B\r
1846 tree context: null\r
1847\r
1848 Chain of referenced rules:\r
1849\r
1850 #0 in rule start (line 1 v36.g) to rule all\r
1851 #1 in rule all (line 3 v36.g) to rule a\r
1852 #2 in rule a (line 8 v36.g) to rule c\r
1853 #3 in rule c (line 11 v36.g)\r
1854\r
1855 #endif\r
1856 &&\r
1857 #if 0\r
1858\r
1859 pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g\r
1860\r
1861 set context:\r
1862 X\r
1863 tree context: null\r
1864\r
1865 #endif\r
1866 ======================================================================\r
1867 Example of "-info p" with the default -prc setting ( "-prc off")\r
1868 ======================================================================\r
1869 #if 0\r
1870\r
1871 OR\r
1872 pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g\r
1873\r
1874 set context:\r
1875 nil\r
1876 tree context: null\r
1877\r
1878 pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g\r
1879\r
1880 set context:\r
1881 nil\r
1882 tree context: null\r
1883\r
1884 #endif\r
1885 ======================================================================\r
1886 Example of "-info p" with "-prc on" and "-mrhoist off"\r
1887 ======================================================================\r
1888 #if 0\r
1889\r
1890 OR\r
1891 pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g\r
1892\r
1893 set context:\r
1894 B\r
1895 tree context: null\r
1896\r
1897 pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g\r
1898\r
1899 set context:\r
1900 X\r
1901 tree context: null\r
1902\r
1903 #endif\r
1904 ======================================================================\r
1905\r
1906#60. (Changed in 1.33MR7) Major changes to exception handling\r
1907\r
1908 There were significant problems in the handling of exceptions\r
1909 in 1.33 vanilla. The general problem is that it can only\r
1910 process one level of exception handler. For example, a named\r
1911 exception handler, an exception handler for an alternative, or\r
1912 an exception for a subrule always went to the rule's exception\r
1913 handler if there was no "catch" which matched the exception.\r
1914\r
1915 In 1.33MR7 the exception handlers properly "nest". If an\r
1916 exception handler does not have a matching "catch" then the\r
1917 nextmost outer exception handler is checked for an appropriate\r
1918 "catch" clause, and so on until an exception handler with an\r
1919 appropriate "catch" is found.\r
1920\r
1921 There are still undesirable features in the way exception\r
1922 handlers are implemented, but I do not have time to fix them\r
1923 at the moment:\r
1924\r
1925 The exception handlers for alternatives are outside the\r
1926 block containing the alternative. This makes it impossible\r
1927 to access variables declared in a block or to resume the\r
1928 parse by "falling through". The parse can still be easily\r
1929 resumed in other ways, but not in the most natural fashion.\r
1930\r
1931 This results in an inconsistentcy between named exception\r
1932 handlers and exception handlers for alternatives. When\r
1933 an exception handler for an alternative "falls through"\r
1934 it goes to the nextmost outer handler - not the "normal\r
1935 action".\r
1936\r
1937 A major difference between 1.33MR7 and 1.33 vanilla is\r
1938 the default action after an exception is caught:\r
1939\r
1940 1.33 Vanilla\r
1941 ------------\r
1942 In 1.33 vanilla the signal value is set to zero ("NoSignal")\r
1943 and the code drops through to the code following the exception.\r
1944 For named exception handlers this is the "normal action".\r
1945 For alternative exception handlers this is the rule's handler.\r
1946\r
1947 1.33MR7\r
1948 -------\r
1949 In 1.33MR7 the signal value is NOT automatically set to zero.\r
1950\r
1951 There are two cases:\r
1952\r
1953 For named exception handlers: if the signal value has been\r
1954 set to zero the code drops through to the "normal action".\r
1955\r
1956 For all other cases the code branches to the nextmost outer\r
1957 exception handler until it reaches the handler for the rule.\r
1958\r
1959 The following macros have been defined for convenience:\r
1960\r
1961 C/C++ Mode Name\r
1962 --------------------\r
1963 (zz)suppressSignal\r
1964 set signal & return signal arg to 0 ("NoSignal")\r
1965 (zz)setSignal(intValue)\r
1966 set signal & return signal arg to some value\r
1967 (zz)exportSignal\r
1968 copy the signal value to the return signal arg\r
1969\r
1970 I'm not sure why PCCTS make a distinction between the local\r
1971 signal value and the return signal argument, but I'm loathe\r
1972 to change the code. The burden of copying the local signal\r
1973 value to the return signal argument can be given to the\r
1974 default signal handler, I suppose.\r
1975\r
1976#53. (Explanation for 1.33MR6) What happens after an exception is caught ?\r
1977\r
1978 The Book is silent about what happens after an exception\r
1979 is caught.\r
1980\r
1981 The following code fragment prints "Error Action" followed\r
1982 by "Normal Action".\r
1983\r
1984 test : Word ex:Number <<printf("Normal Action\n");>>\r
1985 exception[ex]\r
1986 catch NoViableAlt:\r
1987 <<printf("Error Action\n");>>\r
1988 ;\r
1989\r
1990 The reason for "Normal Action" is that the normal flow of the\r
1991 program after a user-written exception handler is to "drop through".\r
1992 In the case of an exception handler for a rule this results in\r
1993 the exection of a "return" statement. In the case of an\r
1994 exception handler attached to an alternative, rule, or token\r
1995 this is the code that would have executed had there been no\r
1996 exception.\r
1997\r
1998 The user can achieve the desired result by using a "return"\r
1999 statement.\r
2000\r
2001 test : Word ex:Number <<printf("Normal Action\n");>>\r
2002 exception[ex]\r
2003 catch NoViableAlt:\r
2004 <<printf("Error Action\n"); return;>>\r
2005 ;\r
2006\r
2007 The most powerful mechanism for recovery from parse errors\r
2008 in pccts is syntactic predicates because they provide\r
2009 backtracking. Exceptions allow "return", "break",\r
2010 "consumeUntil(...)", "goto _handler", "goto _fail", and\r
2011 changing the _signal value.\r
2012\r
2013#41. (Added in 1.33MR6) antlr -stdout\r
2014\r
2015 Using "antlr -stdout ..." forces the text that would\r
2016 normally go to the grammar.c or grammar.cpp file to\r
2017 stdout.\r
2018\r
2019#40. (Added in 1.33MR6) antlr -tab to change tab stops\r
2020\r
2021 Using "antlr -tab number ..." changes the tab stops\r
2022 for the grammar.c or grammar.cpp file. The number\r
2023 must be between 0 and 8. Using 0 gives tab characters,\r
2024 values between 1 and 8 give the appropriate number of\r
2025 space characters.\r
2026\r
2027#34. (Added to 1.33MR1) Add public DLGLexerBase::set_line(int newValue)\r
2028\r
2029 Previously there was no public function for changing the line\r
2030 number maintained by the lexer.\r
2031\r
2032#28. (Added to 1.33MR1) More control over DLG header\r
2033\r
2034 Version 1.33MR1 adds the following directives to PCCTS\r
2035 for C++ mode:\r
2036\r
2037 #lexprefix <<source code>>\r
2038\r
2039 Adds source code to the DLGLexer.h file\r
2040 after the #include "DLexerBase.h" but\r
2041 before the start of the class definition.\r
2042\r
2043 #lexmember <<source code>>\r
2044\r
2045 Adds source code to the DLGLexer.h file\r
2046 as part of the DLGLexer class body. It\r
2047 appears immediately after the start of\r
2048 the class and a "public: statement.\r
2049\r