Commit | Line | Data |
---|---|---|
1 | \r | |
2 | ------------------------------------------------------------\r | |
3 | This is the second part of a two part file.\r | |
4 | This is a list of changes to pccts 1.33 prior to MR13\r | |
5 | For more recent information see CHANGES_FROM_133.txt\r | |
6 | ------------------------------------------------------------\r | |
7 | \r | |
8 | DISCLAIMER\r | |
9 | \r | |
10 | The software and these notes are provided "as is". They may include\r | |
11 | typographical or technical errors and their authors disclaim all\r | |
12 | liability of any kind or nature for damages due to error, fault,\r | |
13 | defect, or deficiency regardless of cause. All warranties of any\r | |
14 | kind, either express or implied, including, but not limited to, the\r | |
15 | implied warranties of merchantability and fitness for a particular\r | |
16 | purpose are disclaimed.\r | |
17 | \r | |
18 | \r | |
19 | #153. (Changed in MR12b) Bug in computation of -mrhoist suppression set\r | |
20 | \r | |
21 | Consider the following grammar with k=1 and "-mrhoist on":\r | |
22 | \r | |
23 | r1 : (A)? => <<p>>? x /* l1 */\r | |
24 | | r2 /* l2 */\r | |
25 | ;\r | |
26 | r2 : A /* l4 */\r | |
27 | | (B)? => <<q>>? y /* l5 */\r | |
28 | ;\r | |
29 | \r | |
30 | In earlier versions the mrhoist routine would see that both l1 and\r | |
31 | l2 contained predicates and would assume that this prevented either\r | |
32 | from acting to suppress the other predicate. In the example above\r | |
33 | it didn't realize the A at line l4 is capable of suppressing the\r | |
34 | predicate at l1 even though alt l2 contains (indirectly) a predicate.\r | |
35 | \r | |
36 | This is fixed in MR12b.\r | |
37 | \r | |
38 | Reported by Reinier van den Born (reinier@vnet.ibm.com)\r | |
39 | \r | |
40 | #153. (Changed in MR12a) Bug in computation of -mrhoist suppression set\r | |
41 | \r | |
42 | An oversight similar to that described in Item #152 appeared in\r | |
43 | the computation of the set that "covered" a predicate. If a\r | |
44 | predicate expression included a term such as p=AND(q,r) the context\r | |
45 | of p was taken to be context(q) & context(r), when it should have\r | |
46 | been context(q) | context(r). This is fixed in MR12a.\r | |
47 | \r | |
48 | #152. (Changed in MR12) Bug in generation of predicate expressions\r | |
49 | \r | |
50 | The primary purpose for MR12 is to make quite clear that MR11 is\r | |
51 | obsolete and to fix the bug related to predicate expressions.\r | |
52 | \r | |
53 | In MR10 code was added to optimize the code generated for\r | |
54 | predicate expression tests. Unfortunately, there was a\r | |
55 | significant oversight in the code which resulted in a bug in\r | |
56 | the generation of code for predicate expression tests which\r | |
57 | contained predicates combined using AND:\r | |
58 | \r | |
59 | r0 : (r1)* "@" ;\r | |
60 | r1 : (AAA)? => <<p LATEXT(1)>>? r2 ;\r | |
61 | r2 : (BBB)? => <<q LATEXT(1)>>? Q\r | |
62 | | (BBB)? => <<r LATEXT(1)>>? Q\r | |
63 | ;\r | |
64 | \r | |
65 | In MR11 (and MR10 when using "-mrhoist on") the code generated\r | |
66 | for r0 to predict r1 would be equivalent to:\r | |
67 | \r | |
68 | if ( LA(1)==Q &&\r | |
69 | (LA(1)==AAA && LA(1)==BBB) &&\r | |
70 | ( p && ( q || r )) ) {\r | |
71 | \r | |
72 | This is incorrect because it expresses the idea that LA(1)\r | |
73 | *must* be AAA in order to attempt r1, and *must* be BBB to\r | |
74 | attempt r2. The result was that r1 became unreachable since\r | |
75 | both conditions cannot be simultaneously true.\r | |
76 | \r | |
77 | The general philosophy of code generation for predicates\r | |
78 | can be summarized as follows:\r | |
79 | \r | |
80 | a. If the context is true don't enter an alt\r | |
81 | for which the corresponding predicate is false.\r | |
82 | \r | |
83 | If the context is false then it is okay to enter\r | |
84 | the alt without evaluating the predicate at all.\r | |
85 | \r | |
86 | b. A predicate created by ORing of predicates has\r | |
87 | context which is the OR of their individual contexts.\r | |
88 | \r | |
89 | c. A predicate created by ANDing of predicates has\r | |
90 | (surprise) context which is the OR of their individual\r | |
91 | contexts.\r | |
92 | \r | |
93 | d. Apply these rules recursively.\r | |
94 | \r | |
95 | e. Remember rule (a)\r | |
96 | \r | |
97 | The correct code should express the idea that *if* LA(1) is\r | |
98 | AAA then p must be true to attempt r1, but if LA(1) is *not*\r | |
99 | AAA then it is okay to attempt r1, provided that *if* LA(1) is\r | |
100 | BBB then one of q or r must be true.\r | |
101 | \r | |
102 | if ( LA(1)==Q &&\r | |
103 | ( !(LA(1)==AAA || LA(1)==BBB) ||\r | |
104 | ( !(LA(1)==AAA) || p) &&\r | |
105 | ( !(LA(1)==BBB) || q || r ) ) ) {\r | |
106 | \r | |
107 | I believe this is fixed in MR12.\r | |
108 | \r | |
109 | Reported by Reinier van den Born (reinier@vnet.ibm.com)\r | |
110 | \r | |
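The corrected test in Item #152 reads each hoisted predicate as "context implies predicate". The following is a minimal C++ sketch of that reading, not code generated by antlr. The token codes and the names guarded() and canAttemptR1() are hypothetical, and the sketch leaves out the ordinary lookahead test.

```cpp
#include <cassert>

// Hypothetical token codes, for illustration only.
enum Token { AAA, BBB, Q, OTHER };

// Rule (a): a predicate matters only when its context holds, so
// "context implies predicate" is written as (!context || predicate).
inline bool guarded(bool context, bool predicate) {
    return !context || predicate;
}

// Sketch of the combined test for p AND (q OR r), where p has
// context AAA and both q and r have context BBB.
inline bool canAttemptR1(Token la1, bool p, bool q, bool r) {
    return guarded(la1 == AAA, p) && guarded(la1 == BBB, q || r);
}
```

When the lookahead is neither AAA nor BBB both guards are true, so the alternative may be entered without evaluating any predicate, which is the behaviour rule (a) asks for.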
111 | #151a. (Changed in MR12) ANTLRParser::getLexer()\r | |
112 | \r | |
113 | As a result of several requests, I have added public methods to\r | |
114 | get a pointer to the lexer belonging to a parser.\r | |
115 | \r | |
116 | ANTLRTokenStream *ANTLRParser::getLexer() const\r | |
117 | \r | |
118 | Returns a pointer to the lexer being used by the\r | |
119 | parser. ANTLRTokenStream is the base class of\r | |
120 | DLGLexer\r | |
121 | \r | |
122 | ANTLRTokenStream *ANTLRTokenBuffer::getLexer() const\r | |
123 | \r | |
124 | Returns a pointer to the lexer being used by the\r | |
125 | ANTLRTokenBuffer. ANTLRTokenStream is the base\r | |
126 | class of DLGLexer\r | |
127 | \r | |
128 | You must manually cast the ANTLRTokenStream to your program's\r | |
129 | lexer class because the name of the lexer's class is not fixed.\r | |
130 | Thus it is impossible to incorporate it into the DLGLexerBase\r | |
131 | class.\r | |
132 | \r | |
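The manual cast described in Item #151a might look like the sketch below. The classes here are reduced stand-ins written for this example, not the declarations in the pccts headers. MyLexer, lineCount() and lexerOf() are hypothetical names, and the stand-in parser constructor is an assumption.

```cpp
#include <cassert>

// Stand-in for the pccts base class, reduced to what the sketch needs.
class ANTLRTokenStream {
public:
    virtual ~ANTLRTokenStream() {}
};

// Hypothetical user lexer class with one lexer-specific member.
class MyLexer : public ANTLRTokenStream {
public:
    int lineCount() const { return 42; }
};

// Stand-in parser that only remembers the token stream it was given.
class ANTLRParser {
public:
    explicit ANTLRParser(ANTLRTokenStream *in) : input(in) {}
    ANTLRTokenStream *getLexer() const { return input; }
private:
    ANTLRTokenStream *input;
};

// The parser only knows the base class, so the caller casts back down
// to the lexer class it knows it supplied.
inline MyLexer *lexerOf(const ANTLRParser &parser) {
    return static_cast<MyLexer *>(parser.getLexer());
}
```

The static_cast is only safe when the caller is certain which lexer class was supplied to the parser.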
133 | #151b. (Changed in MR12) ParserBlackBox member getLexer()\r | |
134 | \r | |
135 | The template class ParserBlackBox now has a member getLexer()\r | |
136 | which returns a pointer to the lexer.\r | |
137 | \r | |
138 | #150. (Changed in MR12) syntaxErrCount and lexErrCount now public\r | |
139 | \r | |
140 | See Item #127 for more information.\r | |
141 | \r | |
142 | #149. (Changed in MR12) antlr option -info o (letter o for orphan)\r | |
143 | \r | |
144 | If there is more than one rule which is not referenced by any\r | |
145 | other rule then all such rules are listed. This is useful for\r | |
146 | alerting one to rules which are not used, but which can still\r | |
147 | contribute to ambiguity. For example:\r | |
148 | \r | |
149 | start : a Z ;\r | |
150 | unused: a A ;\r | |
151 | a : (A)+ ;\r | |
152 | \r | |
153 | will cause an ambiguity report for rule "a" which will be\r | |
154 | difficult to understand if the user forgets about rule "unused"\r | |
155 | simply because it is not used in the grammar.\r | |
156 | \r | |
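The idea behind "-info o" in Item #149 can be sketched as a search for rules that no other rule references. This is an illustrative C++ sketch, not antlr's implementation. The RuleRefs representation and the name orphanRules() are assumptions, as is the choice to ignore self-references.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Rule name -> names of the rules it references (hypothetical representation).
typedef std::map<std::string, std::vector<std::string> > RuleRefs;

// Return every rule that is not referenced by any other rule.
inline std::set<std::string> orphanRules(const RuleRefs &refs) {
    std::set<std::string> referenced;
    for (RuleRefs::const_iterator r = refs.begin(); r != refs.end(); ++r)
        for (std::size_t i = 0; i < r->second.size(); ++i)
            if (r->second[i] != r->first)   // a self-reference does not count
                referenced.insert(r->second[i]);

    std::set<std::string> orphans;
    for (RuleRefs::const_iterator r = refs.begin(); r != refs.end(); ++r)
        if (referenced.count(r->first) == 0)
            orphans.insert(r->first);
    return orphans;
}
```

For the three-rule grammar above the orphans are "start" and "unused". The start rule is always an orphan, so a second one is what deserves attention.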
157 | #148. (Changed in MR11) #token names appearing in zztokens,token_tbl\r | |
158 | \r | |
159 | In a #token statement like the following:\r | |
160 | \r | |
161 | #token Plus "\+"\r | |
162 | \r | |
163 | the string "Plus" appears in the zztokens array (C mode) and\r | |
164 | token_tbl (C++ mode). This string is used in most error\r | |
165 | messages. In MR11 one has the option of using some other string,\r | |
166 | (e.g. "+") in those tables.\r | |
167 | \r | |
168 | In MR11 one can write:\r | |
169 | \r | |
170 | #token Plus ("+") "\+"\r | |
171 | #token RP ("(") "\("\r | |
172 | #token COM ("comment begin") "/\*"\r | |
173 | \r | |
174 | A #token statement is allowed to appear in more than one #lexclass\r | |
175 | with different regular expressions. However, the token name appears\r | |
176 | only once in the zztokens/token_tbl array. This means that only\r | |
177 | one substitute can be specified for a given #token name. The second\r | |
178 | attempt to define a substitute name (different from the first) will\r | |
179 | result in an error message.\r | |
180 | \r | |
181 | #147. (Changed in MR11) Bug in follow set computation\r | |
182 | \r | |
183 | There is a bug in 1.33 vanilla and all maintenance releases\r | |
184 | prior to MR11 in the computation of the follow set. The bug is\r | |
185 | different than that described in Item #82 and probably more\r | |
186 | common. It was discovered in the ansi.g grammar while testing\r | |
187 | the "ambiguity aid" (Item #119). The search for a bug started\r | |
188 | when the ambiguity aid was unable to discover the actual source\r | |
189 | of an ambiguity reported by antlr.\r | |
190 | \r | |
191 | The problem appears when an optimization of the follow set\r | |
192 | computation is used inappropriately. The result is that the\r | |
193 | follow set used is the "worst case". In other words, the error\r | |
194 | can lead to false reports of ambiguity. The good news is that\r | |
195 | if you have a grammar in which you have addressed all reported\r | |
196 | ambiguities you are ok. The bad news is that you may have spent\r | |
197 | time fixing ambiguities that were not real, or used k=2 when\r | |
198 | ck=2 might have been sufficient, and so on.\r | |
199 | \r | |
200 | The following grammar demonstrates the problem:\r | |
201 | \r | |
202 | ------------------------------------------------------------\r | |
203 | expr : ID ;\r | |
204 | \r | |
205 | start : stmt SEMI ;\r | |
206 | \r | |
207 | stmt : CASE expr COLON\r | |
208 | | expr SEMI\r | |
209 | | plain_stmt\r | |
210 | ;\r | |
211 | \r | |
212 | plain_stmt : ID COLON ;\r | |
213 | ------------------------------------------------------------\r | |
214 | \r | |
215 | When compiled with k=1 and ck=2 it will report:\r | |
216 | \r | |
217 | warning: alts 2 and 3 of the rule itself ambiguous upon\r | |
218 | { IDENTIFIER }, { COLON }\r | |
219 | \r | |
220 | When antlr analyzes "stmt" it computes the first[1] set of all\r | |
221 | alternatives. It finds an ambiguity between alts 2 and 3 for ID.\r | |
222 | It then computes the first[2] set for alternatives 2 and 3 to resolve\r | |
223 | the ambiguity. In computing the first[2] set of "expr" (which is\r | |
224 | only one token long) it needs to determine what could follow "expr".\r | |
225 | Under a certain combination of circumstances antlr forgets that it\r | |
226 | is trying to analyze "stmt" which can only be followed by SEMI and\r | |
227 | adds to the first[2] set of "expr" the "global" follow set (including\r | |
228 | "COLON") which could follow "expr" (under other conditions) in the\r | |
229 | phrase "CASE expr COLON".\r | |
230 | \r | |
231 | #146. (Changed in MR11) Option -treport for locating "difficult" alts\r | |
232 | \r | |
233 | It can be difficult to determine which alternatives are causing\r | |
234 | pccts to work hard to resolve an ambiguity. In some cases the\r | |
235 | ambiguity is successfully resolved after much CPU time so there\r | |
236 | is no message at all.\r | |
237 | \r | |
238 | A rough measure of the amount of work being performed which is\r | |
239 | independent of the CPU speed and system load is the number of\r | |
240 | tnodes created. Using "-info t" gives information about the\r | |
241 | total number of tnodes created and the peak number of tnodes.\r | |
242 | \r | |
243 | Tree Nodes: peak 1300k created 1416k lost 0\r | |
244 | \r | |
245 | It also puts in the generated C or C++ file the number of tnodes\r | |
246 | created for a rule (at the end of the rule). However this\r | |
247 | information is not sufficient to locate the alternatives within\r | |
248 | a rule which are causing the creation of tnodes.\r | |
249 | \r | |
250 | Using:\r | |
251 | \r | |
252 | antlr -treport 100000 ....\r | |
253 | \r | |
254 | causes antlr to list on stdout any alternatives which require the\r | |
255 | creation of more than 100,000 tnodes, along with the lookahead sets\r | |
256 | for those alternatives.\r | |
257 | \r | |
258 | The following is a trivial case from the ansi.g grammar which shows\r | |
259 | the format of the report. This report might be of more interest\r | |
260 | in cases where 1,000,000 tuples were created to resolve the ambiguity.\r | |
261 | \r | |
262 | -------------------------------------------------------------------------\r | |
263 | There were 0 tuples whose ambiguity could not be resolved\r | |
264 | by full lookahead\r | |
265 | There were 157 tnodes created to resolve ambiguity between:\r | |
266 | \r | |
267 | Choice 1: statement/2 line 475 file ansi.g\r | |
268 | Choice 2: statement/3 line 476 file ansi.g\r | |
269 | \r | |
270 | Intersection of lookahead[1] sets:\r | |
271 | \r | |
272 | IDENTIFIER\r | |
273 | \r | |
274 | Intersection of lookahead[2] sets:\r | |
275 | \r | |
276 | LPARENTHESIS COLON AMPERSAND MINUS\r | |
277 | STAR PLUSPLUS MINUSMINUS ONESCOMPLEMENT\r | |
278 | NOT SIZEOF OCTALINT DECIMALINT\r | |
279 | HEXADECIMALINT FLOATONE FLOATTWO IDENTIFIER\r | |
280 | STRING CHARACTER\r | |
281 | -------------------------------------------------------------------------\r | |
282 | \r | |
283 | #145. (Documentation) Generation of Expression Trees\r | |
284 | \r | |
285 | Item #99 was misleading because it implied that the optimization\r | |
286 | for tree expressions was available only for trees created by\r | |
287 | predicate expressions and neglected to mention that it required\r | |
288 | the use of "-mrhoist on". The optimization applies to tree\r | |
289 | expressions created for grammars with k>1 and for predicates with\r | |
290 | lookahead depth >1.\r | |
291 | \r | |
292 | In MR11 the optimized version is always used so the -mrhoist on\r | |
293 | option need not be specified.\r | |
294 | \r | |
295 | #144. (Changed in MR11) Incorrect test for exception group\r | |
296 | \r | |
297 | In testing for a rule's exception group, the label (a pointer) was\r | |
298 | compared against '\0'. The intention was to compare "*pointer".\r | |
299 | \r | |
300 | Reported by Jeffrey C. Fried (Jeff@Fried.net).\r | |
301 | \r | |
302 | #143. (Changed in MR11) Optional ";" at end of #token statement\r | |
303 | \r | |
304 | Fixes problem of:\r | |
305 | \r | |
306 | #token X "x"\r | |
307 | \r | |
308 | <<\r | |
309 | parser action\r | |
310 | >>\r | |
311 | \r | |
312 | Being confused with:\r | |
313 | \r | |
314 | #token X "x" <<lexical action>>\r | |
315 | \r | |
316 | #142. (Changed in MR11) class BufFileInput subclass of DLGInputStream\r | |
317 | \r | |
318 | Alexey Demakov (demakov@kazbek.ispras.ru) has supplied class\r | |
319 | BufFileInput derived from DLGInputStream which provides a\r | |
320 | function lookahead(char *string) to test characters in the\r | |
321 | input stream more than one character ahead.\r | |
322 | \r | |
323 | The default amount of lookahead is specified by the constructor\r | |
324 | and defaults to 8 characters. This does *not* include the one\r | |
325 | character of lookahead maintained internally by DLG in member "ch"\r | |
326 | and which is not available for testing via BufFileInput::lookahead().\r | |
327 | \r | |
328 | This is a useful class for overcoming the one-character-lookahead\r | |
329 | limitation of DLG without resorting to a lexer capable of\r | |
330 | backtracking (like flex) which is not integrated with antlr as is\r | |
331 | DLG.\r | |
332 | \r | |
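The following sketch shows one way multi-character lookahead over an input stream can work. It is not the BufFileInput class shipped in pccts/h. The class name LookaheadInput, its members, and the unbounded buffer are assumptions made for illustration (the real class has a lookahead limit set by its constructor).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <istream>
#include <sstream>

// Hypothetical class, written only to illustrate buffered lookahead.
class LookaheadInput {
public:
    explicit LookaheadInput(std::istream &in) : in_(in) {}

    // Return the next character, or -1 at end of input.
    int nextChar() {
        if (!fill(1)) return -1;
        int c = static_cast<unsigned char>(buf_.front());
        buf_.pop_front();
        return c;
    }

    // True if the upcoming characters match s; nothing is consumed.
    bool lookahead(const char *s) {
        std::size_t n = std::strlen(s);
        if (!fill(n)) return false;
        for (std::size_t i = 0; i < n; ++i)
            if (buf_[i] != s[i]) return false;
        return true;
    }

private:
    // Read ahead until at least n characters are buffered.
    bool fill(std::size_t n) {
        char c;
        while (buf_.size() < n && in_.get(c)) buf_.push_back(c);
        return buf_.size() >= n;
    }

    std::istream &in_;
    std::deque<char> buf_;
};
```

A lexer action could use such a test to decide, for example, whether a "/" starts a comment before committing to a token.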
333 | There are no restrictions on copying or using BufFileInput.* except\r | |
334 | that the authorship and related information must be retained in the\r | |
335 | source code.\r | |
336 | \r | |
337 | The class is located in pccts/h/BufFileInput.* of the kit.\r | |
338 | \r | |
339 | #141. (Changed in MR11) ZZDEBUG_CONSUME for ANTLRParser::consume()\r | |
340 | \r | |
341 | A debug aid has been added to ANTLRParser::consume() in\r | |
342 | file AParser.cpp:\r | |
343 | \r | |
344 | #ifdef ZZDEBUG_CONSUME_ACTION\r | |
345 | zzdebug_consume_action();\r | |
346 | #endif\r | |
347 | \r | |
348 | Suggested by Sramji Ramanathan (ps@kumaran.com).\r | |
349 | \r | |
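A user-supplied hook for Item #141 might be wired up as in the sketch below. Only the macro name and the call zzdebug_consume_action() come from the item. The void, no-argument signature, the counter, and the stand-in consume() function are assumptions made for illustration.

```cpp
#include <cassert>

// Normally supplied on the compiler command line rather than in the source.
#define ZZDEBUG_CONSUME_ACTION

static int consumedTokens = 0;   // hypothetical counter updated by the hook

// User-supplied hook; the body is an assumption made for this sketch.
void zzdebug_consume_action() { ++consumedTokens; }

// Stand-in for ANTLRParser::consume(), reduced to the debug hook.
void consume() {
#ifdef ZZDEBUG_CONSUME_ACTION
    zzdebug_consume_action();
#endif
    // ... the real function advances the lookahead here ...
}
```

Because the call is compiled in only when the macro is defined, builds without it pay no cost.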
350 | #140. (Changed in MR11) #pred to define predicates\r | |
351 | \r | |
352 | +---------------------------------------------------+\r | |
353 | | Note: Assume "-prc on" for this entire discussion |\r | |
354 | +---------------------------------------------------+\r | |
355 | \r | |
356 | A problem with predicates is that each one is regarded as\r | |
357 | unique and capable of disambiguating cases where two\r | |
358 | alternatives have identical lookahead. For example:\r | |
359 | \r | |
360 | rule : <<pred(LATEXT(1))>>? A\r | |
361 | | <<pred(LATEXT(1))>>? A\r | |
362 | ;\r | |
363 | \r | |
364 | will not cause any error messages or warnings to be issued\r | |
365 | by earlier versions of pccts. To compare the text of the\r | |
366 | predicates is an incomplete solution.\r | |
367 | \r | |
368 | In 1.33MR11 I am introducing the #pred statement in order to\r | |
369 | solve some problems with predicates. The #pred statement allows\r | |
370 | one to give a symbolic name to a "predicate literal" or a\r | |
371 | "predicate expression" in order to refer to it in other predicate\r | |
372 | expressions or in the rules of the grammar.\r | |
373 | \r | |
374 | The predicate literal associated with a predicate symbol is C\r | |
375 | or C++ code which can be used to test the condition. A\r | |
376 | predicate expression defines a predicate symbol in terms of other\r | |
377 | predicate symbols using "!", "&&", and "||". A predicate symbol\r | |
378 | can be defined in terms of a predicate literal, a predicate\r | |
379 | expression, or *both*.\r | |
380 | \r | |
381 | When a predicate symbol is defined with both a predicate literal\r | |
382 | and a predicate expression, the predicate literal is used to generate\r | |
383 | code, but the predicate expression is used to check whether two\r | |
384 | alternatives have identical predicates.\r | |
385 | \r | |
386 | Here are some examples of #pred statements:\r | |
387 | \r | |
388 | #pred IsLabel <<isLabel(LATEXT(1))>>?\r | |
389 | #pred IsLocalVar <<isLocalVar(LATEXT(1))>>?\r | |
390 | #pred IsGlobalVar <<isGlobalVar(LATEXT(1))>>?\r | |
391 | #pred IsVar <<isVar(LATEXT(1))>>? IsLocalVar || IsGlobalVar\r | |
392 | #pred IsScoped <<isScoped(LATEXT(1))>>? IsLabel || IsLocalVar\r | |
393 | \r | |
394 | I hope that the use of EBNF notation to describe the syntax of the\r | |
395 | #pred statement will not cause problems for my readers (joke).\r | |
396 | \r | |
397 | predStatement : "#pred"\r | |
398 | CapitalizedName\r | |
399 | (\r | |
400 | "<<predicate_literal>>?"\r | |
401 | | "<<predicate_literal>>?" predOrExpr\r | |
402 | | predOrExpr\r | |
403 | )\r | |
404 | ;\r | |
405 | \r | |
406 | predOrExpr : predAndExpr ( "||" predAndExpr ) * ;\r | |
407 | \r | |
408 | predAndExpr : predPrimary ( "&&" predPrimary ) * ;\r | |
409 | \r | |
410 | predPrimary : CapitalizedName\r | |
411 | | "!" predPrimary\r | |
412 | | "(" predOrExpr ")"\r | |
413 | ;\r | |
414 | \r | |
415 | What is the purpose of this nonsense ?\r | |
416 | \r | |
417 | To understand how predicate symbols help, you need to realize that\r | |
418 | predicate symbols are used in two different ways with two different\r | |
419 | goals.\r | |
420 | \r | |
421 | a. Allow simplification of predicates which have been combined\r | |
422 | during predicate hoisting.\r | |
423 | \r | |
424 | b. Allow recognition of identical predicates which can't disambiguate\r | |
425 | alternatives with common lookahead.\r | |
426 | \r | |
427 | First we will discuss goal (a). Consider the following rule:\r | |
428 | \r | |
429 | rule0: rule1\r | |
430 | | ID\r | |
431 | | ...\r | |
432 | ;\r | |
433 | \r | |
434 | rule1: rule2\r | |
435 | | rule3\r | |
436 | ;\r | |
437 | \r | |
438 | rule2: <<isX(LATEXT(1))>>? ID ;\r | |
439 | rule3: <<!isX(LATEXT(1))>>? ID ;\r | |
440 | \r | |
441 | When the predicates in rule2 and rule3 are combined by hoisting\r | |
442 | to create a prediction expression for rule1 the result is:\r | |
443 | \r | |
444 | if ( LA(1)==ID\r | |
445 | && ( isX(LATEXT(1)) || !isX(LATEXT(1)) ) ) { rule1(); ...\r | |
446 | \r | |
447 | This is inefficient, but more importantly, can lead to false\r | |
448 | assumptions that the predicate expression distinguishes the rule1\r | |
449 | alternative with some other alternative with lookahead ID. In\r | |
450 | MR11 one can write:\r | |
451 | \r | |
452 | #pred IsX <<isX(LATEXT(1))>>?\r | |
453 | \r | |
454 | ...\r | |
455 | \r | |
456 | rule2: <<IsX>>? ID ;\r | |
457 | rule3: <<!IsX>>? ID ;\r | |
458 | \r | |
459 | During hoisting MR11 recognizes this as a special case and\r | |
460 | eliminates the predicates. The result is a prediction\r | |
461 | expression like the following:\r | |
462 | \r | |
463 | if ( LA(1)==ID ) { rule1(); ...\r | |
464 | \r | |
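The special case for goal (a) can be pictured as a check on the operands of a hoisted OR: if a predicate symbol appears both plain and negated, the whole expression is always true and can be dropped. This is a rough C++ sketch of that idea, not the hoisting code in antlr. The Term struct and the name isTautology() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One operand of a hoisted OR: a predicate symbol, optionally negated.
struct Term {
    std::string symbol;
    bool negated;
};

// True if the OR of the terms contains both X and !X for some symbol X,
// in which case the whole expression is always true.
inline bool isTautology(const std::vector<Term> &orTerms) {
    for (std::size_t i = 0; i < orTerms.size(); ++i)
        for (std::size_t j = i + 1; j < orTerms.size(); ++j)
            if (orTerms[i].symbol == orTerms[j].symbol &&
                orTerms[i].negated != orTerms[j].negated)
                return true;
    return false;
}
```

Matching is by symbol name only, which is why a separately named symbol such as NotX is not recognized as the negation of IsX.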
465 | Please note that the following cases which appear to be equivalent\r | |
466 | *cannot* be simplified by MR11 during hoisting because the hoisting\r | |
467 | logic only checks for a "!" in the predicate action, not in the\r | |
468 | predicate expression for a predicate symbol.\r | |
469 | \r | |
470 | *Not* equivalent and is not simplified during hoisting:\r | |
471 | \r | |
472 | #pred IsX <<isX(LATEXT(1))>>?\r | |
473 | #pred NotX <<!isX(LATEXT(1))>>?\r | |
474 | ...\r | |
475 | rule2: <<IsX>>? ID ;\r | |
476 | rule3: <<NotX>>? ID ;\r | |
477 | \r | |
478 | *Not* equivalent and is not simplified during hoisting:\r | |
479 | \r | |
480 | #pred IsX <<isX(LATEXT(1))>>?\r | |
481 | #pred NotX !IsX\r | |
482 | ...\r | |
483 | rule2: <<IsX>>? ID ;\r | |
484 | rule3: <<NotX>>? ID ;\r | |
485 | \r | |
486 | Now we will discuss goal (b).\r | |
487 | \r | |
488 | When antlr discovers that there is a lookahead ambiguity between\r | |
489 | two alternatives it attempts to resolve the ambiguity by searching\r | |
490 | for predicates in both alternatives. In the past any predicate\r | |
491 | would do, even if the same one appeared in both alternatives:\r | |
492 | \r | |
493 | rule: <<p(LATEXT(1))>>? X\r | |
494 | | <<p(LATEXT(1))>>? X\r | |
495 | ;\r | |
496 | \r | |
497 | The #pred statement is a start towards solving this problem.\r | |
498 | During ambiguity resolution (*not* predicate hoisting) the\r | |
499 | predicates for the two alternatives are expanded and compared.\r | |
500 | Consider the following example:\r | |
501 | \r | |
502 | #pred Upper <<isUpper(LATEXT(1))>>?\r | |
503 | #pred Lower <<isLower(LATEXT(1))>>?\r | |
504 | #pred Alpha <<isAlpha(LATEXT(1))>>? Upper || Lower\r | |
505 | \r | |
506 | rule0: rule1\r | |
507 | | <<Alpha>>? ID\r | |
508 | ;\r | |
509 | \r | |
510 | rule1:\r | |
511 | | rule2\r | |
512 | | rule3\r | |
513 | ...\r | |
514 | ;\r | |
515 | \r | |
516 | rule2: <<Upper>>? ID;\r | |
517 | rule3: <<Lower>>? ID;\r | |
518 | \r | |
519 | The definition of #pred Alpha expresses:\r | |
520 | \r | |
521 | a. to test the predicate use the C code "isAlpha(LATEXT(1))"\r | |
522 | \r | |
523 | b. to analyze the predicate use the information that\r | |
524 | Alpha is equivalent to the union of Upper and Lower.\r | |
525 | \r | |
526 | During ambiguity resolution the definition of Alpha is expanded\r | |
527 | into "Upper || Lower" and compared with the predicate in the other\r | |
528 | alternative, which is also "Upper || Lower". Because they are\r | |
529 | identical MR11 will report a problem.\r | |
530 | \r | |
531 | -------------------------------------------------------------------------\r | |
532 | t10.g, line 5: warning: the predicates used to disambiguate rule rule0\r | |
533 | (file t10.g alt 1 line 5 and alt 2 line 6)\r | |
534 | are identical when compared without context and may have no\r | |
535 | resolving power for some lookahead sequences.\r | |
536 | -------------------------------------------------------------------------\r | |
537 | \r | |
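The expand-and-compare step for goal (b) can be sketched as follows. This is a simplified illustration, not antlr's routine. It models only "||" expressions, assumes the definitions contain no cycles, and uses the hypothetical names PredDefs, expand() and expandOr().

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Symbol -> operands of its "||" predicate expression. Symbols without
// an entry are treated as leaves. Only OR expressions are modelled.
typedef std::map<std::string, std::vector<std::string> > PredDefs;

// Expand a symbol into the set of leaf symbols it stands for.
inline void expand(const PredDefs &defs, const std::string &name,
                   std::set<std::string> &leaves) {
    PredDefs::const_iterator it = defs.find(name);
    if (it == defs.end()) { leaves.insert(name); return; }
    for (std::size_t i = 0; i < it->second.size(); ++i)
        expand(defs, it->second[i], leaves);
}

// Expand every operand of an OR of symbols.
inline std::set<std::string> expandOr(const PredDefs &defs,
                                      const std::vector<std::string> &terms) {
    std::set<std::string> leaves;
    for (std::size_t i = 0; i < terms.size(); ++i)
        expand(defs, terms[i], leaves);
    return leaves;
}
```

With Alpha defined as Upper || Lower, the hoisted predicate of alt 1 (Upper || Lower) and the predicate of alt 2 (Alpha) expand to the same set, so they compare as identical.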
538 | If you use the "-info p" option the output file will contain:\r | |
539 | \r | |
540 | +----------------------------------------------------------------------+\r | |
541 | |#if 0 |\r | |
542 | | |\r | |
543 | |The following predicates are identical when compared without |\r | |
544 | | lookahead context information. For some ambiguous lookahead |\r | |
545 | | sequences they may not have any power to resolve the ambiguity. |\r | |
546 | | |\r | |
547 | |Choice 1: rule0/1 alt 1 line 5 file t10.g |\r | |
548 | | |\r | |
549 | | The original predicate for choice 1 with available context |\r | |
550 | | information: |\r | |
551 | | |\r | |
552 | | OR expr |\r | |
553 | | |\r | |
554 | | pred << Upper>>? |\r | |
555 | | depth=k=1 rule rule2 line 14 t10.g |\r | |
556 | | set context: |\r | |
557 | | ID |\r | |
558 | | |\r | |
559 | | pred << Lower>>? |\r | |
560 | | depth=k=1 rule rule3 line 15 t10.g |\r | |
561 | | set context: |\r | |
562 | | ID |\r | |
563 | | |\r | |
564 | | The predicate for choice 1 after expansion (but without context |\r | |
565 | | information): |\r | |
566 | | |\r | |
567 | | OR expr |\r | |
568 | | |\r | |
569 | | pred << isUpper(LATEXT(1))>>? |\r | |
570 | | depth=k=1 rule line 1 t10.g |\r | |
571 | | |\r | |
572 | | pred << isLower(LATEXT(1))>>? |\r | |
573 | | depth=k=1 rule line 2 t10.g |\r | |
574 | | |\r | |
575 | | |\r | |
576 | |Choice 2: rule0/2 alt 2 line 6 file t10.g |\r | |
577 | | |\r | |
578 | | The original predicate for choice 2 with available context |\r | |
579 | | information: |\r | |
580 | | |\r | |
581 | | pred << Alpha>>? |\r | |
582 | | depth=k=1 rule rule0 line 6 t10.g |\r | |
583 | | set context: |\r | |
584 | | ID |\r | |
585 | | |\r | |
586 | | The predicate for choice 2 after expansion (but without context |\r | |
587 | | information): |\r | |
588 | | |\r | |
589 | | OR expr |\r | |
590 | | |\r | |
591 | | pred << isUpper(LATEXT(1))>>? |\r | |
592 | | depth=k=1 rule line 1 t10.g |\r | |
593 | | |\r | |
594 | | pred << isLower(LATEXT(1))>>? |\r | |
595 | | depth=k=1 rule line 2 t10.g |\r | |
596 | | |\r | |
597 | | |\r | |
598 | |#endif |\r | |
599 | +----------------------------------------------------------------------+\r | |
600 | \r | |
601 | The comparison of the predicates for the two alternatives takes\r | |
602 | place without context information, which means that in some cases\r | |
603 | the predicates will be considered identical even though they operate\r | |
604 | on disjoint lookahead sets. Consider:\r | |
605 | \r | |
606 | #pred Alpha\r | |
607 | \r | |
608 | rule1: <<Alpha>>? ID\r | |
609 | | <<Alpha>>? Label\r | |
610 | ;\r | |
611 | \r | |
612 | Because the comparison of predicates takes place without context\r | |
613 | these will be considered identical. The reason for comparing\r | |
614 | without context is that otherwise it would be necessary to re-evaluate\r | |
615 | the entire predicate expression for each possible lookahead sequence.\r | |
616 | This would require more code to be written and more CPU time during\r | |
617 | grammar analysis, and it is not yet clear whether anyone will even make\r | |
618 | use of the new #pred facility.\r | |
619 | \r | |
620 | A temporary workaround might be to use different #pred statements\r | |
621 | for predicates you know have different context. This would avoid\r | |
622 | extraneous warnings.\r | |
623 | \r | |
624 | The above example might be termed a "false positive". Comparison\r | |
625 | without context will also lead to "false negatives". Consider the\r | |
626 | following example:\r | |
627 | \r | |
628 | #pred Alpha\r | |
629 | #pred Beta\r | |
630 | \r | |
631 | rule1: <<Alpha>>? A\r | |
632 | | rule2\r | |
633 | ;\r | |
634 | \r | |
635 | rule2: <<Alpha>>? A\r | |
636 | | <<Beta>>? B\r | |
637 | ;\r | |
638 | \r | |
639 | The predicate used for alt 2 of rule1 is (Alpha || Beta). This\r | |
640 | appears to be different than the predicate Alpha used for alt1.\r | |
641 | However, the context of Beta is B. Thus when the lookahead is A\r | |
642 | Beta will have no resolving power and Alpha will be used for both\r | |
643 | alternatives. Using the same predicate for both alternatives isn't\r | |
644 | very helpful, but this will not be detected with 1.33MR11.\r | |
645 | \r | |
646 | To properly handle this the predicate expression would have to be\r | |
647 | evaluated for each distinct lookahead context.\r | |
648 | \r | |
649 | To determine whether two predicate expressions are identical is\r | |
650 | difficult. The routine may fail to identify identical predicates.\r | |
651 | \r | |
652 | The #pred feature also compares predicates to see whether a choice\r | |
653 | between alternatives is resolved by a predicate which makes the\r | |
654 | second choice unreachable. Consider the following example:\r | |
655 | \r | |
656 | #pred A <<A(LATEXT(1))>>?\r | |
657 | #pred B <<B(LATEXT(1))>>?\r | |
658 | #pred A_or_B A || B\r | |
659 | \r | |
660 | r : s\r | |
661 | | t\r | |
662 | ;\r | |
663 | s : <<A_or_B>>? ID\r | |
664 | ;\r | |
665 | t : <<A>>? ID\r | |
666 | ;\r | |
667 | \r | |
668 | ----------------------------------------------------------------------------\r | |
669 | t11.g, line 5: warning: the predicate used to disambiguate the\r | |
670 | first choice of rule r\r | |
671 | (file t11.g alt 1 line 5 and alt 2 line 6)\r | |
672 | appears to "cover" the second predicate when compared without context.\r | |
673 | The second predicate may have no resolving power for some lookahead\r | |
674 | sequences.\r | |
675 | ----------------------------------------------------------------------------\r | |
676 | \r | |
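For predicates built only from "||", the "cover" test reported above amounts to a subset check on the expanded leaf predicates. The sketch below illustrates that reading under this OR-only assumption. It is not antlr's comparison routine, and the names LeafSet and covers() are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>

// Expanded leaf predicates of an OR-only predicate expression.
typedef std::set<std::string> LeafSet;

// The first predicate covers the second if every leaf of the second
// also appears in the first: whenever the second is true, so is the first.
inline bool covers(const LeafSet &first, const LeafSet &second) {
    return std::includes(first.begin(), first.end(),
                         second.begin(), second.end());
}
```

In the example, A_or_B expands to {A, B}, which covers {A}, so the first alternative is always chosen whenever the second predicate would succeed.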
677 | #139. (Changed in MR11) Problem with -gp in C++ mode\r | |
678 | \r | |
679 | The -gp option to add a prefix to rule names did not work in\r | |
680 | C++ mode. This has been fixed.\r | |
681 | \r | |
682 | Reported by Alexey Demakov (demakov@kazbek.ispras.ru).\r | |
683 | \r | |
684 | #138. (Changed in MR11) Additional makefiles for non-MSVC++ MS systems\r | |
685 | \r | |
686 | Sramji Ramanathan (ps@kumaran.com) has supplied makefiles for\r | |
687 | building antlr and dlg with Win95/NT development tools that\r | |
688 | are not based on MSVC5. They are pccts/antlr/AntlrMS.mak and\r | |
689 | pccts/dlg/DlgMS.mak.\r | |
690 | \r | |
691 | The first line of the makefiles requires a definition of PCCTS_HOME.\r | |
692 | \r | |
693 | These are in addition to the AntlrMSVC50.* and DlgMSVC50.*\r | |
694 | supplied by Jeff Vincent (JVincent@novell.com).\r | |
695 | \r | |
696 | #137. (Changed in MR11) Token getType(), getText(), getLine() const members\r | |
697 | \r | |
698 | --------------------------------------------------------------------\r | |
699 | If you use ANTLRCommonToken this change probably does not affect you.\r | |
700 | --------------------------------------------------------------------\r | |
701 | \r | |
702 | For a long time it has bothered me that these accessor functions\r | |
703 | in ANTLRAbstractToken were not const member functions. I have\r | |
704 | refrained from changing them because it requires users to modify\r | |
705 | existing token class definitions which are derived directly\r | |
706 | from ANTLRAbstractToken. I think it is now time.\r | |
707 | \r | |
708 | For those who are not used to C++, a "const member function" is a\r | |
709 | member function which does not modify its own object - the thing\r | |
710 | to which "this" points. This is quite different from a function\r | |
711 | which does not modify its arguments.\r | |
712 | \r | |
713 | Most token definitions based on ANTLRAbstractToken have something like\r | |
714 | the following in order to create concrete definitions of the pure\r | |
715 | virtual methods in ANTLRAbstractToken:\r | |
716 | \r | |
717 | class MyToken : public ANTLRAbstractToken {\r | |
718 | ...\r | |
719 | ANTLRTokenType getType() {return _type; }\r | |
720 | int getLine() {return _line; }\r | |
721 | ANTLRChar * getText() {return _text; }\r | |
722 | ...\r | |
723 | };\r | |
724 | \r | |
725 | The required change is simply to put "const" following the function\r | |
726 | prototype in the header (.h file) and the definition file (.cpp if\r | |
727 | it is not inline):\r | |
728 | \r | |
729 | class MyToken : public ANTLRAbstractToken {\r | |
730 | ...\r | |
731 | ANTLRTokenType getType() const {return _type; }\r | |
732 | int getLine() const {return _line; }\r | |
733 | ANTLRChar * getText() const {return _text; }\r | |
734 | ...\r | |
735 | }\r | |
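A minimal sketch of why the "const" has to be added, using hypothetical stand-in types (AbstractToken, MyToken, TokenType, TokenChar, lineOf, textLength) rather than the real pccts headers. Once the base class declares the accessors as const member functions, a derived class that omits "const" declares unrelated functions, leaves the pure virtuals unimplemented, and cannot be instantiated:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for ANTLRTokenType and ANTLRChar.
typedef int TokenType;
typedef char TokenChar;

// Hypothetical stand-in for ANTLRAbstractToken with const accessors.
class AbstractToken {
public:
    virtual ~AbstractToken() {}
    virtual TokenType getType() const = 0;
    virtual int getLine() const = 0;
    virtual TokenChar *getText() const = 0;
};

class MyToken : public AbstractToken {
public:
    MyToken(TokenType type, int line, TokenChar *text)
        : _type(type), _line(line), _text(text) {
        assert(text != 0);  // the sketch does not handle a null text pointer
    }
    // "const" matches the base class, so these override the pure virtuals.
    TokenType getType() const { return _type; }
    int getLine() const { return _line; }
    TokenChar *getText() const { return _text; }
private:
    TokenType _type;
    int _line;
    TokenChar *_text;
};

// The accessors can be called through a reference to a const token.
int lineOf(const AbstractToken &tok) { return tok.getLine(); }
std::size_t textLength(const AbstractToken &tok) {
    return std::strlen(tok.getText());
}
```

With const accessors, helper code such as lineOf() can accept tokens by reference to const, which was not possible with the old signatures.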
736 | \r | |
737 | This was originally proposed a long time ago by Bruce\r | |
738 | Guenter (bruceg@qcc.sk.ca).\r | |
739 | \r | |
740 | #136. (Changed in MR11) Added getLength() to ANTLRCommonToken\r | |
741 | \r | |
742 | Classes ANTLRCommonToken and ANTLRCommonTokenNoRefCountToken\r | |
743 | now have a member function:\r | |
744 | \r | |
745 | int getLength() const { return strlen(getText()); }\r | |
746 | \r | |
747 | Suggested by Sramji Ramanathan (ps@kumaran.com).\r | |
748 | \r | |
749 | #135. (Changed in MR11) Raised antlr's own default ZZLEXBUFSIZE to 8k\r | |
750 | \r | |
751 | #134a. (ansi_mr10.zip) T.J. Parr's ANSI C grammar made 1.33MR11 compatible\r | |
752 | \r | |
753 | There is a typographical error in the definition of BITWISEOREQ:\r | |
754 | \r | |
755 | #token BITWISEOREQ "!=" should be "\|="\r | |
756 | \r | |
757 | When this change is combined with the bugfix to the follow set cache\r | |
758 | problem (Item #147) and a minor rearrangement of the grammar\r | |
759 | (Item #134b) it becomes a k=1 ck=2 grammar.\r | |
760 | \r | |
761 | #134b. (ansi_mr10.zip) T.J. Parr's ANSI C grammar made 1.33MR11 compatible\r | |
762 | \r | |
763 | The following changes were made in the ansi.g grammar (along with\r | |
764 | using -mrhoist on):\r | |
765 | \r | |
766 | ansi.g\r | |
767 | ======\r | |
768 | void tracein(char *) ====> void tracein(const char *)\r | |
769 | void traceout(char *) ====> void traceout(const char *)\r | |
770 | \r | |
771 | <<LT(1)->getType()==IDENTIFIER ? isTypeName(LT(1)->getText()) : 1>>?\r | |
772 | ====> <<isTypeName(LT(1)->getText())>>?\r | |
773 | \r | |
774 | <<(LT(1)->getType()==LPARENTHESIS && LT(2)->getType()==IDENTIFIER) ? \\r | |
775 | isTypeName(LT(2)->getText()) : 1>>?\r | |
776 | ====> (LPARENTHESIS IDENTIFIER)? => <<isTypeName(LT(2)->getText())>>?\r | |
777 | \r | |
778 | <<(LT(1)->getType()==LPARENTHESIS && LT(2)->getType()==IDENTIFIER) ? \\r | |
779 | isTypeName(LT(2)->getText()) : 1>>?\r | |
780 | ====> (LPARENTHESIS IDENTIFIER)? => <<isTypeName(LT(2)->getText())>>?\r | |
781 | \r | |
782 | added to init(): traceOptionValueDefault=0;\r | |
783 | added to init(): traceOption(-1);\r | |
784 | \r | |
785 | change rule "statement":\r | |
786 | \r | |
787 | statement\r | |
788 | : plain_label_statement\r | |
789 | | case_label_statement\r | |
790 | | <<;>> expression SEMICOLON\r | |
791 | | compound_statement\r | |
792 | | selection_statement\r | |
793 | | iteration_statement\r | |
794 | | jump_statement\r | |
795 | | SEMICOLON\r | |
796 | ;\r | |
797 | \r | |
798 | plain_label_statement\r | |
799 | : IDENTIFIER COLON statement\r | |
800 | ;\r | |
801 | \r | |
802 | case_label_statement\r | |
803 | : CASE constant_expression COLON statement\r | |
804 | | DEFAULT COLON statement\r | |
805 | ;\r | |
806 | \r | |
807 | support.cpp\r | |
808 | ===========\r | |
809 | void tracein(char *) ====> void tracein(const char *)\r | |
810 | void traceout(char *) ====> void traceout(const char *)\r | |
811 | \r | |
812 | added to tracein(): ANTLRParser::tracein(r); // call superclass method\r | |
813 | added to traceout(): ANTLRParser::traceout(r); // call superclass method\r | |
814 | \r | |
815 | Makefile\r | |
816 | ========\r | |
817 | added to AFLAGS: -mrhoist on -prc on\r | |
818 | \r | |
819 | #133. (Changed in 1.33MR11) Make trace options public in ANTLRParser\r | |
820 | \r | |
821 | In checking T.J. Parr's ANSI C grammar for compatibility with\r | |
822 | 1.33MR11 I discovered that it was inconvenient to have the\r | |
823 | trace facilities with protected access.\r | |
824 | \r | |
825 | #132. (Changed in 1.33MR11) Recognition of identical predicates in alts\r | |
826 | \r | |
827 | Prior to 1.33MR11, there would be no ambiguity warning when the\r | |
828 | very same predicate was used to disambiguate both alternatives:\r | |
829 | \r | |
830 | test: ref B\r | |
831 | | ref C\r | |
832 | ;\r | |
833 | \r | |
834 | ref : <<pred(LATEXT(1))>>? A\r | |
835 | \r | |
836 | In 1.33MR11 this will cause the warning:\r | |
837 | \r | |
838 | warning: the predicates used to disambiguate rule test\r | |
839 | (file v98.g alt 1 line 1 and alt 2 line 2)\r | |
840 | are identical and have no resolving power\r | |
841 | \r | |
842 | ----------------- Note -----------------\r | |
843 | \r | |
844 | This is different than the following case\r | |
845 | \r | |
846 | test: <<pred(LATEXT(1))>>? A B\r | |
847 | | <<pred(LATEXT(1))>>? A C\r | |
848 | ;\r | |
849 | \r | |
850 | In this case there are two distinct predicates\r | |
851 | which have exactly the same text. In the first\r | |
852 | example there are two references to the same\r | |
853 | predicate. The problem represented by this\r | |
854 | grammar will be addressed later.\r | |
855 | \r | |
856 | #131. (Changed in 1.33MR11) Case insensitive command line options\r | |
857 | \r | |
858 | Command line switches like "-CC" and keywords like "on", "off",\r | |
859 | and "stdin" are no longer case sensitive in antlr, dlg, and sorcerer.\r | |
860 | \r | |
861 | #130. (Changed in 1.33MR11) Changed ANTLR_VERSION to int from string\r | |
862 | \r | |
863 | The ANTLR_VERSION was not an integer, making it difficult to\r | |
864 | perform conditional compilation based on the antlr version.\r | |
865 | \r | |
866 | Henceforth, ANTLR_VERSION will be:\r | |
867 | \r | |
868 | (base_version * 10000) + release number\r | |
869 | \r | |
870 | thus 1.33MR11 will be: 1.33*10000 + 11 = 133*100 + 11 = 13311\r | |
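
As a hedged illustration of the conditional compilation that an integer ANTLR_VERSION allows, the sketch below selects a member function signature by version. The helper antlrVersionNumber(), the macro TOKEN_ACCESSOR_CONST, and the struct LineHolder are hypothetical names invented for this example; only ANTLR_VERSION and the value 13311 come from these notes.

```cpp
#include <cassert>

// Fallback definition so the sketch compiles without any pccts header.
#ifndef ANTLR_VERSION
#define ANTLR_VERSION 13311
#endif

// Hypothetical helper: baseTimes100 is 133 for pccts 1.33.
inline int antlrVersionNumber(int baseTimes100, int release) {
    assert(release >= 0 && release < 100);  // release must fit in two digits
    return baseTimes100 * 100 + release;
}

// Hypothetical macro: pick the accessor signature from the antlr version.
#if ANTLR_VERSION >= 13311
#define TOKEN_ACCESSOR_CONST const
#else
#define TOKEN_ACCESSOR_CONST
#endif

struct LineHolder {
    int line;
    int getLine() TOKEN_ACCESSOR_CONST { return line; }
};
```

A comparison such as "#if ANTLR_VERSION >= 13311" is only possible because the value is an integer; the preprocessor cannot compare strings.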
871 | \r | |
872 | Suggested by Rainer Janssen (Rainer.Janssen@Informatik.Uni-Oldenburg.DE).\r | |
873 | \r | |
874 | #129. (Changed in 1.33MR11) Addition of ANTLR_VERSION to <parserName>.h\r | |
875 | \r | |
876 | The following code is now inserted into <parserName>.h and\r | |
877 | stdpccts.h:\r | |
878 | \r | |
879 | #ifndef ANTLR_VERSION\r | |
880 | #define ANTLR_VERSION 13311\r | |
881 | #endif\r | |
882 | \r | |
883 | Suggested by Rainer Janssen (Rainer.Janssen@Informatik.Uni-Oldenburg.DE)\r | |
884 | \r | |
885 | #128. (Changed in 1.33MR11) Redundant predicate code in (<<pred>>? ...)+\r | |
886 | \r | |
887 | Prior to 1.33MR11, the following grammar would generate\r | |
888 | redundant tests for the "while" condition.\r | |
889 | \r | |
890 | rule2 : (<<pred>>? X)+ X\r | |
891 | | B\r | |
892 | ;\r | |
893 | \r | |
894 | The code would resemble:\r | |
895 | \r | |
896 | if (LA(1)==X) {\r | |
897 | if (pred) {\r | |
898 | do {\r | |
899 | if (!pred) {zzfailed_pred(" pred");}\r | |
900 | zzmatch(X); zzCONSUME;\r | |
901 | } while (LA(1)==X && pred && pred);\r | |
902 | } else {...\r | |
903 | \r | |
904 | With 1.33MR11 the redundant predicate test is omitted.\r | |
905 | \r | |
906 | #127. (Changed in 1.33MR11)\r | |
907 | \r | |
908 | Errors are counted in the following variables:\r | |
909 | \r | |
910 | \r | |
911 | C++ mode: syntax errors in ANTLRParser::syntaxErrCount,\r | |
912 | DLG errors in DLGLexerBase::lexErrCount\r | |
913 | C mode: syntax errors in zzSyntaxErrCount, DLG errors in zzLexErrCount\r | |
914 | \r | |
915 | The C mode variables are global and initialized to 0.\r | |
916 | They are *not* reset to 0 automatically when antlr is\r | |
917 | restarted.\r | |
918 | \r | |
919 | The C++ mode variables are public. They are initialized\r | |
920 | to 0 by the constructors. They are *not* reset to 0 by the\r | |
921 | ANTLRParser::init() method.\r | |
922 | \r | |
923 | Suggested by Reinier van den Born (reinier@vnet.ibm.com).\r | |
924 | \r | |
925 | #126. (Changed in 1.33MR11) Addition of #first <<...>>\r | |
926 | \r | |
927 | The #first <<...>> inserts the specified text in the output\r | |
928 | files before any other #include statements required by pccts.\r | |
929 | The only things before the #first text are comments and\r | |
930 | a #define ANTLR_VERSION.\r | |
931 | \r | |
932 | Requested by Esa Pulkkinen (esap@cs.tut.fi) and Alexin\r | |
933 | Zoltan (alexin@inf.u-szeged.hu).\r | |
934 | \r | |
935 | #125. (Changed in 1.33MR11) Lookahead for (guard)? && <<p>>? predicates\r | |
936 | \r | |
937 | When implementing the new style of guard predicate (Item #113)\r | |
938 | in 1.33MR10 I decided to temporarily ignore the problem of\r | |
939 | computing the "narrowest" lookahead context.\r | |
940 | \r | |
941 | Consider the following k=1 grammar:\r | |
942 | \r | |
943 | start : a\r | |
944 | | b\r | |
945 | ;\r | |
946 | \r | |
947 | a : (A)? && <<pred1(LATEXT(1))>>? ab ;\r | |
948 | b : (B)? && <<pred2(LATEXT(1))>>? ab ;\r | |
949 | \r | |
950 | ab : A | B ;\r | |
951 | \r | |
952 | In MR10 the context for both "a" and "b" was {A B} because this is\r | |
953 | the first set of rule "ab". Normally, this is not a problem because\r | |
954 | the predicate which follows the guard inhibits any ambiguity report\r | |
955 | by antlr.\r | |
956 | \r | |
957 | In MR11 the first set for rule "a" is {A} and for rule "b" it is {B}.\r | |
958 | \r | |
959 | #124. A Note on the New "&&" Style Guarded Predicates\r | |
960 | \r | |
961 | I've been asked several times, "What is the difference between\r | |
962 | the old "=>" style guard predicates and the new style "&&" guard\r | |
963 | predicates, and how do you choose one over the other" ?\r | |
964 | \r | |
965 | The main difference is that the "=>" does not apply the\r | |
966 | predicate if the context guard doesn't match, whereas\r | |
967 | the && form always does. What is the significance ?\r | |
968 | \r | |
969 | If you have a predicate which is not on the "leading edge"\r | |
970 | it cannot be hoisted. Suppose you need a predicate that\r | |
971 | looks at LA(2). You must introduce it manually. The\r | |
972 | classic example is:\r | |
973 | \r | |
974 | castExpr :\r | |
975 | LP typeName RP\r | |
976 | | ....\r | |
977 | ;\r | |
978 | \r | |
979 | typeName : <<isTypeName(LATEXT(1))>>? ID\r | |
980 | | STRUCT ID\r | |
981 | ;\r | |
982 | \r | |
983 | The problem is that typeName isn't on the leading edge\r | |
984 | of castExpr, so the predicate isTypeName won't be hoisted into\r | |
985 | castExpr to help make a decision on which production to choose.\r | |
986 | \r | |
987 | The *first* attempt to fix it is this:\r | |
988 | \r | |
989 | castExpr :\r | |
990 | <<isTypeName(LATEXT(2))>>?\r | |
991 | LP typeName RP\r | |
992 | | ....\r | |
993 | ;\r | |
994 | \r | |
995 | Unfortunately, this won't work because it ignores\r | |
996 | the problem of STRUCT. The solution is to apply\r | |
997 | isTypeName() in castExpr if LA(2) is an ID and\r | |
998 | don't apply it when LA(2) is STRUCT:\r | |
999 | \r | |
1000 | castExpr :\r | |
1001 | (LP ID)? => <<isTypeName(LATEXT(2))>>?\r | |
1002 | LP typeName RP\r | |
1003 | | ....\r | |
1004 | ;\r | |
1005 | \r | |
1006 | In conclusion, the "=>" style guarded predicate is\r | |
1007 | useful when:\r | |
1008 | \r | |
1009 | a. the tokens required for the predicate\r | |
1010 | are not on the leading edge\r | |
1011 | b. there are alternatives in the expression\r | |
1012 | selected by the predicate for which the\r | |
1013 | predicate is inappropriate\r | |
1014 | \r | |
1015 | If (b) were false, then one could use a simple\r | |
1016 | predicate (assuming "-prc on"):\r | |
1017 | \r | |
1018 | castExpr :\r | |
1019 | <<isTypeName(LATEXT(2))>>?\r | |
1020 | LP typeName RP\r | |
1021 | | ....\r | |
1022 | ;\r | |
1023 | \r | |
1024 | typeName : <<isTypeName(LATEXT(1))>>? ID\r | |
1025 | ;\r | |
1026 | \r | |
1027 | So, when do you use the "&&" style guarded predicate ?\r | |
1028 | \r | |
1029 | The new-style "&&" predicate should always be used with\r | |
1030 | predicate context. The context guard is in ADDITION to\r | |
1031 | the automatically computed context. Thus it is useful for\r | |
1032 | predicates which depend on the token type for reasons\r | |
1033 | other than context.\r | |
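
The difference between the two guard styles can be sketched as two boolean selection tests. The "=>" test follows the generated code shown for Item #113. The "&&" test is an assumption on my part (guard and predicate must both hold, in addition to the computed lookahead). All function names below are hypothetical.

```cpp
#include <cassert>

// lookahead: the computed lookahead context of expr matches
// guard:     the context guard matches
// pred:      the semantic predicate evaluates to true

// (guard)? => <<pred>>? expr
// The predicate only matters when the guard matches.
bool selectsWithArrowGuard(bool lookahead, bool guard, bool pred) {
    return lookahead && (!guard || pred);
}

// (guard)? && <<pred>>? expr
// Assumed form: the guard is an additional condition on the alternative.
bool selectsWithAndGuard(bool lookahead, bool guard, bool pred) {
    return lookahead && guard && pred;
}

void checkGuardStyles() {
    // Guard does not match: "=>" ignores the predicate and selects the
    // alternative, while "&&" rejects the alternative.
    assert(selectsWithArrowGuard(true, false, false));
    assert(!selectsWithAndGuard(true, false, false));
    // Guard matches: both styles depend on the predicate alone.
    assert(selectsWithArrowGuard(true, true, true));
    assert(selectsWithAndGuard(true, true, true));
    assert(!selectsWithArrowGuard(true, true, false));
    assert(!selectsWithAndGuard(true, true, false));
}
```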
1034 | \r | |
1035 | The following example is contributed by Reinier van den Born\r | |
1036 | (reinier@vnet.ibm.com).\r | |
1037 | \r | |
1038 | +-------------------------------------------------------------------------+\r | |
1039 | | This grammar has two ways to call functions: |\r | |
1040 | | |\r | |
1041 | | - a "standard" call syntax with parens and comma separated args |\r | |
1042 | | - a shell command like syntax (no parens and spacing separated args) |\r | |
1043 | | |\r | |
1044 | | The former also allows a variable to hold the name of the function, |\r | |
1045 | | the latter can also be used to call external commands. |\r | |
1046 | | |\r | |
1047 | | The grammar (simplified) looks like this: |\r | |
1048 | | |\r | |
1049 | | fun_call : ID "(" { expr ("," expr)* } ")" |\r | |
1050 | | /* ID is function name */ |\r | |
1051 | | | "@" ID "(" { expr ("," expr)* } ")" |\r | |
1052 | | /* ID is var containing fun name */ |\r | |
1053 | | ; |\r | |
1054 | | |\r | |
1055 | | command : ID expr* /* ID is function name */ |\r | |
1056 | | | path expr* /* path is external command name */ |\r | |
1057 | | ; |\r | |
1058 | | |\r | |
1059 | | path : ID /* left out slashes and such */ |\r | |
1060 | | | "@" ID /* ID is environment var */ |\r | |
1061 | | ; |\r | |
1062 | | |\r | |
1063 | | expr : .... |\r | |
1064 | | | "(" expr ")"; |\r | |
1065 | | |\r | |
1066 | | call : fun_call |\r | |
1067 | | | command |\r | |
1068 | | ; |\r | |
1069 | | |\r | |
1070 | | Obviously the call is wildly ambiguous. This is more or less how this |\r | |
1071 | | is to be resolved: |\r | |
1072 | | |\r | |
1073 | | A call begins with an ID or an @ followed by an ID. |\r | |
1074 | | |\r | |
1075 | | If it is an ID and if it is an ext. command name -> command |\r | |
1076 | | if followed by a paren -> fun_call |\r | |
1077 | | otherwise -> command |\r | |
1078 | | |\r | |
1079 | | If it is an @ and if the ID is a var name -> fun_call |\r | |
1080 | | otherwise -> command |\r | |
1081 | | |\r | |
1082 | | One can implement these rules quite neatly using && predicates: |\r | |
1083 | | |\r | |
1084 | | call : ("@" ID)? && <<isVarName(LT(2))>>? fun_call |\r | |
1085 | | | (ID)? && <<isExtCmdName>>? command |\r | |
1086 | | | (ID "(")? fun_call |\r | |
1087 | | | command |\r | |
1088 | | ; |\r | |
1089 | | |\r | |
1090 | | This can be done better, so it is not an ideal example, but it |\r | |
1091 | | conveys the principle. |\r | |
1092 | +-------------------------------------------------------------------------+\r | |
1093 | \r | |
1094 | #123. (Changed in 1.33MR11) Correct definition of operators in ATokPtr.h\r | |
1095 | \r | |
1096 | The return value of operators in ANTLRTokenPtr:\r | |
1097 | \r | |
1098 | changed: unsigned ... operator !=(...)\r | |
1099 | to: int ... operator != (...)\r | |
1100 | changed: unsigned ... operator ==(...)\r | |
1101 | to: int ... operator == (...)\r | |
1102 | \r | |
1103 | Suggested by R.A. Nelson (cowboy@VNET.IBM.COM)\r | |
1104 | \r | |
1105 | #122. (Changed in 1.33MR11) Member functions to reset DLG in C++ mode\r | |
1106 | \r | |
1107 | void DLGFileReset(FILE *f) { input = f; found_eof = 0; }\r | |
1108 | void DLGStringReset(DLGChar *s) { input = s; p = &input[0]; }\r | |
1109 | \r | |
1110 | Supplied by R.A. Nelson (cowboy@VNET.IBM.COM)\r | |
1111 | \r | |
1112 | #121. (Changed in 1.33MR11) Another attempt to fix -o (output dir) option\r | |
1113 | \r | |
1114 | Another attempt is made to improve the -o option of antlr, dlg,\r | |
1115 | and sorcerer. This one by JVincent (JVincent@novell.com).\r | |
1116 | \r | |
1117 | The current rule:\r | |
1118 | \r | |
1119 | a. If -o is not specified then any explicit directory\r | |
1120 | names are retained.\r | |
1121 | \r | |
1122 | b. If -o is specified then the -o directory name overrides any\r | |
1123 | explicit directory names.\r | |
1124 | \r | |
1125 | c. The directory name of the grammar file is *not* stripped\r | |
1126 | to create the main output file. However it is still subject\r | |
1127 | to override by the -o directory name.\r | |
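
The three rules can be sketched as a single path computation. This is only an illustration of the rules as stated, not the code used by antlr, dlg, or sorcerer. The function name outputPath() is hypothetical, and the choice of "/" as the separator when joining is an assumption.

```cpp
#include <cassert>
#include <string>

// fileName: an output file name, possibly with an explicit directory.
// outDir:   the value given to -o, or an empty string if -o was not used.
std::string outputPath(const std::string &fileName, const std::string &outDir) {
    assert(!fileName.empty());
    // Rules a and c: without -o, explicit directory names are retained.
    if (outDir.empty()) return fileName;
    // Rule b: with -o, any explicit directory name is replaced.
    std::string::size_type slash = fileName.find_last_of("/\\");
    std::string base = (slash == std::string::npos)
                           ? fileName
                           : fileName.substr(slash + 1);
    char last = outDir[outDir.size() - 1];
    if (last == '/' || last == '\\') return outDir + base;
    return outDir + "/" + base;
}
```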
1128 | \r | |
1129 | #120. (Changed in 1.33MR11) "-info f" output to stdout rather than stderr\r | |
1130 | \r | |
1131 | Added option 0 (e.g. "-info 0") which is a noop.\r | |
1132 | \r | |
1133 | #119. (Changed in 1.33MR11) Ambiguity aid for grammars\r | |
1134 | \r | |
1135 | The user can ask for additional information on ambiguities reported\r | |
1136 | by antlr to stdout. At the moment, only one ambiguity report can\r | |
1137 | be created in an antlr run.\r | |
1138 | \r | |
1139 | This feature is enabled using the "-aa" (Ambiguity Aid) option.\r | |
1140 | \r | |
1141 | The following options control the reporting of ambiguities:\r | |
1142 | \r | |
1143 | -aa ruleName Selects reporting by name of rule\r | |
1144 | -aa lineNumber Selects reporting by line number\r | |
1145 | (file name not compared)\r | |
1146 | \r | |
1147 | -aam Selects "multiple" reporting for a token\r | |
1148 | in the intersection set of the\r | |
1149 | alternatives.\r | |
1150 | \r | |
1151 | For instance, the token ID may appear dozens\r | |
1152 | of times in various paths as the program\r | |
1153 | explores the rules which are reachable from\r | |
1154 | the point of an ambiguity. With option -aam\r | |
1155 | every possible path the search program\r | |
1156 | encounters is reported.\r | |
1157 | \r | |
1158 | Without -aam only the first encounter is\r | |
1159 | reported. This may result in incomplete\r | |
1160 | information, but the information may be\r | |
1161 | sufficient and much shorter.\r | |
1162 | \r | |
1163 | -aad depth Selects the depth of the search.\r | |
1164 | The default value is 1.\r | |
1165 | \r | |
1166 | The number of paths to be searched, and the\r | |
1167 | size of the report can grow geometrically\r | |
1168 | with the -ck value if a full search for all\r | |
1169 | contributions to the source of the ambiguity\r | |
1170 | is explored.\r | |
1171 | \r | |
1172 | The depth represents the number of tokens\r | |
1173 | in the lookahead set which are matched against\r | |
1174 | the set of ambiguous tokens. A depth of 1\r | |
1175 | means that the search stops when a lookahead\r | |
1176 | sequence of just one token is matched.\r | |
1177 | \r | |
1178 | A k=1 ck=6 grammar might generate 5,000 items\r | |
1179 | in a report if a full depth 6 search is made\r | |
1180 | with the Ambiguity Aid. The source of the\r | |
1181 | problem may be in the first token and obscured\r | |
1182 | by the volume of data - I hesitate to call\r | |
1183 | it information.\r | |
1184 | \r | |
1185 | When the user selects a depth > 1, the search\r | |
1186 | is first performed at depth=1 for both\r | |
1187 | alternatives, then depth=2 for both alternatives,\r | |
1188 | etc.\r | |
1189 | \r | |
1190 | Sample output for rule grammar in antlr.g itself:\r | |
1191 | \r | |
1192 | +---------------------------------------------------------------------+\r | |
1193 | | Ambiguity Aid |\r | |
1194 | | |\r | |
1195 | | Choice 1: grammar/70 line 632 file a.g |\r | |
1196 | | Choice 2: grammar/82 line 644 file a.g |\r | |
1197 | | |\r | |
1198 | | Intersection of lookahead[1] sets: |\r | |
1199 | | |\r | |
1200 | | "\}" "class" "#errclass" "#tokclass" |\r | |
1201 | | |\r | |
1202 | | Choice:1 Depth:1 Group:1 ("#errclass") |\r | |
1203 | | 1 in (...)* block grammar/70 line 632 a.g |\r | |
1204 | | 2 to error grammar/73 line 635 a.g |\r | |
1205 | | 3 error error/1 line 894 a.g |\r | |
1206 | | 4 #token "#errclass" error/2 line 895 a.g |\r | |
1207 | | |\r | |
1208 | | Choice:1 Depth:1 Group:2 ("#tokclass") |\r | |
1209 | | 2 to tclass grammar/74 line 636 a.g |\r | |
1210 | | 3 tclass tclass/1 line 937 a.g |\r | |
1211 | | 4 #token "#tokclass" tclass/2 line 938 a.g |\r | |
1212 | | |\r | |
1213 | | Choice:1 Depth:1 Group:3 ("class") |\r | |
1214 | | 2 to class_def grammar/75 line 637 a.g |\r | |
1215 | | 3 class_def class_def/1 line 669 a.g |\r | |
1216 | | 4 #token "class" class_def/3 line 671 a.g |\r | |
1217 | | |\r | |
1218 | | Choice:1 Depth:1 Group:4 ("\}") |\r | |
1219 | | 2 #token "\}" grammar/76 line 638 a.g |\r | |
1220 | | |\r | |
1221 | | Choice:2 Depth:1 Group:5 ("#errclass") |\r | |
1222 | | 1 in (...)* block grammar/83 line 645 a.g |\r | |
1223 | | 2 to error grammar/93 line 655 a.g |\r | |
1224 | | 3 error error/1 line 894 a.g |\r | |
1225 | | 4 #token "#errclass" error/2 line 895 a.g |\r | |
1226 | | |\r | |
1227 | | Choice:2 Depth:1 Group:6 ("#tokclass") |\r | |
1228 | | 2 to tclass grammar/94 line 656 a.g |\r | |
1229 | | 3 tclass tclass/1 line 937 a.g |\r | |
1230 | | 4 #token "#tokclass" tclass/2 line 938 a.g |\r | |
1231 | | |\r | |
1232 | | Choice:2 Depth:1 Group:7 ("class") |\r | |
1233 | | 2 to class_def grammar/95 line 657 a.g |\r | |
1234 | | 3 class_def class_def/1 line 669 a.g |\r | |
1235 | | 4 #token "class" class_def/3 line 671 a.g |\r | |
1236 | | |\r | |
1237 | | Choice:2 Depth:1 Group:8 ("\}") |\r | |
1238 | | 2 #token "\}" grammar/96 line 658 a.g |\r | |
1239 | +---------------------------------------------------------------------+\r | |
1240 | \r | |
1241 | For a linear lookahead set ambiguity (where k=1, or where k>1 and\r | |
1242 | every lookahead set [i] with i<k has degree one) the\r | |
1243 | reports appear in the following order:\r | |
1244 | \r | |
1245 | for (depth=1 ; depth <= "-aad depth" ; depth++) {\r | |
1246 | for (alternative=1; alternative <=2 ; alternative++) {\r | |
1247 | while (matches-are-found) {\r | |
1248 | group++;\r | |
1249 | print-report\r | |
1250 | };\r | |
1251 | };\r | |
1252 | };\r | |
1253 | \r | |
1254 | For reporting a k-tuple ambiguity, the reports appear in the\r | |
1255 | following order:\r | |
1256 | \r | |
1257 | for (depth=1 ; depth <= "-aad depth" ; depth++) {\r | |
1258 | while (matches-are-found) {\r | |
1259 | for (alternative=1; alternative <=2 ; alternative++) {\r | |
1260 | group++;\r | |
1261 | print-report\r | |
1262 | };\r | |
1263 | };\r | |
1264 | };\r | |
1265 | \r | |
1266 | This is because matches are generated in different ways for\r | |
1267 | linear lookahead and k-tuples.\r | |
1268 | \r | |
1269 | #118. (Changed in 1.33MR11) DEC VMS makefile and VMS related changes\r | |
1270 | \r | |
1271 | Revised makefiles for DEC/VMS operating system for antlr, dlg,\r | |
1272 | and sorcerer.\r | |
1273 | \r | |
1274 | Reduced names of routines with external linkage to less than 32\r | |
1275 | characters to conform to DEC/VMS linker limitations.\r | |
1276 | \r | |
1277 | Jean-Francois Pieronne discovered problems with dlg and antlr\r | |
1278 | due to the VMS linker not being case sensitive for names with\r | |
1279 | external linkage. In dlg the problem was with "className" and\r | |
1280 | "ClassName". In antlr the problem was with "GenExprSets" and\r | |
1281 | "genExprSets".\r | |
1282 | \r | |
1283 | Added genmms, a version of genmk for the DEC/VMS version of make.\r | |
1284 | The source is in directory pccts/support/DECmms.\r | |
1285 | \r | |
1286 | All VMS contributions by Jean-Francois Pieronne (jfp@iname.com).\r | |
1287 | \r | |
1288 | #117. (Changed in 1.33MR10) new EXPERIMENTAL predicate hoisting code\r | |
1289 | \r | |
1290 | The hoisting of predicates into rules to create prediction\r | |
1291 | expressions is a problem in antlr. Consider the following\r | |
1292 | example (k=1 with -prc on):\r | |
1293 | \r | |
1294 | start : (a)* "@" ;\r | |
1295 | a : b | c ;\r | |
1296 | b : <<isUpper(LATEXT(1))>>? A ;\r | |
1297 | c : A ;\r | |
1298 | \r | |
1299 | Prior to 1.33MR10 the code generated for "start" would resemble:\r | |
1300 | \r | |
1301 | while {\r | |
1302 | if (LA(1)==A &&\r | |
1303 | (!(LA(1)==A) || isUpper())) {\r | |
1304 | a();\r | |
1305 | }\r | |
1306 | };\r | |
1307 | \r | |
1308 | This code is wrong because it makes rule "c" unreachable from\r | |
1309 | "start". The essence of the problem is that antlr fails to\r | |
1310 | recognize that there can be a valid alternative within "a" even\r | |
1311 | when the predicate <<isUpper(LATEXT(1))>>? is false.\r | |
1312 | \r | |
1313 | In 1.33MR10 with -mrhoist the hoisting of the predicate into\r | |
1314 | "start" is suppressed because it recognizes that "c" can\r | |
1315 | cover all the cases where the predicate is false:\r | |
1316 | \r | |
1317 | while {\r | |
1318 | if (LA(1)==A) {\r | |
1319 | a();\r | |
1320 | }\r | |
1321 | };\r | |
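
The two prediction expressions above can be compared directly. The sketch below models them as plain boolean functions. The enum and function names are hypothetical, and the predicate result is passed in as a flag rather than computed from token text.

```cpp
#include <cassert>

enum Token { A, AT };  // AT stands for the "@" token

// Resembles the test generated before 1.33MR10: the predicate hoisted
// from rule "b" gates entry to rule "a".
bool predictBeforeMR10(Token la1, bool isUpper) {
    return la1 == A && (!(la1 == A) || isUpper);
}

// Resembles the test generated with -mrhoist: the hoisted predicate is
// suppressed because rule "c" matches A regardless of the predicate.
bool predictWithMrhoist(Token la1) {
    return la1 == A;
}

void checkSuppressionExample() {
    // Token A with a false predicate should still be matched by rule "c".
    assert(!predictBeforeMR10(A, false));  // rule "a" is never entered
    assert(predictWithMrhoist(A));         // rule "a" is entered
}
```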
1322 | \r | |
1323 | With the antlr "-info p" switch the user will receive information\r | |
1324 | about the predicate suppression in the generated file:\r | |
1325 | \r | |
1326 | --------------------------------------------------------------\r | |
1327 | #if 0\r | |
1328 | \r | |
1329 | Hoisting of predicate suppressed by alternative without predicate.\r | |
1330 | The alt without the predicate includes all cases where\r | |
1331 | the predicate is false.\r | |
1332 | \r | |
1333 | WITH predicate: line 7 v1.g\r | |
1334 | WITHOUT predicate: line 7 v1.g\r | |
1335 | \r | |
1336 | The context set for the predicate:\r | |
1337 | \r | |
1338 | A\r | |
1339 | \r | |
1340 | The lookahead set for the alt WITHOUT the semantic predicate:\r | |
1341 | \r | |
1342 | A\r | |
1343 | \r | |
1344 | The predicate:\r | |
1345 | \r | |
1346 | pred << isUpper(LATEXT(1))>>?\r | |
1347 | depth=k=1 rule b line 9 v1.g\r | |
1348 | set context:\r | |
1349 | A\r | |
1350 | tree context: null\r | |
1351 | \r | |
1352 | Chain of referenced rules:\r | |
1353 | \r | |
1354 | #0 in rule start (line 5 v1.g) to rule a\r | |
1355 | #1 in rule a (line 7 v1.g)\r | |
1356 | \r | |
1357 | #endif\r | |
1358 | --------------------------------------------------------------\r | |
1359 | \r | |
1360 | A predicate can be suppressed by a combination of alternatives\r | |
1361 | which, taken together, cover a predicate:\r | |
1362 | \r | |
1363 | start : (a)* "@" ;\r | |
1364 | \r | |
1365 | a : b | ca | cb | cc ;\r | |
1366 | \r | |
1367 | b : <<isUpper(LATEXT(1))>>? ( A | B | C ) ;\r | |
1368 | \r | |
1369 | ca : A ;\r | |
1370 | cb : B ;\r | |
1371 | cc : C ;\r | |
1372 | \r | |
1373 | Consider a more complex example in which "c" covers only part of\r | |
1374 | a predicate:\r | |
1375 | \r | |
1376 | start : (a)* "@" ;\r | |
1377 | \r | |
1378 | a : b\r | |
1379 | | c\r | |
1380 | ;\r | |
1381 | \r | |
1382 | b : <<isUpper(LATEXT(1))>>?\r | |
1383 | ( A\r | |
1384 | | X\r | |
1385 | );\r | |
1386 | \r | |
1387 | c : A\r | |
1388 | ;\r | |
1389 | \r | |
1390 | Prior to 1.33MR10 the code generated for "start" would resemble:\r | |
1391 | \r | |
1392 | while {\r | |
1393 | if ( (LA(1)==A || LA(1)==X) &&\r | |
1394 | (! (LA(1)==A || LA(1)==X) || isUpper()) ) {\r | |
1395 | a();\r | |
1396 | }\r | |
1397 | };\r | |
1398 | \r | |
1399 | With 1.33MR10 and -mrhoist the predicate context is restricted to\r | |
1400 | the non-covered lookahead. The code resembles:\r | |
1401 | \r | |
1402 | while {\r | |
1403 | if ( (LA(1)==A || LA(1)==X) &&\r | |
1404 | (! (LA(1)==X) || isUpper()) ) {\r | |
1405 | a();\r | |
1406 | }\r | |
1407 | };\r | |
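
The effect of restricting the predicate context can be checked with a small boolean model of the two tests. The enum and function names are hypothetical, and the predicate result is passed in as a flag.

```cpp
#include <cassert>

enum Token { A, X, AT };  // AT stands for the "@" token

// Resembles the pre-MR10 test: the predicate applies to both A and X.
bool predictUnrestricted(Token la1, bool isUpper) {
    bool inContext = (la1 == A || la1 == X);
    return inContext && (!inContext || isUpper);
}

// Resembles the -mrhoist test: the predicate applies only to X, the part
// of the context that rule "c" does not cover.
bool predictRestricted(Token la1, bool isUpper) {
    return (la1 == A || la1 == X) && (!(la1 == X) || isUpper);
}

void checkRestrictionExample() {
    // Token A with a false predicate: only the restricted test lets
    // rule "a" be entered so that rule "c" can match.
    assert(!predictUnrestricted(A, false));
    assert(predictRestricted(A, false));
    // Token X is matched only by rule "b", so the predicate still decides.
    assert(!predictRestricted(X, false));
    assert(predictRestricted(X, true));
}
```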
1408 | \r | |
1409 | With the antlr "-info p" switch the user will receive information\r | |
1410 | about the predicate restriction in the generated file:\r | |
1411 | \r | |
1412 | --------------------------------------------------------------\r | |
1413 | #if 0\r | |
1414 | \r | |
1415 | Restricting the context of a predicate because of overlap\r | |
1416 | in the lookahead set between the alternative with the\r | |
1417 | semantic predicate and one without\r | |
1418 | Without this restriction the alternative without the predicate\r | |
1419 | could not be reached when input matched the context of the\r | |
1420 | predicate and the predicate was false.\r | |
1421 | \r | |
1422 | WITH predicate: line 11 v4.g\r | |
1423 | WITHOUT predicate: line 12 v4.g\r | |
1424 | \r | |
1425 | The original context set for the predicate:\r | |
1426 | \r | |
1427 | A X\r | |
1428 | \r | |
1429 | The lookahead set for the alt WITHOUT the semantic predicate:\r | |
1430 | \r | |
1431 | A\r | |
1432 | \r | |
1433 | The intersection of the two sets\r | |
1434 | \r | |
1435 | A\r | |
1436 | \r | |
1437 | The original predicate:\r | |
1438 | \r | |
1439 | pred << isUpper(LATEXT(1))>>?\r | |
1440 | depth=k=1 rule b line 15 v4.g\r | |
1441 | set context:\r | |
1442 | A X\r | |
1443 | tree context: null\r | |
1444 | \r | |
1445 | The new (modified) form of the predicate:\r | |
1446 | \r | |
1447 | pred << isUpper(LATEXT(1))>>?\r | |
1448 | depth=k=1 rule b line 15 v4.g\r | |
1449 | set context:\r | |
1450 | X\r | |
1451 | tree context: null\r | |
1452 | \r | |
1453 | #endif\r | |
1454 | --------------------------------------------------------------\r | |
1455 | \r | |
1456 | The bad news about -mrhoist:\r | |
1457 | \r | |
1458 | (a) -mrhoist does not analyze predicates with lookahead\r | |
1459 | depth > 1.\r | |
1460 | \r | |
1461 | (b) -mrhoist does not look past a guarded predicate to\r | |
1462 | find context which might cover other predicates.\r | |
1463 | \r | |
1464 | For these cases you might want to use syntactic predicates.\r | |
1465 | When a semantic predicate fails during guess mode the guess\r | |
1466 | fails and the next alternative is tried.\r | |
1467 | \r | |
1468 | Limitation (a) is illustrated by the following example:\r | |
1469 | \r | |
1470 | start : (stmt)* EOF ;\r | |
1471 | \r | |
1472 | stmt : cast\r | |
1473 | | expr\r | |
1474 | ;\r | |
1475 | cast : <<isTypename(LATEXT(2))>>? LP ID RP ;\r | |
1476 | \r | |
1477 | expr : LP ID RP ;\r | |
1478 | \r | |
1479 | This is not much different from the first example, except that\r | |
1480 | it requires two tokens of lookahead context to determine what\r | |
1481 | to do. This predicate is NOT suppressed because the current version\r | |
1482 | is unable to handle predicates with depth > 1.\r | |
1483 | \r | |
1484 | A predicate can be combined with other predicates during hoisting.\r | |
1485 | In those cases the depth=1 predicates are still handled. Thus,\r | |
1486 | in the following example the isUpper() predicate will be suppressed\r | |
1487 | by line #4 when hoisted from "bizarre" into "start", but will still\r | |
1488 | be present in "bizarre" in order to predict "stmt".\r | |
1489 | \r | |
1490 | start : (bizarre)* EOF ; // #1\r | |
1491 | // #2\r | |
1492 | bizarre : stmt // #3\r | |
1493 | | A // #4\r | |
1494 | ;\r | |
1495 | \r | |
1496 | stmt : cast\r | |
1497 | | expr\r | |
1498 | ;\r | |
1499 | \r | |
1500 | cast : <<isTypename(LATEXT(2))>>? LP ID RP ;\r | |
1501 | \r | |
1502 | expr : LP ID RP\r | |
1503 | | <<isUpper(LATEXT(1))>>? A ;\r | |
1504 | \r | |
1505 | Limitation (b) is illustrated by the following example of a\r | |
1506 | context guarded predicate:\r | |
1507 | \r | |
1508 | rule : (A)? => <<p>>? // #1\r | |
1509 | (A // #2\r | |
1510 | |B // #3\r | |
1511 | ) // #4\r | |
1512 | | <<q>>? B // #5\r | |
1513 | ;\r | |
1514 | \r | |
1515 | Recall that this means that when the lookahead is NOT A then\r | |
1516 | the predicate "p" is ignored and it attempts to match "A|B".\r | |
1517 | Ideally, the "B" at line #3 should suppress predicate "q".\r | |
1518 | However, the current version does not attempt to look past\r | |
1519 | the guard predicate to find context which might suppress other\r | |
1520 | predicates.\r | |
1521 | \r | |
1522 | In some cases -mrhoist will lead to the reporting of ambiguities\r | |
1523 | which were not visible before:\r | |
1524 | \r | |
1525 | start : (a)* "@";\r | |
1526 | a : bc | d;\r | |
1527 | bc : b | c ;\r | |
1528 | \r | |
1529 | b : <<isUpper(LATEXT(1))>>? A;\r | |
1530 | c : A ;\r | |
1531 | \r | |
1532 | d : A ;\r | |
1533 | \r | |
1534 | In this case there is a true ambiguity in "a" between "bc" and "d"\r | |
1535 | which can both match "A". Without -mrhoist the predicate in "b"\r | |
1536 | is hoisted into "a" and there is no ambiguity reported. However,\r | |
1537 | with -mrhoist, the predicate in "b" is suppressed by "c" (as it\r | |
1538 | should be) making the ambiguity in "a" apparent.\r | |
1539 | \r | |
1540 | The motivations for these changes were hoisting problems reported\r | |
1541 | by Reinier van den Born (reinier@vnet.ibm.com) and several others.\r | |
1542 | \r | |
1543 | #116. (Changed in 1.33MR10) C++ mode: tracein/traceout rule name is (const char *)\r | |
1544 | \r | |
1545 | The prototype for C++ mode routine tracein (and traceout) has changed from\r | |
1546 | "char *" to "const char *".\r | |
1547 | \r | |
1548 | #115. (Changed in 1.33MR10) Using guess mode with exception handlers in C mode\r | |
1549 | \r | |
1550 | The definition of the C mode macros zzmatch_wsig and zzsetmatch_wsig\r | |
1551 | neglected to consider guess mode. When control passed to the rule's\r | |
1552 | parse exception handler the routine would exit without ever closing the\r | |
1553 | guess block. This would lead to unpredictable behavior.\r | |
1554 | \r | |
1555 | In 1.33MR10 the behavior of exceptions in C mode and C++ mode should be\r | |
1556 | identical.\r | |
1557 | \r | |
1558 | #114. (Changed in 1.33MR10) difference in [zz]resynch() between C and C++ modes\r | |
1559 | \r | |
1560 | There was a slight difference in the way C and C++ mode resynchronized\r | |
1561 | following a parsing error. The C routine would sometimes skip an extra\r | |
1562 | token before attempting to resynchronize.\r | |
1563 | \r | |
1564 | The C routine was changed to match the C++ routine.\r | |
1565 | \r | |
1566 | #113. (Changed in 1.33MR10) new context guarded pred: (g)? && <<p>>? expr\r | |
1567 | \r | |
1568 | The existing context guarded predicate:\r | |
1569 | \r | |
1570 | rule : (guard)? => <<p>>? expr\r | |
1571 | | next_alternative\r | |
1572 | ;\r | |
1573 | \r | |
1574 | generates code which resembles:\r | |
1575 | \r | |
1576 | if (lookahead(expr) && (!guard || pred)) {\r | |
1577 | expr()\r | |
1578 | } else ....\r | |
1579 | \r | |
1580 | This is not suitable for some applications because it allows\r | |
1581 | expr() to be invoked when the predicate is false. This is\r | |
1582 | intentional because it is meant to mimic automatically computed\r | |
1583 | predicate context.\r | |
1584 | \r | |
1585 | The new context guarded predicate uses the guard information\r | |
1586 | differently because it has a different goal. Consider:\r | |
1587 | \r | |
1588 | rule : (guard)? && <<p>>? expr\r | |
1589 | | next_alternative\r | |
1590 | ;\r | |
1591 | \r | |
1592 | The new style of context guarded predicate is equivalent to:\r | |
1593 | \r | |
1594 | rule : <<guard==true && pred>>? expr\r | |
1595 | | next_alternative\r | |
1596 | ;\r | |
1597 | \r | |
1598 | It generates code which resembles:\r | |
1599 | \r | |
1600 | if (lookahead(expr) && guard && pred) {\r | |
1601 | expr();\r | |
1602 | } else ...\r | |
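
The difference between the two gate conditions shown above can be checked with a small C model. This is only a sketch: the function names gate_arrow and gate_and are hypothetical, and the three booleans stand in for the lookahead test, the guard context test, and the predicate.

```c
#include <stdbool.h>

/* Hypothetical model of the "(guard)? => <<p>>? expr" gate:
   the predicate only matters when the guard context matches. */
static bool gate_arrow(bool lookahead, bool guard, bool pred) {
    return lookahead && (!guard || pred);
}

/* Hypothetical model of the "(guard)? && <<p>>? expr" gate:
   both the guard context and the predicate must hold. */
static bool gate_and(bool lookahead, bool guard, bool pred) {
    return lookahead && guard && pred;
}
```

The two models differ only when the guard does not match: gate_arrow(true, false, false) is true, so expr() could be entered with a false predicate, while gate_and(true, false, false) is false.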
1603 | \r | |
1604 | Both forms of guarded predicates severely restrict the form of\r | |
1605 | the context guard: it can contain no rule references, no\r | |
1606 | (...)*, no (...)+, and no {...}. It may contain token and\r | |
1607 | token class references, and alternation ("|").\r | |
1608 | \r | |
1609 | Addition for 1.33MR11: in the token expression all tokens must\r | |
1610 | be at the same height of the token tree:\r | |
1611 | \r | |
1612 | (A ( B | C))? && ... is ok (all height 2)\r | |
1613 | (A ( B | ))? && ... is not ok (some 1, some 2)\r | |
1614 | (A B C D | E F G H)? && ... is ok (all height 4)\r | |
1615 | (A B C D | E )? && ... is not ok (some 4, some 1)\r | |
1616 | \r | |
1617 | This restriction is required in order to properly compute the lookahead\r | |
1618 | set for expressions like:\r | |
1619 | \r | |
1620 | rule1 : (A B C)? && <<pred>>? rule2 ;\r | |
1621 | rule2 : (A|X) (B|Y) (C|Z);\r | |
1622 | \r | |
1623 | This addition was suggested by Reinier van den Born (reinier@vnet.ibm.com)\r | |
1624 | \r | |
1625 | #112. (Changed in 1.33MR10) failed validation predicate in C guess mode\r | |
1626 | \r | |
1627 | John Lilley (jlilley@empathy.com) suggested that failed validation\r | |
1628 | predicates abort a guess rather than reporting a failed error.\r | |
1629 | This was installed in C++ mode (Item #4). Only now was it noticed\r | |
1630 | that the fix was never installed for C mode.\r | |
1631 | \r | |
1632 | #111. (Changed in 1.33MR10) moved zzTRACEIN to before init action\r | |
1633 | \r | |
1634 | When the antlr -gd switch is present antlr generates calls to\r | |
1635 | zzTRACEIN at the start of a rule and zzTRACEOUT at the exit\r | |
1636 | from a rule. Prior to 1.33MR10 the call to zzTRACEIN was\r | |
1637 | after the init-action, which could cause confusion because the\r | |
1638 | init-actions were reported with the name of the enclosing rule,\r | |
1639 | rather than the active rule.\r | |
1640 | \r | |
1641 | #110. (Changed in 1.33MR10) antlr command line copied to generated file\r | |
1642 | \r | |
1643 | The antlr command line is now copied to the generated file near\r | |
1644 | the start.\r | |
1645 | \r | |
1646 | #109. (Changed in 1.33MR10) improved trace information\r | |
1647 | \r | |
1648 | The quality of the trace information provided by the "-gd"\r | |
1649 | switch has been improved significantly. Here is an example\r | |
1650 | of the output from a test program. It shows the rule name,\r | |
1651 | the first token of lookahead, the call depth, and the guess\r | |
1652 | status:\r | |
1653 | \r | |
1654 | exit rule gusxx {"?"} depth 2\r | |
1655 | enter rule gusxx {"?"} depth 2\r | |
1656 | enter rule gus1 {"o"} depth 3 guessing\r | |
1657 | guess done - returning to rule gus1 {"o"} at depth 3\r | |
1658 | (guess mode continues - an enclosing guess is still active)\r | |
1659 | guess done - returning to rule gus1 {"Z"} at depth 3\r | |
1660 | (guess mode continues - an enclosing guess is still active)\r | |
1661 | exit rule gus1 {"Z"} depth 3 guessing\r | |
1662 | guess done - returning to rule gusxx {"o"} at depth 2 (guess mode ends)\r | |
1663 | enter rule gus1 {"o"} depth 3\r | |
1664 | guess done - returning to rule gus1 {"o"} at depth 3 (guess mode ends)\r | |
1665 | guess done - returning to rule gus1 {"Z"} at depth 3 (guess mode ends)\r | |
1666 | exit rule gus1 {"Z"} depth 3\r | |
1667 | line 1: syntax error at "Z" missing SC\r | |
1668 | ...\r | |
1669 | \r | |
1670 | Rule trace reporting is controlled by the value of the integer\r | |
1671 | [zz]traceOptionValue: when it is positive tracing is enabled,\r | |
1672 | otherwise it is disabled. Tracing during guess mode is controlled\r | |
1673 | by the value of the integer [zz]traceGuessOptionValue. When\r | |
1674 | it is positive AND [zz]traceOptionValue is positive rule trace\r | |
1675 | is reported in guess mode.\r | |
1676 | \r | |
1677 | The values of [zz]traceOptionValue and [zz]traceGuessOptionValue\r | |
1678 | can be adjusted by subroutine calls listed below.\r | |
1679 | \r | |
1680 | Depending on the presence or absence of the antlr -gd switch\r | |
1681 | the variable [zz]traceOptionValueDefault is set to 0 or 1. When\r | |
1682 | the parser is initialized or [zz]traceReset() is called the\r | |
1683 | value of [zz]traceOptionValueDefault is copied to [zz]traceOptionValue.\r | |
1684 | The value of [zz]traceGuessOptionValue is always initialized to 1,\r | |
1685 | but, as noted earlier, nothing will be reported unless\r | |
1686 | [zz]traceOptionValue is also positive.\r | |
1687 | \r | |
1688 | When the parser state is saved/restored the values of the trace\r | |
1689 | variables are also saved/restored. If a restore causes a change in\r | |
1690 | reporting behavior from on to off or vice versa this will be reported.\r | |
1691 | \r | |
1692 | When the -gd option is selected, the macro "#define zzTRACE_RULES"\r | |
1693 | is added to appropriate output files.\r | |
1694 | \r | |
1695 | C++ mode\r | |
1696 | --------\r | |
1697 | int traceOption(int delta)\r | |
1698 | int traceGuessOption(int delta)\r | |
1699 | void traceReset()\r | |
1700 | int traceOptionValueDefault\r | |
1701 | \r | |
1702 | C mode\r | |
1703 | --------\r | |
1704 | int zzTraceOption(int delta)\r | |
1705 | int zzTraceGuessOption(int delta)\r | |
1706 | void zzTraceReset()\r | |
1707 | int zzTraceOptionValueDefault\r | |
1708 | \r | |
1709 | The argument "delta" is added to the traceOptionValue. To\r | |
1710 | turn on trace while inside a particular rule one can write:\r | |
1711 | \r | |
1712 | rule : <<traceOption(+1);>>\r | |
1713 | (\r | |
1714 | rest-of-rule\r | |
1715 | )\r | |
1716 | <<traceOption(-1);>>\r | |
1717 | ; /* fail clause */ <<traceOption(-1);>>\r | |
1718 | \r | |
1719 | One can use the same idea to turn *off* tracing within a\r | |
1720 | rule by using a delta of (-1).\r | |
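
The counter behavior described above (tracing is on while the value is positive, and guess-mode tracing needs both values positive) can be modelled in a few lines of C. This is a sketch only: the model_ names are hypothetical stand-ins for the [zz]trace variables, and returning the new value from model_trace_option is an assumption of the sketch, not a statement about the real routine.

```c
/* Hypothetical stand-ins for [zz]traceOptionValue and
   [zz]traceGuessOptionValue. */
static int model_trace_value = 0;        /* 0 as if -gd were absent      */
static int model_trace_guess_value = 1;  /* always starts at 1           */

/* Adds delta to the counter. Returning the new value is an
   assumption made for this sketch only. */
static int model_trace_option(int delta) {
    model_trace_value += delta;
    return model_trace_value;
}

/* Returns 1 when a rule entry/exit would be reported. */
static int model_should_report(int guessing) {
    if (model_trace_value <= 0) return 0;
    if (guessing && model_trace_guess_value <= 0) return 0;
    return 1;
}
```

Because the value is a counter rather than a flag, nested rules that each apply +1 on entry and -1 on exit leave tracing enabled until the outermost one exits.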
1721 | \r | |
1722 | An improvement in the rule trace was suggested by Sramji\r | |
1723 | Ramanathan (ps@kumaran.com).\r | |
1724 | \r | |
1725 | #108. A Note on Deallocation of Variables Allocated in Guess Mode\r | |
1726 | \r | |
1727 | NOTE\r | |
1728 | ------------------------------------------------------\r | |
1729 | This mechanism only works for heap allocated variables\r | |
1730 | ------------------------------------------------------\r | |
1731 | \r | |
1732 | The rewrite of the trace provides the machinery necessary\r | |
1733 | to properly free variables or undo actions following a\r | |
1734 | failed guess.\r | |
1735 | \r | |
1736 | The macro zzUSER_GUESS_HOOK(guessSeq,zzrv) is expanded\r | |
1737 | as part of the zzGUESS macro. When a guess is opened\r | |
1738 | the value of zzrv is 0. When a longjmp() is executed to\r | |
1739 | undo the guess, the value of zzrv will be 1.\r | |
1740 | \r | |
1741 | The macro zzUSER_GUESS_DONE_HOOK(guessSeq) is expanded\r | |
1742 | as part of the zzGUESS_DONE macro. This is executed\r | |
1743 | whether the guess succeeds or fails as part of closing\r | |
1744 | the guess.\r | |
1745 | \r | |
1746 | The guessSeq is a sequence number which is assigned to each\r | |
1747 | guess and is incremented by 1 for each guess which becomes\r | |
1748 | active. It is needed by the user to associate the start of\r | |
1749 | a guess with the failure and/or completion (closing) of a\r | |
1750 | guess.\r | |
1751 | \r | |
1752 | Guesses are nested. They must be closed in the reverse\r | |
1753 | of the order that they are opened.\r | |
1754 | \r | |
1755 | In order to free memory used by a variable during a guess\r | |
1756 | a user must write a routine which can be called to\r | |
1757 | register the variable along with the current guess sequence\r | |
1758 | number provided by the zzUSER_GUESS_HOOK macro. If the guess\r | |
1759 | fails, all variables tagged with the corresponding guess\r | |
1760 | sequence number should be released. This is ugly, but\r | |
1761 | it would require a major rewrite of antlr 1.33 to use\r | |
1762 | some mechanism other than setjmp()/longjmp().\r | |
1763 | \r | |
1764 | The order of calls for a *successful* guess would be:\r | |
1765 | \r | |
1766 | zzUSER_GUESS_HOOK(guessSeq,0);\r | |
1767 | zzUSER_GUESS_DONE_HOOK(guessSeq);\r | |
1768 | \r | |
1769 | The order of calls for a *failed* guess would be:\r | |
1770 | \r | |
1771 | zzUSER_GUESS_HOOK(guessSeq,0);\r | |
1772 | zzUSER_GUESS_HOOK(guessSeq,1);\r | |
1773 | zzUSER_GUESS_DONE_HOOK(guessSeq);\r | |
1774 | \r | |
1775 | The default definitions of these macros are empty strings.\r | |
1776 | \r | |
1777 | Here is an example in C++ mode. The zzUSER_GUESS_HOOK and\r | |
1778 | zzUSER_GUESS_DONE_HOOK macros and myGuessHook() routine\r | |
1779 | can be used without change in both C and C++ versions.\r | |
1780 | \r | |
1781 | ----------------------------------------------------------------------\r | |
1782 | <<\r | |
1783 | \r | |
1784 | #include "AToken.h"\r | |
1785 | \r | |
1786 | typedef ANTLRCommonToken ANTLRToken;\r | |
1787 | \r | |
1788 | #include "DLGLexer.h"\r | |
1789 | \r | |
1790 | int main() {\r | |
1791 | \r | |
1792 | {\r | |
1793 | DLGFileInput in(stdin);\r | |
1794 | DLGLexer lexer(&in,2000);\r | |
1795 | ANTLRTokenBuffer pipe(&lexer,1);\r | |
1796 | ANTLRCommonToken aToken;\r | |
1797 | P parser(&pipe);\r | |
1798 | \r | |
1799 | lexer.setToken(&aToken);\r | |
1800 | parser.init();\r | |
1801 | parser.start();\r | |
1802 | };\r | |
1803 | \r | |
1804 | fclose(stdin);\r | |
1805 | fclose(stdout);\r | |
1806 | return 0;\r | |
1807 | }\r | |
1808 | \r | |
1809 | >>\r | |
1810 | \r | |
1811 | <<\r | |
1812 | char *s=NULL;\r | |
1813 | \r | |
1814 | #undef zzUSER_GUESS_HOOK\r | |
1815 | #define zzUSER_GUESS_HOOK(guessSeq,zzrv) myGuessHook(guessSeq,zzrv);\r | |
1816 | #undef zzUSER_GUESS_DONE_HOOK\r | |
1817 | #define zzUSER_GUESS_DONE_HOOK(guessSeq) myGuessHook(guessSeq,2);\r | |
1818 | \r | |
1819 | void myGuessHook(int guessSeq,int zzrv) {\r | |
1820 | if (zzrv == 0) {\r | |
1821 | fprintf(stderr,"User hook: starting guess #%d\n",guessSeq);\r | |
1822 | } else if (zzrv == 1) {\r | |
1823 | free (s);\r | |
1824 | s=NULL;\r | |
1825 | fprintf(stderr,"User hook: failed guess #%d\n",guessSeq);\r | |
1826 | } else if (zzrv == 2) {\r | |
1827 | free (s);\r | |
1828 | s=NULL;\r | |
1829 | fprintf(stderr,"User hook: ending guess #%d\n",guessSeq);\r | |
1830 | };\r | |
1831 | }\r | |
1832 | \r | |
1833 | >>\r | |
1834 | \r | |
1835 | #token A "a"\r | |
1836 | #token "[\t \ \n]" <<skip();>>\r | |
1837 | \r | |
1838 | class P {\r | |
1839 | \r | |
1840 | start : (top)+\r | |
1841 | ;\r | |
1842 | \r | |
1843 | top : (which) ? <<fprintf(stderr,"%s is a which\n",s); free(s); s=NULL; >>\r | |
1844 | | other <<fprintf(stderr,"%s is an other\n",s); free(s); s=NULL; >>\r | |
1845 | ; <<if (s != NULL) free(s); s=NULL; >>\r | |
1846 | \r | |
1847 | which : which2\r | |
1848 | ;\r | |
1849 | \r | |
1850 | which2 : which3\r | |
1851 | ;\r | |
1852 | which3\r | |
1853 | : (label)? <<fprintf(stderr,"%s is a label\n",s);>>\r | |
1854 | | (global)? <<fprintf(stderr,"%s is a global\n",s);>>\r | |
1855 | | (exclamation)? <<fprintf(stderr,"%s is an exclamation\n",s);>>\r | |
1856 | ;\r | |
1857 | \r | |
1858 | label : <<s=strdup(LT(1)->getText());>> A ":" ;\r | |
1859 | \r | |
1860 | global : <<s=strdup(LT(1)->getText());>> A "::" ;\r | |
1861 | \r | |
1862 | exclamation : <<s=strdup(LT(1)->getText());>> A "!" ;\r | |
1863 | \r | |
1864 | other : <<s=strdup(LT(1)->getText());>> "other" ;\r | |
1865 | \r | |
1866 | }\r | |
1867 | ----------------------------------------------------------------------\r | |
1868 | \r | |
1869 | This is a silly example, but illustrates the idea. For the input\r | |
1870 | "a ::" with tracing enabled the output begins:\r | |
1871 | \r | |
1872 | ----------------------------------------------------------------------\r | |
1873 | enter rule "start" depth 1\r | |
1874 | enter rule "top" depth 2\r | |
1875 | User hook: starting guess #1\r | |
1876 | enter rule "which" depth 3 guessing\r | |
1877 | enter rule "which2" depth 4 guessing\r | |
1878 | enter rule "which3" depth 5 guessing\r | |
1879 | User hook: starting guess #2\r | |
1880 | enter rule "label" depth 6 guessing\r | |
1881 | guess failed\r | |
1882 | User hook: failed guess #2\r | |
1883 | guess done - returning to rule "which3" at depth 5 (guess mode continues\r | |
1884 | - an enclosing guess is still active)\r | |
1885 | User hook: ending guess #2\r | |
1886 | User hook: starting guess #3\r | |
1887 | enter rule "global" depth 6 guessing\r | |
1888 | exit rule "global" depth 6 guessing\r | |
1889 | guess done - returning to rule "which3" at depth 5 (guess mode continues\r | |
1890 | - an enclosing guess is still active)\r | |
1891 | User hook: ending guess #3\r | |
1892 | enter rule "global" depth 6 guessing\r | |
1893 | exit rule "global" depth 6 guessing\r | |
1894 | exit rule "which3" depth 5 guessing\r | |
1895 | exit rule "which2" depth 4 guessing\r | |
1896 | exit rule "which" depth 3 guessing\r | |
1897 | guess done - returning to rule "top" at depth 2 (guess mode ends)\r | |
1898 | User hook: ending guess #1\r | |
1899 | enter rule "which" depth 3\r | |
1900 | .....\r | |
1901 | ----------------------------------------------------------------------\r | |
1902 | \r | |
1903 | Remember:\r | |
1904 | \r | |
1905 | (a) Only init-actions are executed during guess mode.\r | |
1906 | (b) A rule can be invoked multiple times during guess mode.\r | |
1907 | (c) If the guess succeeds the rule will be called once more\r | |
1908 | without guess mode so that normal actions will be executed.\r | |
1909 | This means that the init-action might need to distinguish\r | |
1910 | between guess mode and non-guess mode using the variable\r | |
1911 | [zz]guessing.\r | |
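
The registration routine mentioned earlier in this item (tagging each heap block with a guess sequence number and releasing the tagged blocks when that guess fails) might be sketched as follows. All names here (GuessAlloc, registerGuessAlloc, dropGuessAllocs, sketchGuessHook) are hypothetical. The hook uses the same 0/1/2 convention as the myGuessHook() example above. Tracking the current guess sequence number is left out; the caller passes it explicitly.

```c
#include <stdlib.h>

/* One heap block registered under the guess active when it was made. */
typedef struct GuessAlloc {
    void *ptr;
    int guessSeq;
    struct GuessAlloc *next;
} GuessAlloc;

static GuessAlloc *guessAllocs = NULL;  /* most recent registration first */

/* Hypothetical helper: remember ptr as belonging to guess guessSeq.
   Returns 1 on success, 0 if the bookkeeping node cannot be allocated. */
static int registerGuessAlloc(void *ptr, int guessSeq) {
    GuessAlloc *node = malloc(sizeof *node);
    if (node == NULL) return 0;
    node->ptr = ptr;
    node->guessSeq = guessSeq;
    node->next = guessAllocs;
    guessAllocs = node;
    return 1;
}

/* Unlink every entry tagged guessSeq. The registered block itself is
   freed only when freeBlocks is nonzero. Returns the number unlinked. */
static int dropGuessAllocs(int guessSeq, int freeBlocks) {
    int count = 0;
    GuessAlloc **link = &guessAllocs;
    while (*link != NULL) {
        GuessAlloc *node = *link;
        if (node->guessSeq == guessSeq) {
            *link = node->next;
            if (freeBlocks) free(node->ptr);
            free(node);
            count++;
        } else {
            link = &node->next;
        }
    }
    return count;
}

/* zzrv: 0 = guess started, 1 = guess failed, 2 = guess closed. */
static void sketchGuessHook(int guessSeq, int zzrv) {
    if (zzrv == 1) {
        dropGuessAllocs(guessSeq, 1);   /* failed: release the memory */
    } else if (zzrv == 2) {
        /* Design choice of this sketch: on close, only the bookkeeping
           is discarded; blocks that survive stay owned by the caller. */
        dropGuessAllocs(guessSeq, 0);
    }
}
```

For a failed guess the close call finds nothing left to do, because the failure call has already removed the entries for that sequence number.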
1912 | \r | |
1913 | #107. (Changed in 1.33MR10) construction of ASTs in guess mode\r | |
1914 | \r | |
1915 | Prior to 1.33MR10, when using automatic AST construction in C++\r | |
1916 | mode for a rule, an AST would be constructed for elements of the\r | |
1917 | rule even while in guess mode. In MR10 this no longer occurs.\r | |
1918 | \r | |
1919 | #106. (Changed in 1.33MR10) guess variable confusion\r | |
1920 | \r | |
1921 | In C++ mode a guess which failed always restored the parser state\r | |
1922 | using zzGUESS_DONE as part of zzGUESS_FAIL. Prior to 1.33MR10,\r | |
1923 | C mode required an explicit call to zzGUESS_DONE after the\r | |
1924 | call to zzGUESS_FAIL.\r | |
1925 | \r | |
1926 | Consider:\r | |
1927 | \r | |
1928 | rule : (alpha)? beta\r | |
1929 | | ...\r | |
1930 | ;\r | |
1931 | \r | |
1932 | The generated code resembles:\r | |
1933 | \r | |
1934 | zzGUESS\r | |
1935 | if (!zzrv && LA(1)==ID) { <==== line #1\r | |
1936 | alpha\r | |
1937 | zzGUESS_DONE\r | |
1938 | beta\r | |
1939 | } else {\r | |
1940 | if (! zzrv) zzGUESS_DONE <==== line #2a\r | |
1941 | ....\r | |
1942 | \r | |
1943 | However, in some cases line #2 was rendered:\r | |
1944 | \r | |
1945 | if (guessing) zzGUESS_DONE <==== line #2b\r | |
1946 | \r | |
1947 | This would work for simple test cases, but would fail in\r | |
1948 | some cases where there was a guess while another guess was active.\r | |
1949 | One kind of failure would be to match up the zzGUESS_DONE at line\r | |
1950 | #2b with the "outer" guess which was still active. The outer\r | |
1951 | guess would "succeed" when only the inner guess should have\r | |
1952 | succeeded.\r | |
1953 | \r | |
1954 | In 1.33MR10 the behavior of zzGUESS and zzGUESS_FAIL in C and\r | |
1955 | C++ mode should be identical.\r | |
1956 | \r | |
1957 | The same problem appears in 1.33 vanilla in some places. For\r | |
1958 | example:\r | |
1959 | \r | |
1960 | start : { (sub)? } ;\r | |
1961 | \r | |
1962 | or:\r | |
1963 | \r | |
1964 | start : (\r | |
1965 | B\r | |
1966 | | ( sub )?\r | |
1967 | | C\r | |
1968 | )+\r | |
1969 | ;\r | |
1970 | \r | |
1971 | generates incorrect code.\r | |
1972 | \r | |
1973 | The general principle is:\r | |
1974 | \r | |
1975 | (a) use [zz]guessing only when deciding between a call to zzFAIL\r | |
1976 | or zzGUESS_FAIL\r | |
1977 | \r | |
1978 | (b) use zzrv in all other cases\r | |
1979 | \r | |
1980 | This problem was discovered while testing changes to item #105.\r | |
1981 | I believe this is now fixed. My apologies.\r | |
1982 | \r | |
1983 | #105. (Changed in 1.33MR10) guess block as single alt of (...)+\r | |
1984 | \r | |
1985 | Prior to 1.33MR10 the following constructs:\r | |
1986 | \r | |
1987 | rule_plus : (\r | |
1988 | (sub)?\r | |
1989 | )+\r | |
1990 | ;\r | |
1991 | \r | |
1992 | rule_star : (\r | |
1993 | (sub)?\r | |
1994 | )*\r | |
1995 | ;\r | |
1996 | \r | |
1997 | generated incorrect code for the guess block (which could result\r | |
1998 | in runtime errors) because of an incorrect optimization of a\r | |
1999 | block with only a single alternative.\r | |
2000 | \r | |
2001 | The fix caused some changes to the fix described in Item #49\r | |
2002 | because there are now three code generation sequences for (...)+\r | |
2003 | blocks containing a guess block:\r | |
2004 | \r | |
2005 | a. single alternative which is a guess block\r | |
2006 | b. multiple alternatives in which the last is a guess block\r | |
2007 | c. all other cases\r | |
2008 | \r | |
2009 | Forms like "rule_star" can have unexpected behavior when there\r | |
2010 | is a syntax error: if the subrule "sub" is not matched *exactly*\r | |
2011 | then "rule_star" will consume no tokens.\r | |
2012 | \r | |
2013 | Reported by Esa Pulkkinen (esap@cs.tut.fi).\r | |
2014 | \r | |
2015 | #104. (Changed in 1.33MR10) -o option for dlg\r | |
2016 | \r | |
2017 | There was a problem with the code added by item #74 to handle the\r | |
2018 | -o option of dlg. This should fix it.\r | |
2019 | \r | |
2020 | #103. (Changed in 1.33MR10) ANDed semantic predicates\r | |
2021 | \r | |
2022 | Rescinded.\r | |
2023 | \r | |
2024 | The optimization was a mistake.\r | |
2025 | The resulting problem is described in Item #150.\r | |
2026 | \r | |
2027 | #102. (Changed in 1.33MR10) allow "class parser : .... {"\r | |
2028 | \r | |
2029 | The syntax of the class statement ("class parser-name {")\r | |
2030 | has been extended to allow for the specification of base\r | |
2031 | classes. An arbitrary number of tokens may now appear\r | |
2032 | between the class name and the "{". They are output\r | |
2033 | again when the class declaration is generated. For\r | |
2034 | example:\r | |
2035 | \r | |
2036 | class Parser : public MyBaseClassANTLRparser {\r | |
2037 | \r | |
2038 | This was suggested by a user, but I don't have a record\r | |
2039 | of who it was.\r | |
2040 | \r | |
2041 | #101. (Changed in 1.33MR10) antlr -info command line switch\r | |
2042 | \r | |
2043 | -info\r | |
2044 | \r | |
2045 | p - extra predicate information in generated file\r | |
2046 | \r | |
2047 | t - information about tnode use:\r | |
2048 | at the end of each rule in generated file\r | |
2049 | summary on stderr at end of program\r | |
2050 | \r | |
2051 | m - monitor progress\r | |
2052 | prints name of each rule as it is started\r | |
2053 | flushes output at start of each rule\r | |
2054 | \r | |
2055 | f - first/follow set information to stdout\r | |
2056 | \r | |
2057 | 0 - no operation (added in 1.33MR11)\r | |
2058 | \r | |
2059 | The options may be combined and may appear in any order.\r | |
2060 | For example:\r | |
2061 | \r | |
2062 | antlr -info ptm -CC -gt -mrhoist on mygrammar.g\r | |
2063 | \r | |
2064 | #100a. (Changed in 1.33MR10) Predicate tree simplification\r | |
2065 | \r | |
2066 | When the same predicates can be referenced in more than one\r | |
2067 | alternative of a block, large predicate trees can be formed.\r | |
2068 | \r | |
2069 | The difference that these optimizations make is so dramatic\r | |
2070 | that I have decided to use them even when -mrhoist is not selected.\r | |
2071 | \r | |
2072 | Consider the following grammar:\r | |
2073 | \r | |
2074 | start : ( all )* ;\r | |
2075 | \r | |
2076 | all : a\r | |
2077 | | d\r | |
2078 | | e\r | |
2079 | | f\r | |
2080 | ;\r | |
2081 | \r | |
2082 | a : c A B\r | |
2083 | | c A C\r | |
2084 | ;\r | |
2085 | \r | |
2086 | c : <<AAA(LATEXT(2))>>?\r | |
2087 | ;\r | |
2088 | \r | |
2089 | d : <<BBB(LATEXT(2))>>? B C\r | |
2090 | ;\r | |
2091 | \r | |
2092 | e : <<CCC(LATEXT(2))>>? B C\r | |
2093 | ;\r | |
2094 | \r | |
2095 | f : e X Y\r | |
2096 | ;\r | |
2097 | \r | |
2098 | In rule "a" there is a reference to rule "c" in both alternatives.\r | |
2099 | The length of the predicate AAA is k=2 and it can be followed in\r | |
2100 | alternative 1 only by (A B) while in alternative 2 it can be\r | |
2101 | followed only by (A C). Thus they do not have identical context.\r | |
2102 | \r | |
2103 | In rule "all" the alternatives which refer to rules "e" and "f" allow\r | |
2104 | elimination of the duplicate reference to predicate CCC.\r | |
2105 | \r | |
2106 | The table below summarizes the kinds of simplification performed by\r | |
2107 | 1.33MR10. In the table, X, Y, and Z stand for single predicates\r | |
2108 | (not trees).\r | |
2109 | \r | |
2110 | (OR X (OR Y (OR Z))) => (OR X Y Z)\r | |
2111 | (AND X (AND Y (AND Z))) => (AND X Y Z)\r | |
2112 | \r | |
2113 | (OR X (... (OR X Y) ... )) => (OR X (... Y ... ))\r | |
2114 | (AND X (... (AND X Y) ... )) => (AND X (... Y ... ))\r | |
2115 | (OR X (... (AND X Y) ... )) => (OR X (... ... ))\r | |
2116 | (AND X (... (OR X Y) ... )) => (AND X (... ... ))\r | |
2117 | \r | |
2118 | (AND X) => X\r | |
2119 | (OR X) => X\r | |
2120 | \r | |
2121 | In a test with a complex grammar for a real application, a predicate\r | |
2122 | tree with six OR nodes and 12 leaves was reduced to "(OR X Y Z)".\r | |
2123 | \r | |
2124 | In 1.33MR10 there is a greater effort to release memory used\r | |
2125 | by predicates once they are no longer in use.\r | |
2126 | \r | |
2127 | #100b. (Changed in 1.33MR10) Suppression of extra predicate tests\r | |
2128 | \r | |
2129 | The following optimizations require that -mrhoist be selected.\r | |
2130 | \r | |
2131 | It is relatively easy to optimize the code generated for predicate\r | |
2132 | gates when they are of the form:\r | |
2133 | \r | |
2134 | (AND X Y Z ...)\r | |
2135 | or (OR X Y Z ...)\r | |
2136 | \r | |
2137 | where X, Y, Z, and "..." represent individual predicates (leaves) not\r | |
2138 | predicate trees.\r | |
2139 | \r | |
2140 | If the predicate is an AND the contexts of the X, Y, Z, etc. are\r | |
2141 | ANDed together to create a single Tree context for the group and\r | |
2142 | context tests for the individual predicates are suppressed:\r | |
2143 | \r | |
2144 | --------------------------------------------------\r | |
2145 | Note: This was incorrect. The contexts should be\r | |
2146 | ORed together. This has been fixed. A more \r | |
2147 | complete description is available in item #152.\r | |
2148 | ---------------------------------------------------\r | |
2149 | \r | |
2150 | Optimization 1: (AND X Y Z ...)\r | |
2151 | \r | |
2152 | Suppose the context for Xtest is LA(1)==LP and the context for\r | |
2153 | Ytest is LA(1)==LP && LA(2)==ID.\r | |
2154 | \r | |
2155 | Without the optimization the code would resemble:\r | |
2156 | \r | |
2157 | if (lookaheadContext &&\r | |
2158 | !(LA(1)==LP && LA(1)==LP && LA(2)==ID) ||\r | |
2159 | ( (! LA(1)==LP || Xtest) &&\r | |
2160 | (! (LA(1)==LP && LA(2)==ID) || Ytest)\r | |
2161 | )) {...\r | |
2162 | \r | |
2163 | With the -mrhoist optimization the code would resemble:\r | |
2164 | \r | |
2165 | if (lookaheadContext &&\r | |
2166 | ! (LA(1)==LP && LA(2)==ID) || (Xtest && Ytest) {...\r | |
2167 | \r | |
2168 | Optimization 2: (OR X Y Z ...) with identical contexts\r | |
2169 | \r | |
2170 | Suppose the context for Xtest is LA(1)==ID and for Ytest\r | |
2171 | the context is also LA(1)==ID.\r | |
2172 | \r | |
2173 | Without the optimization the code would resemble:\r | |
2174 | \r | |
2175 | if (lookaheadContext &&\r | |
2176 | ! (LA(1)==ID || LA(1)==ID) ||\r | |
2177 | (LA(1)==ID && Xtest) ||\r | |
2178 | (LA(1)==ID && Ytest) {...\r | |
2179 | \r | |
2180 | With the -mrhoist optimization the code would resemble:\r | |
2181 | \r | |
2182 | if (lookaheadContext &&\r | |
2183 | (! LA(1)==ID) || (Xtest || Ytest) {...\r | |
2184 | \r | |
2185 | Optimization 3: (OR X Y Z ...) with distinct contexts\r | |
2186 | \r | |
2187 | Suppose the context for Xtest is LA(1)==ID and for Ytest\r | |
2188 | the context is LA(1)==LP.\r | |
2189 | \r | |
2190 | Without the optimization the code would resemble:\r | |
2191 | \r | |
2192 | if (lookaheadContext &&\r | |
2193 | ! (LA(1)==ID || LA(1)==LP) ||\r | |
2194 | (LA(1)==ID && Xtest) ||\r | |
2195 | (LA(1)==LP && Ytest) {...\r | |
2196 | \r | |
2197 | With the -mrhoist optimization the code would resemble:\r | |
2198 | \r | |
2199 | if (lookaheadContext &&\r | |
2200 | (zzpf=0,\r | |
2201 | (LA(1)==ID && (zzpf=1) && Xtest) ||\r | |
2202 | (LA(1)==LP && (zzpf=1) && Ytest) ||\r | |
2203 | !zzpf) {\r | |
2204 | \r | |
2205 | These may appear to be of similar complexity at first,\r | |
2206 | but the non-optimized version contains two tests of each\r | |
2207 | context while the optimized version contains only one\r | |
2208 | such test, as well as eliminating some of the inverted\r | |
2209 | logic (" !(...) || ").\r | |
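
That the two forms of the Optimization 3 gate accept the same inputs can be checked by brute force. The following C sketch is illustrative only: the token codes and function names are hypothetical, and the lookaheadContext term, which is common to both forms, is omitted.

```c
enum { TOK_ID = 1, TOK_LP = 2, TOK_OTHER = 3 };  /* hypothetical token codes */

/* Gate without the optimization: each context is tested twice. */
static int gate_plain(int la1, int Xtest, int Ytest) {
    return !(la1 == TOK_ID || la1 == TOK_LP) ||
           (la1 == TOK_ID && Xtest) ||
           (la1 == TOK_LP && Ytest);
}

/* Gate with the flag: zzpf records whether any context matched, so
   the final !zzpf replaces the leading inverted context test. */
static int gate_flag(int la1, int Xtest, int Ytest) {
    int zzpf;
    return (zzpf = 0,
            (la1 == TOK_ID && (zzpf = 1) && Xtest) ||
            (la1 == TOK_LP && (zzpf = 1) && Ytest) ||
            !zzpf);
}

/* Returns 1 when both gates agree for every combination of inputs. */
static int gates_agree(void) {
    int la1, x, y;
    for (la1 = TOK_ID; la1 <= TOK_OTHER; la1++)
        for (x = 0; x <= 1; x++)
            for (y = 0; y <= 1; y++)
                if (gate_plain(la1, x, y) != gate_flag(la1, x, y))
                    return 0;
    return 1;
}
```

The flag form relies on the left-to-right evaluation of "&&", "||", and the comma operator, so zzpf is always assigned before !zzpf is read.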
2210 | \r | |
2211 | Optimization 4: Computation of predicate gate trees\r | |
2212 | \r | |
2213 | When generating code for the gates of predicate expressions\r | |
2214 | antlr 1.33 vanilla uses a recursive procedure to generate\r | |
2215 | "&&" and "||" expressions for testing the lookahead. As each\r | |
2216 | layer of the predicate tree is exposed a new set of "&&" and\r | |
2217 | "||" expressions on the lookahead are generated. In many\r | |
2218 | cases the lookahead being tested has already been tested.\r | |
2219 | \r | |
2220 | With -mrhoist a lookahead tree is computed for the entire\r | |
2221 | lookahead expression. This means that predicates with identical\r | |
2222 | context or context which is a subset of another predicate's\r | |
2223 | context disappear.\r | |
2224 | \r | |
2225 | This is especially important for predicates formed by rules\r | |
2226 | like the following:\r | |
2227 | \r | |
2228 | upperCaseVowel : <<isUpperCase(LATEXT(1))>>? vowel;\r | |
2229 | vowel : <<isVowel(LATEXT(1))>>? LETTERS;\r | |
2230 | \r | |
2231 | These predicates are combined using AND since both must be\r | |
2232 | satisfied for rule upperCaseVowel. They have identical\r | |
2233 | context which makes this optimization very effective.\r | |
2234 | \r | |
2235 | The effect of Items #100a and #100b together can be dramatic. In\r | |
2236 | a very large (but real world) grammar one particular predicate\r | |
2237 | expression was reduced from an (unreadable) 50 predicate leaves,\r | |
2238 | 195 LA(1) terms, and 5500 characters to an (easily comprehensible)\r | |
2239 | 3 predicate leaves (all different) and a *single* LA(1) term.\r | |
2240 | \r | |
2241 | #99. (Changed in 1.33MR10) Code generation for expression trees\r | |
2242 | \r | |
2243 | Expression trees are used for k>1 grammars and predicates with\r | |
2244 | lookahead depth >1. This optimization must be enabled using\r | |
2245 | "-mrhoist on". (Clarification added for 1.33MR11).\r | |
2246 | \r | |
2247 | In the processing of expression trees, antlr can generate long chains\r | |
2248 | of token comparisons. Prior to 1.33MR10 there were many redundant\r | |
2249 | parenthesis which caused problems for compilers which could handle\r | |
2250 | expressions of only limited complexity. For example, to test an\r | |
2251 | expression tree (root R A B C D), antlr would generate something\r | |
2252 | resembling:\r | |
2253 | \r | |
2254 | (LA(1)==R && (LA(2)==A || (LA(2)==B || (LA(2)==C || LA(2)==D))))\r | |
2255 | \r | |
2256 | If there were twenty tokens to test then there would be twenty\r | |
2257 | parentheses at the end of the expression.\r | |
2258 | \r | |
2259 | In 1.33MR10 the generated code for tree expressions resembles:\r | |
2260 | \r | |
2261 | (LA(1)==R && (LA(2)==A || LA(2)==B || LA(2)==C || LA(2)==D))\r | |
2262 | \r | |
2263 | For "complex" expressions the output is indented to reflect the LA\r | |
2264 | number being tested:\r | |
2265 | \r | |
2266 | (LA(1)==R\r | |
2267 | && (LA(2)==A || LA(2)==B || LA(2)==C || LA(2)==D\r | |
2268 | || LA(2)==E || LA(2)==F)\r | |
2269 | || LA(1)==S\r | |
2270 | && (LA(2)==G || LA(2)==H))\r | |
2271 | \r | |
2272 | \r | |
2273 | Suggested by S. Bochnak (S.Bochnak@microTool.com.pl).\r | |
2274 | \r | |
2275 | #98. (Changed in 1.33MR10) Option "-info p"\r | |
2276 | \r | |
2277 | When the user selects option "-info p" the program will generate\r | |
2278 | detailed information about predicates. If the user selects\r | |
2279 | "-mrhoist on" additional detail will be provided explaining\r | |
2280 | the promotion and suppression of predicates. The output is part\r | |
2281 | of the generated file and sandwiched between #if 0/#endif statements.\r | |
2282 | \r | |
2283 | Consider the following k=1 grammar:\r | |
2284 | \r | |
2285 | start : ( all ) * ;\r | |
2286 | \r | |
2287 | all : ( a\r | |
2288 | | b\r | |
2289 | )\r | |
2290 | ;\r | |
2291 | \r | |
2292 | a : c B\r | |
2293 | ;\r | |
2294 | \r | |
2295 | c : <<LATEXT(1)>>?\r | |
2296 | | B\r | |
2297 | ;\r | |
2298 | \r | |
2299 | b : <<LATEXT(1)>>? X\r | |
2300 | ;\r | |
2301 | \r | |
2302 | Below is an excerpt of the output for rule "start" for the three\r | |
2303 | predicate options (off, on, and maintenance release style hoisting).\r | |
2304 | \r | |
2305 | For those who do not wish to use the "-mrhoist on" option for code\r | |
2306 | generation the option can be used in a "diagnostic" mode to provide\r | |
2307 | valuable information:\r | |
2308 | \r | |
2309 | a. where one should insert null actions to inhibit hoisting\r | |
2310 | b. a chain of rule references which shows where predicates are\r | |
2311 | being hoisted\r | |
2312 | \r | |
2313 | ======================================================================\r | |
2314 | Example of "-info p" with "-mrhoist on"\r | |
2315 | ======================================================================\r | |
2316 | #if 0\r | |
2317 | \r | |
2318 | Hoisting of predicate suppressed by alternative without predicate.\r | |
2319 | The alt without the predicate includes all cases where the\r | |
2320 | predicate is false.\r | |
2321 | \r | |
2322 | WITH predicate: line 11 v36.g\r | |
2323 | WITHOUT predicate: line 12 v36.g\r | |
2324 | \r | |
2325 | The context set for the predicate:\r | |
2326 | \r | |
2327 | B\r | |
2328 | \r | |
2329 | The lookahead set for alt WITHOUT the semantic predicate:\r | |
2330 | \r | |
2331 | B\r | |
2332 | \r | |
2333 | The predicate:\r | |
2334 | \r | |
2335 | pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g\r | |
2336 | \r | |
2337 | set context:\r | |
2338 | B\r | |
2339 | tree context: null\r | |
2340 | \r | |
2341 | Chain of referenced rules:\r | |
2342 | \r | |
2343 | #0 in rule start (line 1 v36.g) to rule all\r | |
2344 | #1 in rule all (line 3 v36.g) to rule a\r | |
2345 | #2 in rule a (line 8 v36.g) to rule c\r | |
2346 | #3 in rule c (line 11 v36.g)\r | |
2347 | \r | |
2348 | #endif\r | |
2349 | &&\r | |
2350 | #if 0\r | |
2351 | \r | |
2352 | pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g\r | |
2353 | \r | |
2354 | set context:\r | |
2355 | X\r | |
2356 | tree context: null\r | |
2357 | \r | |
2358 | #endif\r | |
2359 | ======================================================================\r | |
2360 | Example of "-info p" with the default -prc setting ( "-prc off")\r | |
2361 | ======================================================================\r | |
2362 | #if 0\r | |
2363 | \r | |
2364 | OR\r | |
2365 | pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g\r | |
2366 | \r | |
2367 | set context:\r | |
2368 | nil\r | |
2369 | tree context: null\r | |
2370 | \r | |
2371 | pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g\r | |
2372 | \r | |
2373 | set context:\r | |
2374 | nil\r | |
2375 | tree context: null\r | |
2376 | \r | |
2377 | #endif\r | |
2378 | ======================================================================\r | |
2379 | Example of "-info p" with "-prc on" and "-mrhoist off"\r | |
2380 | ======================================================================\r | |
2381 | #if 0\r | |
2382 | \r | |
2383 | OR\r | |
2384 | pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g\r | |
2385 | \r | |
2386 | set context:\r | |
2387 | B\r | |
2388 | tree context: null\r | |
2389 | \r | |
2390 | pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g\r | |
2391 | \r | |
2392 | set context:\r | |
2393 | X\r | |
2394 | tree context: null\r | |
2395 | \r | |
2396 | #endif\r | |
2397 | ======================================================================\r | |
2398 | \r | |
2399 | #97. (Fixed in 1.33MR10) "Predicate applied for more than one ... "\r | |
2400 | \r | |
2401 | In 1.33 vanilla, the grammar listed below produced this message for\r | |
2402 | the first alternative (only) of rule "b":\r | |
2403 | \r | |
2404 | warning: predicate applied for >1 lookahead 1-sequences\r | |
2405 | [you may only want one lookahead 1-sequence to apply.\r | |
2406 | Try using a context guard '(...)? =>'\r | |
2407 | \r | |
2408 | In 1.33MR10 the message is issued for both alternatives.\r | |
2409 | \r | |
2410 | top : (a)*;\r | |
2411 | a : b | c ;\r | |
2412 | \r | |
2413 | b : <<PPP(LATEXT(1))>>? ( AAA | BBB )\r | |
2414 | | <<QQQ(LATEXT(1))>>? ( XXX | YYY )\r | |
2415 | ;\r | |
2416 | \r | |
2417 | c : AAA | XXX;\r | |
2418 | \r | |
2419 | #96. (Fixed in 1.33MR10) Guard predicates ignored when -prc off\r | |
2420 | \r | |
2421 | Prior to 1.33MR10, guard predicate code was not generated unless\r | |
2422 | "-prc on" was selected.\r | |
2423 | \r | |
2424 | This was incorrect, since "-prc off" (the default) is supposed to\r | |
2425 | disable only AUTOMATIC computation of predicate context, not the\r | |
2426 | programmer specified context supplied by guard predicates.\r | |
2427 | \r | |
2428 | #95. (Fixed in 1.33MR10) Predicate guard context length was k, not max(k,ck)\r | |
2429 | \r | |
2430 | Prior to 1.33MR10, predicate guards were computed to k tokens rather\r | |
2431 | than max(k,ck). Consider the following grammar:\r | |
2432 | \r | |
2433 | a : ( A B C)? => <<AAA(LATEXT(1))>>? (A|X) (B|Y) (C|Z) ;\r | |
2434 | \r | |
2435 | The code generated by 1.33 vanilla with "-k 1 -ck 3 -prc on"\r | |
2436 | for the predicate in "a" resembles:\r | |
2437 | \r | |
2438 | if ( (! LA(1)==A) || AAA(LATEXT(1))) {...\r | |
2439 | \r | |
2440 | With 1.33MR10 and the same options the code resembles:\r | |
2441 | \r | |
2442 | if ( (! (LA(1)==A && LA(2)==B && LA(3)==C)) || AAA(LATEXT(1))) {...\r | |
2443 | \r | |
2444 | #94. (Fixed in 1.33MR10) Predicates followed by rule references\r | |
2445 | \r | |
2446 | Prior to 1.33MR10, a semantic predicate which referenced a token\r | |
2447 | which was off the end of the rule caused an incomplete context\r | |
2448 | to be computed (with "-prc on") for the predicate under some\r | |
2449 | circumstances. In some cases this manifested itself as illegal C code\r | |
2450 | (e.g. "LA(2)==[Ep](1)") in the k=2 examples below:\r | |
2451 | \r | |
2452 | all : ( a ) *;\r | |
2453 | \r | |
2454 | a : <<AAA(LATEXT(2))>>? ID X\r | |
2455 | | <<BBB(LATEXT(2))>>? Y\r | |
2456 | | Z\r | |
2457 | ;\r | |
2458 | \r | |
2459 | This might also occur when the semantic predicate was followed\r | |
2460 | by a rule reference which was shorter than the length of the\r | |
2461 | semantic predicate:\r | |
2462 | \r | |
2463 | all : ( a ) *;\r | |
2464 | \r | |
2465 | a : <<AAA(LATEXT(2))>>? ID X\r | |
2466 | | <<BBB(LATEXT(2))>>? y\r | |
2467 | | Z\r | |
2468 | ;\r | |
2469 | \r | |
2470 | y : Y ;\r | |
2471 | \r | |
2472 | Depending on circumstance, the resulting context might be too\r | |
2473 | generous because it was too short, or too restrictive because\r | |
2474 | of missing alternatives.\r | |
2475 | \r | |
2476 | #93. (Changed in 1.33MR10) Definition of Purify macro\r | |
2477 | \r | |
2478 | Ofer Ben-Ami (gremlin@cs.huji.ac.il) has supplied a definition\r | |
2479 | for the Purify macro:\r | |
2480 | \r | |
2481 | #define PURIFY(r, s) memset((char *) &(r), '\0', (s));\r | |
2482 | \r | |
2483 | Note: This may not be the right thing to do for C++ objects that\r | |
2484 | have constructors. Reported by Bonny Rais (bonny@werple.net.au).\r | |
2485 | \r | |
2486 | For those cases one should #define PURIFY to an empty macro in the\r | |
2487 | #header or #first actions.\r | |
2488 | \r | |
2489 | #92. (Fixed in 1.33MR10) Guarded predicates and hoisting\r | |
2490 | \r | |
2491 | When a guarded predicate participates in hoisting it is linked into\r | |
2492 | a predicate expression tree. Prior to 1.33MR10 this link was never\r | |
2493 | cleared and the next time the guard was used to construct a new\r | |
2494 | tree the link could contain a spurious reference to another element\r | |
2495 | which had previously been joined to it in the semantic predicate tree.\r | |
2496 | \r | |
2497 | For example:\r | |
2498 | \r | |
2499 | start : ( all ) *;\r | |
2500 | all : ( a | b ) ;\r | |
2501 | \r | |
2502 | start2 : ( all2 ) *;\r | |
2503 | all2 : ( a ) ;\r | |
2504 | \r | |
2505 | a : (A)? => <<AAA(LATEXT(1))>>? A ;\r | |
2506 | b : (B)? => <<BBB(LATEXT(1))>>? B ;\r | |
2507 | \r | |
2508 | Prior to 1.33MR10 the code for "start2" would include a spurious\r | |
2509 | reference to the BBB predicate which was left from constructing\r | |
2510 | the predicate tree for rule "start" (i.e. or(AAA,BBB) ).\r | |
2511 | \r | |
2512 | In 1.33MR10 this problem is avoided by cloning the original guard\r | |
2513 | each time it is linked into a predicate tree.\r | |
2514 | \r | |
2515 | #91. (Changed in 1.33MR10) Extensive changes to semantic pred hoisting\r | |
2516 | \r | |
2517 | ============================================\r | |
2518 | This has been rendered obsolete by Item #117\r | |
2519 | ============================================\r | |
2520 | \r | |
2521 | #90. (Fixed in 1.33MR10) Semantic pred with LT(i) and i>max(k,ck)\r | |
2522 | \r | |
2523 | There is a bug in antlr 1.33 vanilla and all maintenance releases\r | |
2524 | prior to 1.33MR10 which allows semantic predicates to reference\r | |
2525 | an LT(i) or LATEXT(i) where i is larger than max(k,ck). When\r | |
2526 | this occurs antlr will attempt to mark the ith element of an array\r | |
2527 | in which there are only max(k,ck) elements. The result cannot\r | |
2528 | be predicted.\r | |
2529 | \r | |
2530 | Using LT(i) or LATEXT(i) for i>max(k,ck) is reported as an error\r | |
2531 | in 1.33MR10.\r | |
2532 | \r | |
2533 | #89. Rescinded\r | |
2534 | \r | |
2535 | #88. (Fixed in 1.33MR10) Tokens used in semantic predicates in guess mode\r | |
2536 | \r | |
2537 | Consider the behavior of a semantic predicate during guess mode:\r | |
2538 | \r | |
2539 | rule : a:A (\r | |
2540 | <<test($a)>>? b:B\r | |
2541 | | c:C\r | |
2542 | );\r | |
2543 | \r | |
2544 | Prior to MR10 the assignment of the token or attribute to\r | |
2545 | $a did not occur during guess mode, which would cause the\r | |
2546 | semantic predicate to misbehave because $a would be null.\r | |
2547 | \r | |
2548 | In 1.33MR10 a semantic predicate with a reference to an\r | |
2549 | element label (such as $a) forces the assignment to take\r | |
2550 | place even in guess mode.\r | |
2551 | \r | |
2552 | In order to work, this fix REQUIRES use of the $label format\r | |
2553 | for token pointers and attributes referenced in semantic\r | |
2554 | predicates.\r | |
2555 | \r | |
2556 | The fix does not apply to semantic predicates using the\r | |
2557 | numeric form to refer to attributes (e.g. <<test($1)>>?).\r | |
2558 | The user will receive a warning for this case.\r | |
2559 | \r | |
2560 | Reported by Rob Trout (trout@mcs.cs.kent.edu).\r | |
2561 | \r | |
2562 | #87. (Fixed in 1.33MR10) Malformed guard predicates\r | |
2563 | \r | |
2564 | Context guard predicates may contain only references to\r | |
2565 | tokens. They may not contain references to (...)+ and\r | |
2566 | (...)* blocks. This is now checked. This replaces the\r | |
2567 | fatal error message in item #78 with an appropriate\r | |
2568 | (non-fatal) error message.\r | |
2569 | \r | |
2570 | In theory, context guards should be allowed to reference\r | |
2571 | rules. However, I have not had time to fix this.\r | |
2572 | Evaluation of the guard takes place before all rules have\r | |
2573 | been read, making it difficult to resolve a forward reference\r | |
2574 | to rule "zzz" - it hasn't been read yet ! To postpone evaluation\r | |
2575 | of the guard until all rules have been read is too much\r | |
2576 | for the moment.\r | |
2577 | \r | |
2578 | #86. (Fixed in 1.33MR10) Unequal set size in set_sub\r | |
2579 | \r | |
2580 | Routine set_sub() in pccts/support/set/set.h did not work\r | |
2581 | correctly when the sets were of unequal sizes. Rewrote\r | |
2582 | set_equ to make it simpler and remove unnecessary and\r | |
2583 | expensive calls to set_deg(). This routine was not used\r | |
2584 | in 1.33 vanilla.\r | |
2585 | \r | |
2586 | #85. (Changed in 1.33MR10) Allow redefinition of MaxNumFiles\r | |
2587 | \r | |
2588 | Raised the maximum number of input files to 99 from 20.\r | |
2589 | Put a #ifndef/#endif around the "#define MaxNumFiles 99".\r | |
2590 | \r | |
2591 | #84. (Fixed in 1.33MR10) Initialize zzBadTok in macro zzRULE\r | |
2592 | \r | |
2593 | Initialize zzBadTok to NULL in zzRULE macro of AParser.h\r | |
2594 | in order to get rid of warning messages.\r | |
2595 | \r | |
2596 | #83. (Fixed in 1.33MR10) False warnings with -w2 for #tokclass\r | |
2597 | \r | |
2598 | When -w2 is selected antlr gives inappropriate warnings about\r | |
2599 | #tokclass names not having any associated regular expressions.\r | |
2600 | Since a #tokclass is not a "real" token it will never have an\r | |
2601 | associated regular expression and there should be no warning.\r | |
2602 | \r | |
2603 | Reported by Derek Pappas (derek.pappas@eng.sun.com)\r | |
2604 | \r | |
2605 | #82. (Fixed in 1.33MR10) Computation of follow sets with multiple cycles\r | |
2606 | \r | |
2607 | Reinier van den Born (reinier@vnet.ibm.com) reported a problem\r | |
2608 | in the computation of follow sets by antlr. The problem (bug)\r | |
2609 | exists in 1.33 vanilla and all maintenance releases prior to 1.33MR10.\r | |
2610 | \r | |
2611 | The problem involves the computation of follow sets when there are\r | |
2612 | cycles - rules which have mutual references. I believe the problem\r | |
2613 | is restricted to cases where there is more than one cycle AND\r | |
2614 | elements of those cycles have rules in common. Even when this\r | |
2615 | occurs it may not affect the code generated - but it might. It\r | |
2616 | might also lead to undetected ambiguities.\r | |
2617 | \r | |
2618 | There were no changes in antlr or dlg output from the revised version.\r | |
2619 | \r | |
2620 | The following fragment demonstrates the problem by giving different\r | |
2621 | follow sets (option -pa) for var_access when built with k=1 and ck=2 on\r | |
2622 | 1.33 vanilla and 1.33MR10:\r | |
2623 | \r | |
2624 | echo_statement : ECHO ( echo_expr )*\r | |
2625 | ;\r | |
2626 | \r | |
2627 | echo_expr : ( command )?\r | |
2628 | | expression\r | |
2629 | ;\r | |
2630 | \r | |
2631 | command : IDENTIFIER\r | |
2632 | { concat }\r | |
2633 | ;\r | |
2634 | \r | |
2635 | expression : operand ( OPERATOR operand )*\r | |
2636 | ;\r | |
2637 | \r | |
2638 | operand : value\r | |
2639 | | START command END\r | |
2640 | ;\r | |
2641 | \r | |
2642 | value : concat\r | |
2643 | | TYPE operand\r | |
2644 | ;\r | |
2645 | \r | |
2646 | concat : var_access { CONCAT value }\r | |
2647 | ;\r | |
2648 | \r | |
2649 | var_access : IDENTIFIER { INDEX }\r | |
2650 | \r | |
2651 | ;\r | |
2652 | #81. (Changed in 1.33MR10) C mode use of attributes and ASTs\r | |
2653 | \r | |
2654 | Reported by Isaac Clark (irclark@mindspring.com).\r | |
2655 | \r | |
2656 | C mode code ignores attributes returned by rules which are\r | |
2657 | referenced using element labels when ASTs are enabled (-gt option).\r | |
2658 | \r | |
2659 | 1. start : r:rule t:Token <<$start=$r;>>\r | |
2660 | \r | |
2661 | The $r reference will not work when combined with\r | |
2662 | the -gt option.\r | |
2663 | \r | |
2664 | 2. start : t:Token <<$start=$t;>>\r | |
2665 | \r | |
2666 | The $t reference works in all cases.\r | |
2667 | \r | |
2668 | 3. start : rule <<$0=$1;>>\r | |
2669 | \r | |
2670 | Numeric labels work in all cases.\r | |
2671 | \r | |
2672 | With MR10 the user will receive an error message for case 1 when\r | |
2673 | the -gt option is used.\r | |
2674 | \r | |
2675 | #80. (Fixed in 1.33MR10) (...)? as last alternative of block\r | |
2676 | \r | |
2677 | A construct like the following:\r | |
2678 | \r | |
2679 | rule : a\r | |
2680 | | (b)?\r | |
2681 | ;\r | |
2682 | \r | |
2683 | does not make sense because there is no alternative when\r | |
2684 | the guess block fails. This is now reported as a warning\r | |
2685 | to the user.\r | |
2686 | \r | |
2687 | Previously, there was a code generation error for this case:\r | |
2688 | the guess block was not "closed" when the guess failed.\r | |
2689 | This could cause an infinite loop or other problems. This\r | |
2690 | is now fixed.\r | |
2691 | \r | |
2692 | Example problem:\r | |
2693 | \r | |
2694 | #header<<\r | |
2695 | #include <stdio.h>\r | |
2696 | #include "charptr.h"\r | |
2697 | >>\r | |
2698 | \r | |
2699 | <<\r | |
2700 | #include "charptr.c"\r | |
2701 | main ()\r | |
2702 | {\r | |
2703 | ANTLR(start(),stdin);\r | |
2704 | }\r | |
2705 | >>\r | |
2706 | \r | |
2707 | #token "[\ \t]+" << zzskip(); >>\r | |
2708 | #token "[\n]" << zzline++; zzskip(); >>\r | |
2709 | \r | |
2710 | #token Word "[a-z]+"\r | |
2711 | #token Number "[0-9]+"\r | |
2712 | \r | |
2713 | \r | |
2714 | start : (test1)?\r | |
2715 | | (test2)?\r | |
2716 | ;\r | |
2717 | test1 : (Word Word Word Word)?\r | |
2718 | | (Word Word Word Number)?\r | |
2719 | ;\r | |
2720 | test2 : (Word Word Number Word)?\r | |
2721 | | (Word Word Number Number)?\r | |
2722 | ;\r | |
2723 | \r | |
2724 | Test data which caused infinite loop:\r | |
2725 | \r | |
2726 | a 1 a a\r | |
2727 | \r | |
2728 | #79. (Changed in 1.33MR10) Use of -fh with multiple parsers\r | |
2729 | \r | |
2730 | Previously, antlr always used the pre-processor symbol\r | |
2731 | STDPCCTS_H as a gate for the file stdpccts.h. This\r | |
2732 | caused problems when there were multiple parsers defined\r | |
2733 | because they used the same gate symbol.\r | |
2734 | \r | |
2735 | In 1.33MR10, the -fh filename is used to generate the\r | |
2736 | gate file for stdpccts.h. For instance:\r | |
2737 | \r | |
2738 | antlr -fh std_parser1.h\r | |
2739 | \r | |
2740 | generates the pre-processor symbol "STDPCCTS_std_parser1_H".\r | |
2741 | \r | |
2742 | Reported by Ramanathan Santhanam (ps@kumaran.com).\r | |
2743 | \r | |
2744 | #78. (Changed in 1.33MR9) Guard predicates that refer to rules\r | |
2745 | \r | |
2746 | ------------------------\r | |
2747 | Please refer to Item #87\r | |
2748 | ------------------------\r | |
2749 | \r | |
2750 | Guard predicates are processed during an early phase\r | |
2751 | of antlr (during parsing) before all data structures\r | |
2752 | are completed.\r | |
2753 | \r | |
2754 | There is an apparent bug in earlier versions of 1.33\r | |
2755 | which caused guard predicates which contained references\r | |
2756 | to rules (rather than tokens) to reference a structure\r | |
2757 | which hadn't yet been initialized.\r | |
2758 | \r | |
2759 | In some cases (perhaps all cases) references to rules\r | |
2760 | in guard predicates resulted in the use of "garbage".\r | |
2761 | \r | |
2762 | #79. (Changed in 1.33MR9) Jeff Vincent (JVincent@novell.com)\r | |
2763 | \r | |
2764 | Previously, the maximum length file name was set\r | |
2765 | arbitrarily to 300 characters in antlr, dlg, and sorcerer.\r | |
2766 | \r | |
2767 | The config.h file now attempts to define the maximum length\r | |
2768 | filename using _MAX_PATH from stdlib.h before falling back\r | |
2769 | to using the value 300.\r | |
2770 | \r | |
2771 | #78. (Changed in 1.33MR9) Jeff Vincent (JVincent@novell.com)\r | |
2772 | \r | |
2773 | Put #ifndef/#endif around definition of ZZLEXBUFSIZE in\r | |
2774 | antlr.\r | |
2775 | \r | |
2776 | #77. (Changed in 1.33MR9) Arithmetic overflow for very large grammars\r | |
2777 | \r | |
2778 | In routine HandleAmbiguities() antlr attempts to compute the\r | |
2779 | number of possible elements in a set that is on the order of\r | |
2780 | number-of-tokens raised to the number-of-lookahead-tokens power.\r | |
2781 | For large grammars or large lookahead (e.g. -ck 7) this can\r | |
2782 | cause arithmetic overflow.\r | |
2783 | \r | |
2784 | With 1.33MR9, arithmetic overflow in this computation is reported\r | |
2785 | the first time it happens. The program continues to run and\r | |
2786 | the program branches based on the assumption that the computed\r | |
2787 | value is larger than any number computed by counting actual cases\r | |
2788 | because 2**31 is larger than the number of bits in most computers.\r | |
2789 | \r | |
2790 | Before 1.33MR9 overflow was not reported. The behavior following\r | |
2791 | overflow is not predictable by anyone but the original author.\r | |
2792 | \r | |
2793 | NOTE\r | |
2794 | \r | |
2795 | In 1.33MR10 the warning message is suppressed.\r | |
2796 | The code which detects the overflow allows the\r | |
2797 | computation to continue without an error. The\r | |
2798 | error message itself made users worry.\r | |
2799 | \r | |
2800 | #76. (Changed in 1.33MR9) Jeff Vincent (JVincent@novell.com)\r | |
2801 | \r | |
2802 | Jeff Vincent has convinced me to make ANTLRCommonToken and\r | |
2803 | ANTLRCommonNoRefCountToken use variable length strings\r | |
2804 | allocated from the heap rather than fixed length strings.\r | |
2805 | By suitable definition of setText(), the copy constructor,\r | |
2806 | and operator =() it is possible to maintain "copy" semantics.\r | |
2807 | By "copy" semantics I mean that when a token is copied from\r | |
2808 | an existing token it receives its own, distinct, copy of the\r | |
2809 | text allocated from the heap rather than simply a pointer\r | |
2810 | to the original token's text.\r | |
2811 | \r | |
2812 | ============================================================\r | |
2813 | W * A * R * N * I * N * G\r | |
2814 | ============================================================\r | |
2815 | \r | |
2816 | It is possible that this may cause problems for some users.\r | |
2817 | For those users I have included the old version of AToken.h as\r | |
2818 | pccts/h/AToken_traditional.h.\r | |
2819 | \r | |
2820 | #75. (Changed in 1.33MR9) Bruce Guenter (bruceg@qcc.sk.ca)\r | |
2821 | \r | |
2822 | Make DLGStringInput const correct. Since this is infrequently\r | |
2823 | subclassed, it should affect few users, I hope.\r | |
2824 | \r | |
2825 | #74. (Changed in 1.33MR9) -o (output directory) option\r | |
2826 | \r | |
2827 | Antlr does not properly handle the -o output directory option\r | |
2828 | when the filename of the grammar contains a directory part. For\r | |
2829 | example:\r | |
2830 | \r | |
2831 | antlr -o outdir pccts_src/myfile.g\r | |
2832 | \r | |
2833 | causes antlr to create a file called "outdir/pccts_src/myfile.cpp".\r | |
2834 | It SHOULD create "outdir/myfile.cpp".\r | |
2835 | \r | |
2836 | The suggested code fix has been installed in antlr, dlg, and\r | |
2837 | Sorcerer.\r | |
2838 | \r | |
2839 | #73. (Changed in 1.33MR9) Hoisting of semantic predicates and -mrhoist\r | |
2840 | \r | |
2841 | ============================================\r | |
2842 | This has been rendered obsolete by Item #117\r | |
2843 | ============================================\r | |
2844 | \r | |
2845 | #72. (Changed in 1.33MR9) virtual saveState()/restoreState()/guess_XXX\r | |
2846 | \r | |
2847 | The following methods in ANTLRParser were made virtual at\r | |
2848 | the request of S. Bochnak (S.Bochnak@microTool.com.pl):\r | |
2849 | \r | |
2850 | saveState() and restoreState()\r | |
2851 | guess(), guess_fail(), and guess_done()\r | |
2852 | \r | |
2853 | #71. (Changed in 1.33MR9) Access to omitted command line argument\r | |
2854 | \r | |
2855 | If a switch requiring arguments is the last thing on the\r | |
2856 | command line, and the argument is omitted, antlr would core.\r | |
2857 | \r | |
2858 | antlr test.g -prc\r | |
2859 | \r | |
2860 | instead of\r | |
2861 | \r | |
2862 | antlr test.g -prc off\r | |
2863 | \r | |
2864 | #70. (Changed in 1.33MR9) Addition of MSVC .dsp and .mak build files\r | |
2865 | \r | |
2866 | The following MSVC .dsp and .mak files for pccts and sorcerer\r | |
2867 | were contributed by Stanislaw Bochnak (S.Bochnak@microTool.com.pl)\r | |
2868 | and Jeff Vincent (JVincent@novell.com)\r | |
2869 | \r | |
2870 | PCCTS Distribution Kit\r | |
2871 | ----------------------\r | |
2872 | pccts/PCCTSMSVC50.dsw\r | |
2873 | \r | |
2874 | pccts/antlr/AntlrMSVC50.dsp\r | |
2875 | pccts/antlr/AntlrMSVC50.mak\r | |
2876 | \r | |
2877 | pccts/dlg/DlgMSVC50.dsp\r | |
2878 | pccts/dlg/DlgMSVC50.mak\r | |
2879 | \r | |
2880 | pccts/support/msvc.dsp\r | |
2881 | \r | |
2882 | Sorcerer Distribution Kit\r | |
2883 | -------------------------\r | |
2884 | pccts/sorcerer/SorcererMSVC50.dsp\r | |
2885 | pccts/sorcerer/SorcererMSVC50.mak\r | |
2886 | \r | |
2887 | pccts/sorcerer/lib/msvc.dsp\r | |
2888 | \r | |
2889 | #69. (Changed in 1.33MR9) Change "unsigned int" to plain "int"\r | |
2890 | \r | |
2891 | Declaration of max_token_num in misc.c as "unsigned int"\r | |
2892 | caused comparison between signed and unsigned ints giving\r | |
2893 | warning message without any special benefit.\r | |
2894 | \r | |
2895 | #68. (Changed in 1.33MR9) Add void return for dlg internal_error()\r | |
2896 | \r | |
2897 | Get rid of "no return value" message in internal_error()\r | |
2898 | in file dlg/support.c and dlg/dlg.h.\r | |
2899 | \r | |
2900 | #67. (Changed in Sor) sor.g: lisp() has no return value\r | |
2901 | \r | |
2902 | Added a "void" for the return type.\r | |
2903 | \r | |
2904 | #66. (Added to Sor) sor.g: ZZLEXBUFSIZE enclosed in #ifndef/#endif\r | |
2905 | \r | |
2906 | A user needed to be able to change the ZZLEXBUFSIZE for\r | |
2907 | sor. Put the definition of ZZLEXBUFSIZE inside #ifndef/#endif\r | |
2908 | \r | |
2909 | #65. (Changed in 1.33MR9) PCCTS_AST::deepCopy() and ast_dup() bug\r | |
2910 | \r | |
2911 | Jeff Vincent (JVincent@novell.com) found that deepCopy()\r | |
2912 | made new copies of only the direct descendants. No new\r | |
2913 | copies were made of sibling nodes. Sibling pointers are\r | |
2914 | set to zero by shallowCopy().\r | |
2915 | \r | |
2916 | PCCTS_AST::deepCopy() has been changed to make a\r | |
2917 | deep copy in the traditional sense.\r | |
2918 | \r | |
2919 | The deepCopy() routine depends on the behavior of\r | |
2920 | shallowCopy(). In all sor examples I've found,\r | |
2921 | shallowCopy() zeroes the right and down pointers.\r | |
2922 | \r | |
2923 | Original Tree Original deepCopy() Revised deepCopy\r | |
2924 | ------------- ------------------- ----------------\r | |
2925 | a->b->c A A\r | |
2926 | | | |\r | |
2927 | d->e->f D D->E->F\r | |
2928 | | | |\r | |
2929 | g->h->i G G->H->I\r | |
2930 | | |\r | |
2931 | j->k J->K\r | |
2932 | \r | |
2933 | While comparing deepCopy() for C++ mode with ast_dup for\r | |
2934 | C mode I found a problem with ast_dup().\r | |
2935 | \r | |
2936 | Routine ast_dup() has been changed to make a deep copy\r | |
2937 | in the traditional sense.\r | |
2938 | \r | |
2939 | Original Tree Original ast_dup() Revised ast_dup()\r | |
2940 | ------------- ------------------- ----------------\r | |
2941 | a->b->c A->B->C A\r | |
2942 | | | |\r | |
2943 | d->e->f D->E->F D->E->F\r | |
2944 | | | |\r | |
2945 | g->h->i G->H->I G->H->I\r | |
2946 | | | |\r | |
2947 | j->k J->K J->K\r | |
2948 | \r | |
2949 | \r | |
2950 | I believe this affects transform mode sorcerer programs only.\r | |
2951 | \r | |
2952 | #64. (Changed in 1.33MR9) antlr/hash.h prototype for killHashTable()\r | |
2953 | \r | |
2954 | #63. (Changed in 1.33MR8) h/charptr.h does not zero pointer after free\r | |
2955 | \r | |
2956 | The charptr.h routine now zeroes the pointer after free().\r | |
2957 | \r | |
2958 | Reported by Jens Tingleff (jensting@imaginet.fr)\r | |
2959 | \r | |
2960 | #62. (Changed in 1.33MR8) ANTLRParser::resynch had static variable\r | |
2961 | \r | |
2962 | The static variable "consumed" in ANTLRParser::resynch was\r | |
2963 | changed into an instance variable of the class with the\r | |
2964 | name "resynchConsumed".\r | |
2965 | \r | |
2966 | Reported by S.Bochnak@microTool.com.pl\r | |
2967 | \r | |
2968 | #61. (Changed in 1.33MR8) Using rule>[i,j] when rule has no return values\r | |
2969 | \r | |
2970 | Previously, the following code would cause antlr to core when\r | |
2971 | it tried to generate code for rule1 because rule2 had no return\r | |
2972 | values ("upward inheritance"):\r | |
2973 | \r | |
2974 | rule1 : <<int i; int j>>\r | |
2975 | rule2 > [i,j]\r | |
2976 | ;\r | |
2977 | \r | |
2978 | rule2 : Anything ;\r | |
2979 | \r | |
2980 | Reported by S.Bochnak@microTool.com.pl\r | |
2981 | \r | |
2982 | Verified correct operation of antlr MR8 when missing or extra\r | |
2983 | inheritance arguments for all combinations. When there are\r | |
2984 | missing or extra arguments code will still be generated even\r | |
2985 | though this might cause the invocation of a subroutine with\r | |
2986 | the wrong number of arguments.\r | |
2987 | \r | |
2988 | #60. (Changed in 1.33MR7) Major changes to exception handling\r | |
2989 | \r | |
2990 | There were significant problems in the handling of exceptions\r | |
2991 | in 1.33 vanilla. The general problem is that it can only\r | |
2992 | process one level of exception handler. For example, a named\r | |
2993 | exception handler, an exception handler for an alternative, or\r | |
2994 | an exception for a subrule always went to the rule's exception\r | |
2995 | handler if there was no "catch" which matched the exception.\r | |
2996 | \r | |
2997 | In 1.33MR7 the exception handlers properly "nest". If an\r | |
2998 | exception handler does not have a matching "catch" then the\r | |
2999 | nextmost outer exception handler is checked for an appropriate\r | |
3000 | "catch" clause, and so on until an exception handler with an\r | |
3001 | appropriate "catch" is found.\r | |
3002 | \r | |
3003 | There are still undesirable features in the way exception\r | |
3004 | handlers are implemented, but I do not have time to fix them\r | |
3005 | at the moment:\r | |
3006 | \r | |
3007 | The exception handlers for alternatives are outside the\r | |
3008 | block containing the alternative. This makes it impossible\r | |
3009 | to access variables declared in a block or to resume the\r | |
3010 | parse by "falling through". The parse can still be easily\r | |
3011 | resumed in other ways, but not in the most natural fashion.\r | |
3012 | \r | |
3013 | This results in an inconsistency between named exception\r | |
3014 | handlers and exception handlers for alternatives. When\r | |
3015 | an exception handler for an alternative "falls through"\r | |
3016 | it goes to the nextmost outer handler - not the "normal\r | |
3017 | action".\r | |
3018 | \r | |
3019 | A major difference between 1.33MR7 and 1.33 vanilla is\r | |
3020 | the default action after an exception is caught:\r | |
3021 | \r | |
3022 | 1.33 Vanilla\r | |
3023 | ------------\r | |
3024 | In 1.33 vanilla the signal value is set to zero ("NoSignal")\r | |
3025 | and the code drops through to the code following the exception.\r | |
3026 | For named exception handlers this is the "normal action".\r | |
3027 | For alternative exception handlers this is the rule's handler.\r | |
3028 | \r | |
3029 | 1.33MR7\r | |
3030 | -------\r | |
3031 | In 1.33MR7 the signal value is NOT automatically set to zero.\r | |
3032 | \r | |
3033 | There are two cases:\r | |
3034 | \r | |
3035 | For named exception handlers: if the signal value has been\r | |
3036 | set to zero the code drops through to the "normal action".\r | |
3037 | \r | |
3038 | For all other cases the code branches to the nextmost outer\r | |
3039 | exception handler until it reaches the handler for the rule.\r | |
3040 | \r | |
3041 | The following macros have been defined for convenience:\r | |
3042 | \r | |
3043 | C/C++ Mode Name\r | |
3044 | --------------------\r | |
3045 | (zz)suppressSignal\r | |
3046 | set signal & return signal arg to 0 ("NoSignal")\r | |
3047 | (zz)setSignal(intValue)\r | |
3048 | set signal & return signal arg to some value\r | |
3049 | (zz)exportSignal\r | |
3050 | copy the signal value to the return signal arg\r | |
3051 | \r | |
3052 | I'm not sure why PCCTS makes a distinction between the local\r | |
3053 | signal value and the return signal argument, but I'm loath\r | |
3054 | to change the code. The burden of copying the local signal\r | |
3055 | value to the return signal argument can be given to the\r | |
3056 | default signal handler, I suppose.\r | |
3057 | \r | |
3058 | #59. (Changed in 1.33MR7) Prototypes for some functions\r | |
3059 | \r | |
3060 | Added prototypes for the following functions to antlr.h\r | |
3061 | \r | |
3062 | zzconsumeUntil()\r | |
3063 | zzconsumeUntilToken()\r | |
3064 | \r | |
3065 | #58. (Changed in 1.33MR7) Added definition of zzbufsize to dlgauto.h\r | |
3066 | \r | |
3067 | #57. (Changed in 1.33MR7) Format of #line directive\r | |
3068 | \r | |
3069 | Previously, the -gl directive for line 1234 would\r | |
3070 | resemble: "# 1234 filename.g". This caused problems\r | |
3071 | for some compilers/pre-processors. In MR7 it generates\r | |
3072 | "#line 1234 filename.g".\r | |
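
The effect of the "#line" form can be seen with a small C sketch. This is
not antlr output: the function names are made up for illustration, and the
file name is quoted because standard C expects a string literal there.

```c
#include <string.h>

/* Hypothetical stand-in for generated code. The directive below tells the
   compiler that the next source line is line 1234 of "filename.g" (the
   grammar file) rather than a line of the generated .c file. */
#line 1234 "filename.g"
static int reported_line(void) { return __LINE__; }

/* __FILE__ now names the grammar file as well. */
static int came_from_grammar(void)
{
    return strcmp(__FILE__, "filename.g") == 0;
}
```

Diagnostics from the C compiler for code following the directive are then
reported against the grammar file and line.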
3073 | \r | |
3074 | #56. (Added in 1.33MR7) Jan Mikkelsen <janm@zeta.org.au>\r | |
3075 | \r | |
3076 | Move PURIFY macro invocation to after rule's init action.\r | |
3077 | \r | |
3078 | #55. (Fixed in 1.33MR7) Uninitialized variables in ANTLRParser\r | |
3079 | \r | |
3080 | Member variables inf_labase and inf_last were not initialized.\r | |
3081 | (See item #50.)\r | |
3082 | \r | |
3083 | #54. (Fixed in 1.33MR6) Brad Schick (schick@interacess.com)\r | |
3084 | \r | |
3085 | Previously, the following constructs generated the same\r | |
3086 | code:\r | |
3087 | \r | |
3088 | rule1 : (A B C)?\r | |
3089 | | something-else\r | |
3090 | ;\r | |
3091 | \r | |
3092 | rule2 : (A B C)? ()\r | |
3093 | | something-else\r | |
3094 | ;\r | |
3095 | \r | |
3096 | In all versions of pccts rule1 guesses (A B C) and then\r | |
3097 | consumes all three tokens if the guess succeeds. In MR6\r | |
3098 | rule2 guesses (A B C) but consumes NONE of the tokens\r | |
3099 | when the guess succeeds because "()" matches epsilon.\r | |
3100 | \r | |
3101 | #53. (Explanation for 1.33MR6) What happens after an exception is caught?\r | |
3102 | \r | |
3103 | The Book is silent about what happens after an exception\r | |
3104 | is caught.\r | |
3105 | \r | |
3106 | The following code fragment prints "Error Action" followed\r | |
3107 | by "Normal Action".\r | |
3108 | \r | |
3109 | test : Word ex:Number <<printf("Normal Action\n");>>\r | |
3110 | exception[ex]\r | |
3111 | catch NoViableAlt:\r | |
3112 | <<printf("Error Action\n");>>\r | |
3113 | ;\r | |
3114 | \r | |
3115 | The reason for "Normal Action" is that the normal flow of the\r | |
3116 | program after a user-written exception handler is to "drop through".\r | |
3117 | In the case of an exception handler for a rule this results in\r | |
3118 | the execution of a "return" statement. In the case of an\r | |
3119 | exception handler attached to an alternative, rule, or token\r | |
3120 | this is the code that would have executed had there been no\r | |
3121 | exception.\r | |
3122 | \r | |
3123 | The user can achieve the desired result by using a "return"\r | |
3124 | statement.\r | |
3125 | \r | |
3126 | test : Word ex:Number <<printf("Normal Action\n");>>\r | |
3127 | exception[ex]\r | |
3128 | catch NoViableAlt:\r | |
3129 | <<printf("Error Action\n"); return;>>\r | |
3130 | ;\r | |
3131 | \r | |
3132 | The most powerful mechanism for recovery from parse errors\r | |
3133 | in pccts is syntactic predicates because they provide\r | |
3134 | backtracking. Exceptions allow "return", "break",\r | |
3135 | "consumeUntil(...)", "goto _handler", "goto _fail", and\r | |
3136 | changing the _signal value.\r | |
3137 | \r | |
3138 | #52. (Fixed in 1.33MR6) Exceptions without syntactic predicates\r | |
3139 | \r | |
3140 | The following generates bad code in 1.33 if no syntactic\r | |
3141 | predicates are present in the grammar.\r | |
3142 | \r | |
3143 | test : Word ex:Number <<printf("Normal Action\n");>>\r | |
3144 | exception[ex]\r | |
3145 | catch NoViableAlt:\r | |
3146 | <<printf("Error Action\n");>>\r | |
3147 | \r | |
3148 | There is a reference to a guess variable. In C mode\r | |
3149 | this causes a compiler error. In C++ mode it generates\r | |
3150 | an extraneous check on member "guessing".\r | |
3151 | \r | |
3152 | In MR6 correct code is generated for both C and C++ mode.\r | |
3153 | \r | |
3154 | #51. (Added to 1.33MR6) Exception operator "@" used without exceptions\r | |
3155 | \r | |
3156 | In MR6 added a warning when the exception operator "@" is\r | |
3157 | used and no exception group is defined. This is probably\r | |
3158 | a case where "\@" or "@" is meant.\r | |
3159 | \r | |
3160 | #50. (Fixed in 1.33MR6) Gunnar Rønning (gunnar@candleweb.no)\r | |
3161 | http://www.candleweb.no/~gunnar/\r | |
3162 | \r | |
3163 | Routines zzsave_antlr_state and zzrestore_antlr_state don't\r | |
3164 | save and restore all the data needed when switching states.\r | |
3165 | \r | |
3166 | Suggested patch applied to antlr.h and err.h for MR6.\r | |
3167 | \r | |
3168 | #49. (Fixed in 1.33MR6) Sinan Karasu (sinan@boeing.com)\r | |
3169 | \r | |
3170 | Generated code failed to turn off guess mode when leaving a\r | |
3171 | (...)+ block which contained a guess block. The result was\r | |
3172 | an infinite loop. For example:\r | |
3173 | \r | |
3174 | rule : (\r | |
3175 | (x)?\r | |
3176 | | y\r | |
3177 | )+\r | |
3178 | \r | |
3179 | Suggested code fix implemented in MR6. Replaced\r | |
3180 | \r | |
3181 | ... else if (zzcnt>1) break;\r | |
3182 | \r | |
3183 | with:\r | |
3184 | \r | |
3185 | C++ mode:\r | |
3186 | ... else if (zzcnt>1) {if (!zzrv) zzGUESS_DONE; break;};\r | |
3187 | C mode:\r | |
3188 | ... else if (zzcnt>1) {if (zzguessing) zzGUESS_DONE; break;};\r | |
3189 | \r | |
3190 | #48. (Fixed in 1.33MR6) Invalid exception element causes core\r | |
3191 | \r | |
3192 | A label attached to an invalid construct can cause\r | |
3193 | pccts to crash while processing the exception associated\r | |
3194 | with the label. For example:\r | |
3195 | \r | |
3196 | rule : t:(B C)\r | |
3197 | exception[t] catch MismatchedToken: <<printf(...);>>\r | |
3198 | \r | |
3199 | Version MR6 generates the message:\r | |
3200 | \r | |
3201 | reference in exception handler to undefined label 't'\r | |
3202 | \r | |
3203 | #47. (Fixed in 1.33MR6) Manuel Ornato\r | |
3204 | \r | |
3205 | Under some circumstances involving a k >1 or ck >1\r | |
3206 | grammar and a loop block (i.e. (...)* ) pccts will\r | |
3207 | fail to detect a syntax error and loop indefinitely.\r | |
3208 | The problem did not exist in 1.20, but has existed\r | |
3209 | from 1.23 to the present.\r | |
3210 | \r | |
3211 | Fixed in MR6.\r | |
3212 | \r | |
3213 | ---------------------------------------------------\r | |
3214 | Complete test program\r | |
3215 | ---------------------------------------------------\r | |
3216 | #header<<\r | |
3217 | #include <stdio.h>\r | |
3218 | #include "charptr.h"\r | |
3219 | >>\r | |
3220 | \r | |
3221 | <<\r | |
3222 | #include "charptr.c"\r | |
3223 | main ()\r | |
3224 | {\r | |
3225 | ANTLR(global(),stdin);\r | |
3226 | }\r | |
3227 | >>\r | |
3228 | \r | |
3229 | #token "[\ \t]+" << zzskip(); >>\r | |
3230 | #token "[\n]" << zzline++; zzskip(); >>\r | |
3231 | \r | |
3232 | #token B "b"\r | |
3233 | #token C "c"\r | |
3234 | #token D "d"\r | |
3235 | #token E "e"\r | |
3236 | #token LP "\("\r | |
3237 | #token RP "\)"\r | |
3238 | \r | |
3239 | #token ANTLREOF "@"\r | |
3240 | \r | |
3241 | global : (\r | |
3242 | (E liste)\r | |
3243 | | liste\r | |
3244 | | listed\r | |
3245 | ) ANTLREOF\r | |
3246 | ;\r | |
3247 | \r | |
3248 | listeb : LP ( B ( B | C )* ) RP ;\r | |
3249 | listec : LP ( C ( B | C )* ) RP ;\r | |
3250 | listed : LP ( D ( B | C )* ) RP ;\r | |
3251 | liste : ( listeb | listec )* ;\r | |
3252 | \r | |
3253 | ---------------------------------------------------\r | |
3254 | Sample data causing infinite loop\r | |
3255 | ---------------------------------------------------\r | |
3256 | e (d c)\r | |
3257 | ---------------------------------------------------\r | |
3258 | \r | |
3259 | #46. (Fixed in 1.33MR6) Robert Richter\r | |
3260 | (Robert.Richter@infotech.tu-chemnitz.de)\r | |
3261 | \r | |
3262 | This item from the list of known problems was\r | |
3263 | fixed by item #18 (below).\r | |
3264 | \r | |
3265 | #45. (Fixed in 1.33MR6) Brad Schick (schick@interaccess.com)\r | |
3266 | \r | |
3267 | The dependency scanner in VC++ mistakenly sees a\r | |
3268 | reference to an MPW #include file even though it is properly\r | |
3269 | guarded by #ifdef/#endif in config.h. The suggested workaround\r | |
3270 | has been implemented:\r | |
3271 | \r | |
3272 | #ifdef MPW\r | |
3273 | .....\r | |
3274 | #define MPW_CursorCtl_Header <CursorCtl.h>\r | |
3275 | #include MPW_CursorCtl_Header\r | |
3276 | .....\r | |
3277 | #endif\r | |
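
A plausible reading of why this helps: a scanner that searches for literal
"#include <file>" text no longer sees a header name, while the preprocessor
still expands the macro. The sketch below uses <string.h> and a made-up
macro name (STRING_HEADER) so that it compiles on any system. It only shows
the macro-expanded #include form and is not the contents of config.h.

```c
/* STRING_HEADER is a hypothetical name playing the role that
   MPW_CursorCtl_Header plays in the workaround. */
#define STRING_HEADER <string.h>
#include STRING_HEADER

/* Uses a declaration from the header to show it really was included. */
static size_t name_length(const char *s)
{
    return strlen(s);
}
```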
3278 | \r | |
3279 | #44. (Fixed in 1.33MR6) cast malloc() to (char *) in charptr.c\r | |
3280 | \r | |
3281 | Added (char *) cast for systems where malloc returns "void *".\r | |
3282 | \r | |
3283 | #43. (Added to 1.33MR6) Bruce Guenter (bruceg@qcc.sk.ca)\r | |
3284 | \r | |
3285 | Add setLeft() and setUp() methods to ASTDoublyLinkedBase\r | |
3286 | for symmetry with setRight() and setDown() methods.\r | |
3287 | \r | |
3288 | #42. (Fixed in 1.33MR6) Jeff Katcher (jkatcher@nortel.ca)\r | |
3289 | \r | |
3290 | C++ style comment in antlr.c corrected.\r | |
3291 | \r | |
3292 | #41. (Added in 1.33MR6) antlr -stdout\r | |
3293 | \r | |
3294 | Using "antlr -stdout ..." forces the text that would\r | |
3295 | normally go to the grammar.c or grammar.cpp file to\r | |
3296 | stdout.\r | |
3297 | \r | |
3298 | #40. (Added in 1.33MR6) antlr -tab to change tab stops\r | |
3299 | \r | |
3300 | Using "antlr -tab number ..." changes the tab stops\r | |
3301 | for the grammar.c or grammar.cpp file. The number\r | |
3302 | must be between 0 and 8. Using 0 gives tab characters,\r | |
3303 | values between 1 and 8 give the appropriate number of\r | |
3304 | space characters.\r | |
3305 | \r | |
3306 | #39. (Fixed in 1.33MR5) Jan Mikkelsen <janm@zeta.org.au>\r | |
3307 | \r | |
3308 | Commas in function prototype still not correct under\r | |
3309 | some circumstances. Suggested code fix installed.\r | |
3310 | \r | |
3311 | #38. (Fixed in 1.33MR5) ANTLRTokenBuffer constructor\r | |
3312 | \r | |
3313 | Have ANTLRTokenBuffer ctor initialize member "parser" to null.\r | |
3314 | \r | |
3315 | #37. (Fixed in 1.33MR4) Bruce Guenter (bruceg@qcc.sk.ca)\r | |
3316 | \r | |
3317 | In ANTLRParser::FAIL(int k,...) released memory pointed to by\r | |
3318 | f[i] (as well as f itself). Should only free f itself.\r | |
3319 | \r | |
3320 | #36. (Fixed in 1.33MR3) Cortland D. Starrett (cort@shay.ecn.purdue.edu)\r | |
3321 | \r | |
3322 | Neglected to properly declare isDLGmaxToken() when fixing problem\r | |
3323 | reported by Andreas Magnusson.\r | |
3324 | \r | |
3325 | Undo "_retv=NULL;" change which caused problems for return values\r | |
3326 | from rules whose return values weren't pointers.\r | |
3327 | \r | |
3328 | Failed to create bin directory if it didn't exist.\r | |
3329 | \r | |
3330 | #35. (Fixed in 1.33MR2) Andreas Magnusson\r | |
3331 | (Andreas.Magnusson@mailbox.swipnet.se)\r | |
3332 | \r | |
3333 | Repair bug introduced by 1.33MR1 for #tokdefs. The original fix\r | |
3334 | placed "DLGmaxToken=9999" and "DLGminToken=0" in the TokenType enum\r | |
3335 | in order to fix a problem with an aggressive compiler assigning an 8\r | |
3336 | bit enum which might be too narrow. This caused #tokdefs to assume\r | |
3337 | that there were 9999 real tokens. The repair to the fix causes antlr to\r | |
3338 | ignore TokenTypes "DLGmaxToken" and "DLGminToken" in a #tokdefs file.\r | |
3339 | \r | |
3340 | #34. (Added to 1.33MR1) Add public DLGLexerBase::set_line(int newValue)\r | |
3341 | \r | |
3342 | Previously there was no public function for changing the line\r | |
3343 | number maintained by the lexer.\r | |
3344 | \r | |
3345 | #33. (Fixed in 1.33MR1) Franklin Chen (chen@adi.com)\r | |
3346 | \r | |
3347 | Accidental use of EXIT_FAILURE rather than PCCTS_EXIT_FAILURE\r | |
3348 | in pccts/h/AParser.cpp.\r | |
3349 | \r | |
3350 | #32. (Fixed in 1.33MR1) Franklin Chen (chen@adi.com)\r | |
3351 | \r | |
3352 | In PCCTSAST.cpp lines 405 and 466: Change\r | |
3353 | \r | |
3354 | free (t)\r | |
3355 | to\r | |
3356 | free ( (char *)t );\r | |
3357 | \r | |
3358 | to match prototype.\r | |
3359 | \r | |
3360 | #31. (Added to 1.33MR1) Pointer to parser in ANTLRTokenBuffer\r | |
3361 | Pointer to parser in DLGLexerBase\r | |
3362 | \r | |
3363 | The ANTLRTokenBuffer class now contains a pointer to the\r | |
3364 | parser which is using it. This is established by the\r | |
3365 | ANTLRParser constructor calling ANTLRTokenBuffer::\r | |
3366 | setParser(ANTLRParser *p).\r | |
3367 | \r | |
3368 | When ANTLRTokenBuffer::setParser(ANTLRParser *p) is\r | |
3369 | called it saves the pointer to the parser and then\r | |
3370 | calls ANTLRTokenStream::setParser(ANTLRParser *p)\r | |
3371 | so that the lexer can also save a pointer to the\r | |
3372 | parser.\r | |
3373 | \r | |
3374 | There is also a function getParser() in each class\r | |
3375 | with the obvious purpose.\r | |
3376 | \r | |
3377 | It is possible that these functions will return NULL\r | |
3378 | under some circumstances (e.g. a non-DLG lexer is used).\r | |
3379 | \r | |
3380 | #30. (Added to 1.33MR1) function tokenName(int token) standard\r | |
3381 | \r | |
3382 | The generated parser class now includes the\r | |
3383 | function:\r | |
3384 | \r | |
3385 | static const ANTLRChar * tokenName(int token)\r | |
3386 | \r | |
3387 | which returns a pointer to the "name" corresponding\r | |
3388 | to the token.\r | |
3389 | \r | |
3390 | The base class (ANTLRParser) always includes the\r | |
3391 | member function:\r | |
3392 | \r | |
3393 | const ANTLRChar * parserTokenName(int token)\r | |
3394 | \r | |
3395 | which can be accessed by objects which have a pointer\r | |
3396 | to an ANTLRParser, but do not know the name of the\r | |
3397 | parser class (e.g. ANTLRTokenBuffer and DLGLexerBase).\r | |
3398 | \r | |
3399 | #29. (Added to 1.33MR1) Debugging DLG lexers\r | |
3400 | \r | |
3401 | If the pre-processor symbol DEBUG_LEXER is defined\r | |
3402 | then DLexerBase will include code for printing out\r | |
3403 | key information about tokens which are recognized.\r | |
3404 | \r | |
3405 | The debug feature of the lexer is controlled by:\r | |
3406 | \r | |
3407 | int previousDebugValue=lexer.debugLexer(newValue);\r | |
3408 | \r | |
3409 | a value of 0 disables output\r | |
3410 | a value of 1 enables output\r | |
3411 | \r | |
3412 | Even if the lexer debug code is compiled into DLexerBase\r | |
3413 | it must be enabled before any output is generated. For\r | |
3414 | example:\r | |
3415 | \r | |
3416 | DLGFileInput in(stdin);\r | |
3417 | MyDLG lexer(&in,2000);\r | |
3418 | \r | |
3419 | lexer.setToken(&aToken);\r | |
3420 | \r | |
3421 | #if DEBUG_LEXER\r | |
3422 | lexer.debugLexer(1); // enable debug information\r | |
3423 | #endif\r | |
3424 | \r | |
3425 | #28. (Added to 1.33MR1) More control over DLG header\r | |
3426 | \r | |
3427 | Version 1.33MR1 adds the following directives to PCCTS\r | |
3428 | for C++ mode:\r | |
3429 | \r | |
3430 | #lexprefix <<source code>>\r | |
3431 | \r | |
3432 | Adds source code to the DLGLexer.h file\r | |
3433 | after the #include "DLexerBase.h" but\r | |
3434 | before the start of the class definition.\r | |
3435 | \r | |
3436 | #lexmember <<source code>>\r | |
3437 | \r | |
3438 | Adds source code to the DLGLexer.h file\r | |
3439 | as part of the DLGLexer class body. It\r | |
3440 | appears immediately after the start of\r | |
3441 | the class and a "public:" statement.\r | |
3442 | \r | |
3443 | #27. (Fixed in 1.33MR1) Comments in DLG actions\r | |
3444 | \r | |
3445 | Previously, DLG would not recognize comments as a special case.\r | |
3446 | Thus, ">>" in the comments would cause errors. This is fixed.\r | |
3447 | \r | |
3448 | #26. (Fixed in 1.33MR1) Removed static variables from error routines\r | |
3449 | \r | |
3450 | Previously, the existence of statically allocated variables\r | |
3451 | in some of the parser's member functions posed a danger when\r | |
3452 | there was more than one parser active.\r | |
3453 | \r | |
3454 | Replaced with dynamically allocated/freed variables in 1.33MR1.\r | |
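
The hazard can be sketched in plain C. The functions below are hypothetical
and are not taken from the PCCTS error routines. They only contrast a
statically allocated buffer, which every caller shares, with storage that is
allocated per call and freed by the caller.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Static storage: a second parser calling this overwrites the text that
   the first parser is still holding a pointer to. */
static const char *describe_static(const char *parser, int line)
{
    static char buf[64];
    snprintf(buf, sizeof buf, "%s: error at line %d", parser, line);
    return buf;
}

/* Dynamic storage: each call gets its own buffer; the caller frees it. */
static char *describe_dynamic(const char *parser, int line)
{
    char *buf = malloc(64);
    if (buf != NULL)
        snprintf(buf, 64, "%s: error at line %d", parser, line);
    return buf;
}
```

With two parsers active, both results of describe_static() point at the same
buffer, so only the most recent message survives.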
3455 | \r | |
3456 | #25. (Fixed in 1.33MR1) Use of string literals in semantic predicates\r | |
3457 | \r | |
3458 | Previously, it was not possible to place a string literal in\r | |
3459 | a semantic predicate because it was not properly "stringized"\r | |
3460 | for the report of a failed predicate.\r | |
3461 | \r | |
3462 | #24. (Fixed in 1.33MR1) Continuation lines for semantic predicates\r | |
3463 | \r | |
3464 | Previously, it was not possible to continue semantic\r | |
3465 | predicates across a line because they were not properly\r | |
3466 | "stringized" for the report of a failed predicate.\r | |
3467 | \r | |
3468 | rule : <<ifXYZ()>>?[ a very\r | |
3469 | long statement ]\r | |
3470 | \r | |
3471 | #23. (Fixed in 1.33MR1) {...} envelope for failed semantic predicates\r | |
3472 | \r | |
3473 | Previously, there was a code generation error for failed\r | |
3474 | semantic predicates:\r | |
3475 | \r | |
3476 | rule : <<xyz()>>?[ stmt1; stmt2; ]\r | |
3477 | \r | |
3478 | which generated code which resembled:\r | |
3479 | \r | |
3480 | if (! xyz()) stmt1; stmt2;\r | |
3481 | \r | |
3482 | It now puts the statements in a {...} envelope:\r | |
3483 | \r | |
3484 | if (! xyz()) { stmt1; stmt2; };\r | |
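
The difference matters because, without the envelope, only the first
statement is controlled by the "if". A minimal C sketch, with made-up
stand-ins for xyz(), stmt1 and stmt2:

```c
#include <stdio.h>

static int fail_count = 0;            /* incremented by "stmt2" */
static int xyz(void) { return 1; }    /* a predicate that succeeds */

/* Old code shape: stmt2 is outside the if and always executes. */
static void without_envelope(void)
{
    if (! xyz())
        puts("stmt1");
    fail_count++;
}

/* MR1 code shape: both statements run only when the predicate fails. */
static void with_envelope(void)
{
    if (! xyz()) { puts("stmt1"); fail_count++; };
}
```

Even though the predicate succeeds, without_envelope() still increments
fail_count, while with_envelope() leaves it unchanged.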
3485 | \r | |
3486 | #22. (Fixed in 1.33MR1) Continuation of #token across lines using "\"\r | |
3487 | \r | |
3488 | Previously, it was not possible to continue a #token regular\r | |
3489 | expression across a line. The trailing "\" and newline caused\r | |
3490 | a newline to be inserted into the regular expression by DLG.\r | |
3491 | \r | |
3492 | Fixed in 1.33MR1.\r | |
3493 | \r | |
3494 | #21. (Fixed in 1.33MR1) Use of ">>" (right shift operator) in DLG actions\r | |
3495 | \r | |
3496 | It is now possible to use the C++ right shift operator ">>"\r | |
3497 | in DLG actions by using the normal escapes:\r | |
3498 | \r | |
3499 | #token "shift-right" << value=value \>\> 1;>>\r | |
3500 | \r | |
3501 | #20. (Version 1.33/19-Jan-97) Karl Eccleson <karle@microrobotics.co.uk>\r | |
3502 | P.A. Keller (P.A.Keller@bath.ac.uk)\r | |
3503 | \r | |
3504 | There is a problem due to using exceptions with the -gh option.\r | |
3505 | \r | |
3506 | Suggested fix now in 1.33MR1.\r | |
3507 | \r | |
3508 | #19. (Fixed in 1.33MR1) Tom Piscotti and John Lilley\r | |
3509 | \r | |
3510 | There were problems suppressing messages to stderr and stdout\r | |
3511 | when running in a window environment because some functions\r | |
3512 | which use fprintf were not virtual.\r | |
3513 | \r | |
3514 | Suggested change now in 1.33MR1.\r | |
3515 | \r | |
3516 | I believe all functions containing error messages (excluding those\r | |
3517 | indicating internal inconsistency) have been placed in functions\r | |
3518 | which are virtual.\r | |
3519 | \r | |
3520 | #18. (Version 1.33/ 22-Nov-96) John Bair (jbair@iftime.com)\r | |
3521 | \r | |
3522 | Under some combination of options a required "return _retv" is\r | |
3523 | not generated.\r | |
3524 | \r | |
3525 | Suggested fix now in 1.33MR1.\r | |
3526 | \r | |
3527 | #17. (Version 1.33/3-Sep-96) Ron House (house@helios.usq.edu.au)\r | |
3528 | \r | |
3529 | The routine ASTBase::preorder_action omits two "tree->"\r | |
3530 | prefixes, which results in the preorder_action belonging\r | |
3531 | to the wrong node being invoked.\r | |
3532 | \r | |
3533 | Suggested fix now in 1.33MR1.\r | |
3534 | \r | |
3535 | #16. (Version 1.33/7-Jun-96) Eli Sternheim <eli@interhdl.com>\r | |
3536 | \r | |
3537 | Routine consumeUntilToken() does not check for end-of-file\r | |
3538 | condition.\r | |
3539 | \r | |
3540 | Suggested fix now in 1.33MR1.\r | |
3541 | \r | |
3542 | #15. (Version 1.33/8 Apr 96) Asgeir Olafsson <olafsson@cstar.ac.com>\r | |
3543 | \r | |
3544 | Problem with tree duplication of doubly linked ASTs in ASTBase.cpp.\r | |
3545 | \r | |
3546 | Suggested fix now in 1.33MR1.\r | |
3547 | \r | |
3548 | #14. (Version 1.33/28-Feb-96) Andreas.Magnusson@mailbox.swipnet.se\r | |
3549 | \r | |
3550 | Problem with definition of operator = (const ANTLRTokenPtr rhs).\r | |
3551 | \r | |
3552 | Suggested fix now in 1.33MR1.\r | |
3553 | \r | |
3554 | #13. (Version 1.33/13-Feb-96) Franklin Chen (chen@adi.com)\r | |
3555 | \r | |
3556 | Sun C++ Compiler 3.0.1 can't compile testcpp/1 due to goto in\r | |
3557 | block with destructors.\r | |
3558 | \r | |
3559 | Apparently fixed. Can't locate "goto".\r | |
3560 | \r | |
3561 | #12. (Version 1.33/10-Nov-95) Minor problems with 1.33 code\r | |
3562 | \r | |
3563 | The following items have been fixed in 1.33MR1:\r | |
3564 | \r | |
3565 | 1. pccts/antlr/main.c line 142\r | |
3566 | \r | |
3567 | "void" appears in classic C code\r | |
3568 | \r | |
3569 | 2. no makefile in support/genmk\r | |
3570 | \r | |
3571 | 3. EXIT_FAILURE/_SUCCESS instead of PCCTS_EXIT_FAILURE/_SUCCESS\r | |
3572 | \r | |
3573 | pccts/h/PCCTSAST.cpp\r | |
3574 | pccts/h/DLexerBase.cpp\r | |
3575 | pccts/testcpp/6/test.g\r | |
3576 | \r | |
3577 | 4. use of "signed int" isn't accepted by AT&T cfront\r | |
3578 | \r | |
3579 | pccts/h/PCCTSAST.h line 42\r | |
3580 | \r | |
3581 | 5. in call to ANTLRParser::FAIL the var arg err_k is passed as\r | |
3582 | "int" but is declared "unsigned int".\r | |
3583 | \r | |
3584 | 6. I believe that a failed validation predicate still does not\r | |
3585 | get put in a "{...}" envelope, despite the release notes.\r | |
3586 | \r | |
3587 | 7. The #token ">>" appearing in the DLG grammar description\r | |
3588 | causes DLG to generate the string literal "\>\>" which\r | |
3589 | is non-conforming and will cause some compilers to\r | |
3590 | complain (scan.c function act10 line 143 of source code).\r | |
3591 | \r | |
3592 | #11. (Version 1.32b6) Dave Kuhlman (dkuhlman@netcom.com)\r | |
3593 | \r | |
3594 | Problem with file close in gen.c. Already fixed in 1.33.\r | |
3595 | \r | |
3596 | #10. (Version 1.32b6/29-Aug-95)\r | |
3597 | \r | |
3598 | pccts/antlr/main.c contains C++ style comments on lines 149\r | |
3599 | and 176 which cause problems for most C compilers.\r | |
3600 | \r | |
3601 | Already fixed in 1.33.\r | |
3602 | \r | |
3603 | #9. (Version 1.32b4/14-Mar-95) dlgauto.h #include "config.h"\r | |
3604 | \r | |
3605 | The file pccts/h/dlgauto.h should probably contain a #include\r | |
3606 | "config.h" as it uses the #define symbol __USE_PROTOS.\r | |
3607 | \r | |
3608 | Added to 1.33MR1.\r | |
3609 | \r | |
3610 | #8. (Version 1.32b4/6-Mar-95) Michael T. Richter (mtr@igs.net)\r | |
3611 | \r | |
3612 | In C++ output mode anonymous tokens from in-line regular expressions\r | |
3613 | can create enum values which are too wide for the datatype of the enum\r | |
3614 | assigned by the C++ compiler.\r | |
3615 | \r | |
3616 | Fixed in 1.33MR1.\r | |
3617 | \r | |
3618 | #7. (Version 1.32b4/6-Mar-95) C++ does not imply __STDC__\r | |
3619 | \r | |
3620 | In err.h the combination of # directives assumes that a C++\r | |
3621 | compiler has __STDC__ defined. This is not necessarily true.\r | |
3622 | \r | |
3623 | This problem also appears in the use of __USE_PROTOS which\r | |
3624 | is appropriate for both Standard C and C++ in antlr/gen.c\r | |
3625 | and antlr/lex.c\r | |
3626 | \r | |
3627 | Fixed in 1.33MR1.\r | |
3628 | \r | |
3629 | #6. (Version 1.32 ?/15-Feb-95) Name conflict for "TokenType"\r | |
3630 | \r | |
3631 | Already fixed in 1.33.\r | |
3632 | \r | |
3633 | #5. (23-Jan-95) Douglas_Cuthbertson.JTIDS@jtids_qmail.hanscom.af.mil\r | |
3634 | \r | |
3635 | The fail action following a semantic predicate is not enclosed in\r | |
3636 | "{...}". This can lead to problems when the fail action contains\r | |
3637 | more than one statement.\r | |
3638 | \r | |
3639 | Fixed in 1.33MR1.\r | |
3640 | \r | |
3641 | #4. (Version 1.33/31-Mar-96) jlilley@empathy.com (John Lilley)\r | |
3642 | \r | |
3643 | Put briefly, a semantic predicate ought to abort a guess if it fails.\r | |
3644 | \r | |
3645 | Correction suggested by J. Lilley has been added to 1.33MR1.\r | |
3646 | \r | |
3647 | #3. (Version 1.33) P.A.Keller@bath.ac.uk\r | |
3648 | \r | |
3649 | Extra commas are placed in the K&R style argument list for rules\r | |
3650 | when using both exceptions and ASTs.\r | |
3651 | \r | |
3652 | Fixed in 1.33MR1.\r | |
3653 | \r | |
3654 | #2. (Version 1.32b6/2-Oct-95) Brad Schick <schick@interaccess.com>\r | |
3655 | \r | |
3656 | Construct #[] generates zzastnew() in C++ mode.\r | |
3657 | \r | |
3658 | Already fixed in 1.33.\r | |
3659 | \r | |
3660 | #1. (Version 1.33) Bob Bailey (robert@oakhill.sps.mot.com)\r | |
3661 | \r | |
3662 | Previously, config.h assumed that all PC systems required\r | |
3663 | "short" file names. The user can now override that\r | |
3664 | assumption with "#define LONGFILENAMES".\r | |
3665 | \r | |
3666 | Added to 1.33MR1.\r |