1 =======================================================================
2 List of Implemented Fixes and Changes for Maintenance Releases of PCCTS
3 =======================================================================
4
5 DISCLAIMER
6
7 The software and these notes are provided "as is". They may include
8 typographical or technical errors and their authors disclaim all
9 liability of any kind or nature for damages due to error, fault,
10 defect, or deficiency regardless of cause. All warranties of any
11 kind, either express or implied, including, but not limited to, the
12 implied warranties of merchantability and fitness for a particular
13 purpose are disclaimed.
14
15
16 -------------------------------------------------------
17 Note: Items #153 to #1 are now in a separate file named
18 CHANGES_FROM_133_BEFORE_MR13.txt
19 -------------------------------------------------------
20
21 #312. (Changed in MR33) Bug caused by change #299.
22
23 In change #299 a warning message was suppressed when there was
24 no LT(1) in a semantic predicate and max(k,ck) was 1. The
25 change caused the code which set a default predicate depth for
26 the semantic predicate to be left as 0 rather than set to 1.
27
28 This manifested as an error at line #1559 of mrhost.c
29
30 Reported by Peter Dulimov.
31
32 #311. (Changed in MR33) Added sorcer/lib to Makefile.
33
34 Reported by Dale Martin.
35
36 #310. (Changed in MR32) In C mode zzastPush was spelled zzastpush in one case.
37
38 Reported by Jean-Claude Durand
39
40 #309. (Changed in MR32) Renamed baseName because of VMS name conflict
41
42 Renamed baseName to pcctsBaseName to avoid library name conflict with
43 VMS library routine. Reported by Jean-François PIÉRONNE.
44
45 #308. (Changed in MR32) Used "template" as name of formal in C routine
46
47 In astlib.h routine ast_scan a formal was named "template". This caused
48 problems when the C code was compiled with a C++ compiler. Reported by
49 Sabyasachi Dey.
50
51 #307. (Changed in MR31) Compiler dependent bug in function prototype generation
52
53 The code which generated function prototypes contained a bug which
54 was compiler/optimization dependent. Under some circumstances an
55 extra character would be included in portions of a function prototype.
56
57 Reported by David Cook.
58
59 #306. (Changed in MR30) Validating predicate following a token
60
61 A validating predicate which immediately followed a token match
62 consumed the token after the predicate rather than before. Prior
63 to this fix (in the following example) isValidTimeScaleValue() in
64 the predicate would test the text for TIMESCALE rather than for
65 NUMBER:
66
67 time_scale :
68 TIMESCALE
69 <<isValidTimeScaleValue(LT(1)->getText())>>?
70 ts:NUMBER
71 ( us:MICROSECOND << tVal = ...>>
72 | ns:NANOSECOND << tVal = ... >>
73 )
74
75 Reported by Adalbert Perbandt.
76
77 #305. (Changed in MR30) Alternatives with guess blocks inside (...)* blocks.
78
79 In MR14 change #175 fixed a bug in the prediction expressions for guess
80 blocks which were of the form (alpha)? beta. Unfortunately, this
81 resulted in a new bug as exemplified by the example below, which computed
82 the first set for r as {B} rather than {B C}:
83
84 r : ( (A)? B
85 | C
86 )*
87
88 This example doesn't make any sense as A is not a prefix of B, but it
89 illustrates the problem. This bug did not appear for:
90
91 r : ( (A)?
92 | C
93 )*
94
95 because it does not use the (alpha)? beta form.
96
97 Item #175 fixed an asymmetry in ambiguity messages for the following
98 constructs which appear to have identical ambiguities (between repeating
99 the loop vs. exiting the loop). MR30 retains this fix, but the implementation
100 is slightly different.
101
102 r_star : ( (A B)? )* A ;
103 r_plus : ( (A B)? )+ A ;
104
105 Reported by Arpad Beszedes (beszedes inf.u-szeged.hu).
106
107 #304. (Changed in MR30) Crash when mismatch between output value counts.
108
109 For a rule such as:
110
111 r1 : r2>[i,j];
112 r2 >[int i, int j] : A;
113
114 If there were extra actuals for the reference to rule r2 from rule r1
115 then antlr would crash. This bug was introduced by change #276.
116
117 Reported by Sinan Karasu.
118
119 #303. (Changed in MR30) DLGLexerBase::replchar
120
121 DLGLexerBase::replchar and the C mode routine zzreplchar did not work
122 properly when the new character was 0.
123
124 Reported with fix by Philippe Laporte
125
126 #302. (Changed in MR28) Fix significant problems in initial release of MR27.
127
128 #301. (Changed in MR27) Default tab stops set to 2 spaces.
129
130 To have antlr generate true tabs rather than spaces, use "antlr -tab 0".
131 To generate 4 spaces per tab stop use "antlr -tab 4"
132
133 #300. (Changed in MR27)
134
135 Consider the following methods of constructing an AST from ID:
136
137 rule1!
138 : id:ID << #0 = #[id]; >> ;
139
140 rule2!
141 : id:ID << #0 = #id; >> ;
142
143 rule3
144 : ID ;
145
146 rule4
147 : id:ID << #0 = #id; >> ;
148
149 For rule2, the AST corresponding to id would always be NULL. This
150 is because the user explicitly suppressed AST construction using the
151 "!" operator on the rule. In MR27 the use of an AST expression
152 such as #id overrides the "!" operator and forces construction of
153 the AST.
154
155 This fix does not apply to C mode ASTs when the ASTs are referenced
156 using numbers rather than symbols.
157
158 For C mode, this requires that the (optional) function/macro zzmk_ast
159 be defined. This function copies information from an attribute into
160 a previously allocated AST.
161
162 Reported by Jan Langer (jan langernetz.de)
163
164 #299. (Changed in MR27) Don't warn if k=1 and semantic predicate missing LT(i)
165
166 If a semantic predicate does not refer to LT(i) (or, in C mode, LATEXT(i))
167 then pccts doesn't know how many lookahead tokens to use for context.
168 However, if max(k,ck) is 1 then there is really only one choice and
169 the warning is unnecessary.
170
171 #298. (Changed in MR27) Removed "register" for lastpos in dlgauto.c zzgettok
172
173 #297. (Changed in MR27) Incorrect prototypes when used with classic C
174
175 There were a number of errors in function headers when antlr was
176 built with compilers that do not have __STDC__ or __cplusplus set.
177
178 The functions which have variable length argument lists now use
179 PCCTS_USE_STDARG rather than __USE_PROTOTYPES__ to determine
180 whether to use stdargs or varargs.
181
182 #296. (Changed in MR27) Complex return types in rules.
183
184 The following return type was not properly handled when
185 unpacking a struct containing multiple return values:
186
187 rule > [int i, IIR_Bool (IIR_Decl::*constraint)()] : ...
188
189 Instead of using "constraint", the program got lost and used
190 an empty string.
191
192 Reported by P.A. Wilsey.
193
194 #295. (Changed in MR27) Extra ";" following zzGUESS_DONE sometimes.
195
196 Certain constructs with guess blocks in MR23 led to extra ";"
197 preceding the "else" clause of an "if".
198
199 Reported by P.A. Wilsey.
200
201 #294. (Changed in MR27) Infinite loop in antlr for nested blocks
202
203 An oversight in detecting an empty alternative sometimes led
204 to an infinite loop in antlr when it encountered a rule with
205 nested blocks and guess blocks.
206
207 Reported by P.A. Wilsey.
208
209 #293. (Changed in MR27) Sorcerer optimization of _t->type()
210
211 Sorcerer generated code may contain many calls to _t->type() in a
212 single statement. This change introduces a temporary variable
213 to eliminate unnecessary function calls.
214
215 Change implemented by Tom Molteno (tim videoscript.com).
216
217 #292. (Changed in MR27)
218
219 WARNING: Item #267 changes the signature of methods in the AST class.
220
221 **** Be sure to revise your AST functions of the same name ***
222
223 #291. (Changed in MR24)
224
225 Fix to serious code generation error in MR23 for (...)+ block.
226
227 #290. (Changed in MR23)
228
229 Item #247 describes a change in the way {...} blocks handled
230 an error. Consider:
231
232 r1 : {A} b ;
233 b : B;
234
235 with input "C".
236
237 Prior to change #247, the error would resemble "expected B -
238 found C". This is correct but incomplete, and therefore
239 misleading. In #247 it was changed to "expected A, B - found
240 C". This was fine, except for users of parser exception
241 handling because the exception was generated in the epilogue
242 for {...} block rather than in rule b. This made it difficult
243 for users of parser exception handling because B was not
244 expected in that context. Those not using parser exception
245 handling didn't notice the difference.
246
247 The current change restores the behavior prior to #247 when
248 parser exceptions are present, but retains the revised behavior
249 otherwise. This change should be visible only when exceptions
250 are in use and only for {...} blocks and sub-blocks of the form
251 (something | something | something | epsilon) where epsilon represents
252 an empty production and it is the last alternative of a sub-block.
253 In contrast, (something | epsilon | something) should generate the
254 same code as before, even when exceptions are used.
255
256 Reported by Philippe Laporte (philippe at transvirtual.com).
257
258 #289. (Changed in MR23) Bug in matching complement of a #tokclass
259
260 Prior to MR23 when a #tokclass was matched in both its complemented form
261 and uncomplemented form, the bit set generated for its first use was used
262 for both cases. However, the prediction expression was correctly computed
263 in both cases. This meant that the second case would never be matched
264 because, for the second appearance, the prediction expression and the
265 set to be matched would be complements of each other.
266
267 Consider:
268
269 #token A "a"
270 #token B "b"
271 #token C "c"
272 #tokclass AB {A B}
273
274 r1 : AB /* alt 1x */
275 | ~AB /* alt 1y */
276 ;
277
278 Prior to MR23, this resulted in alternative 1y being unreachable. Had it
279 been written:
280
281 r2 : ~AB /* alt 2x */
282 | AB /* alt 2y */
283
284 then alternative 2y would have become unreachable.
285
286 This bug was only for the case of complemented #tokclass. For complemented
287 #token the proper code was generated.
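The effect of reusing the wrong bit set can be sketched with a small model. This is only an illustration of the description above, not antlr's generated code: the names Token, TokenSet, classAB and altComplementMatches are hypothetical, and std::bitset stands in for the generated bit sets.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical token codes for the three tokens in the example.
enum Token { A = 0, B = 1, C = 2, NUM_TOKENS = 3 };
typedef std::bitset<NUM_TOKENS> TokenSet;

// Bit set for "#tokclass AB {A B}".
TokenSet classAB() {
    TokenSet s;
    s.set(A);
    s.set(B);
    return s;
}

// Models the alternative "~AB". The prediction expression is always the
// complement of AB. When reuseFirstSet is true, the match step uses the
// set built for the first (uncomplemented) use, as happened before MR23.
bool altComplementMatches(Token t, bool reuseFirstSet) {
    TokenSet predict = ~classAB();
    TokenSet matchSet = reuseFirstSet ? classAB() : ~classAB();
    return predict.test(t) && matchSet.test(t);
}
```

With reuseFirstSet set to true no token can satisfy both tests, which is why the second alternative was unreachable.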
288
289 #288. (Changed in MR23) #errclass not restricted to choice points
290
291 The #errclass directive is supposed to allow a programmer to define
292 print strings which should appear in syntax error messages as a replacement
293 for some combinations of tokens. For instance:
294
295 #errclass Operator {PLUS MINUS TIMES DIVIDE}
296
297 If a syntax message includes all four of these tokens, and there is no
298 "better" choice of error class, the word "Operator" will be used rather
299 than a list of the four token names.
300
301 Prior to MR23 the #errclass definitions were used only at choice points
302 (which call the FAIL macro). In other cases where there was no choice
303 (e.g. where a single token or token class was matched) the #errclass
304 information was not used.
305
306 With MR23 the #errclass declarations are used for syntax error messages
307 when matching a #tokclass, a wildcard (i.e. "*"), or the complement of a
308 #token or #tokclass (e.g. ~Operator).
309
310 Please note that #errclass may now be defined using #tokclass names
311 (see Item #284).
312
313 Reported by Philip A. Wilsey.
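The substitution rule described for #errclass (use the class name only when every member of the class appears in the message) might be modelled roughly as follows. This is a sketch under assumptions: applyErrClass and the Names typedef are hypothetical, and the "better choice of error class" logic is not modelled.

```cpp
#include <cassert>
#include <set>
#include <string>

typedef std::set<std::string> Names;

// If every member of the error class is in the expected set, replace those
// members with the class name. Otherwise return the set unchanged.
Names applyErrClass(const Names& expected,
                    const std::string& className,
                    const Names& members) {
    for (Names::const_iterator it = members.begin(); it != members.end(); ++it) {
        if (expected.count(*it) == 0) return expected;   // class only partly present
    }
    Names result;
    for (Names::const_iterator it = expected.begin(); it != expected.end(); ++it) {
        if (members.count(*it) == 0) result.insert(*it); // keep non-member tokens
    }
    result.insert(className);
    return result;
}
```

For an expected set of PLUS, MINUS, TIMES, DIVIDE and NUMBER, the result would list Operator and NUMBER.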
314
315 #287. (Changed in MR23) Print name for #tokclass
316
317 Item #148 describes how to give a print name to a #token so that, for
318 example, #token ID could have the expression "identifier" in syntax
319 error messages. This has been extended to #tokclass:
320
321 #token ID("identifier") "[a-zA-Z]+"
322 #tokclass Primitive("primitive type")
323 {INT, FLOAT, CHAR, FLOAT, DOUBLE, BOOL}
324
325 This is really a cosmetic change, since #tokclass names do not appear
326 in any error messages.
327
328 #286. (Changed in MR23) Makefile change to use of cd
329
330 In cases where a pccts subdirectory name matched a directory identified
331 in a $CDPATH environment variable the build would fail. All makefile
332 cd commands have been changed from "cd xyz" to "cd ./xyz" in order
333 to avoid this problem.
334
335 #285. (Changed in MR23) Check for null pointers in some dlg structures
336
337 An invalid regular expression can cause dlg to build an invalid
338 structure to represent the regular expression even while it issues
339 error messages. Additional pointer checks were added.
340
341 Reported by Robert Sherry.
342
343 #284. (Changed in MR23) Allow #tokclass in #errclass definitions
344
345 Previously, a #tokclass reference in the definition of an
346 #errclass was not handled properly. Instead of being expanded
347 into the set of tokens represented by the #tokclass it was
348 treated somewhat like an #errclass. However, in a later phase
349 when all #errclass were expanded into the corresponding tokens
350 the #tokclass reference was not expanded (because it wasn't an
351 #errclass). In effect the reference was ignored.
352
353 This has been fixed.
354
355 Problem reported by Mike Dimmick (mike dimmick.demon.co.uk).
356
357 #283. (Changed in MR23) Option -tmake invokes parser's tmake
358
359 When the string #(...) appears in an action antlr replaces it with
360 a call to ASTBase::tmake(...) to construct an AST. It is sometimes
361 useful to change the tmake routine so that it has access to information
362 in the parser - something which is not possible with a static method
363 in an application where there may be multiple parsers active.
364
365 The antlr option -tmake replaces the call to ASTBase::tmake with a call
366 to a user supplied tmake routine.
367
368 #282. (Changed in MR23) Initialization error for DBG_REFCOUNTTOKEN
369
370 When the pre-processor symbol DBG_REFCOUNTTOKEN is defined
371 incorrect code is generated to initialize ANTLRRefCountToken::ctor and
372 dtor.
373
374 Fix reported by Sven Kuehn (sven sevenkuehn.de).
375
376 #281. (Changed in MR23) Addition of -noctor option for Sorcerer
377
378 Added a -noctor option to suppress generation of the blank ctor
379 for users who wish to define their own ctor.
380
381 Contributed by Jan Langer (jan langernetz.de).
382
383 #280. (Changed in MR23) Syntax error message for EOF token
384
385 The EOF token now receives special treatment in syntax error messages
386 because there is no text matched by the eof token. The token name
387 of the eof token is used unless it is "@" - in which case the string
388 "<eof>" is used.
389
390 Problem reported by Erwin Achermann (erwin.achermann switzerland.org).
391
392 #279. (Changed in MR23) Exception groups
393
394 There was a bug in the way that exception groups were attached to
395 alternatives which caused problems when there was a block contained
396 in an alternative. For instance, in the following rule:
397
398 statement : IF S { ELSE S }
399 exception ....
400 ;
401
402 the exception would be attached to the {...} block instead of the
403 entire alternative because it was attached, in error, to the last
404 alternative instead of the last OPEN alternative.
405
406 Reported by Ty Mordane (tymordane hotmail.com).
407
408 #278. (Changed in MR23) makefile changes
409
410 Contributed by Tomasz Babczynski (faster lab05-7.ict.pwr.wroc.pl).
411
412 The -cfile option is not absolutely needed: when extension of
413 source file is one of the well-known C/C++ extensions it is
414 treated as C/C++ source
415
416 The gnu make defines the CXX variable as the default C++ compiler
417 name, so I added a line to copy this (if defined) to the CCC var.
418
419 Added a -sor option: after it any -class command defines the class
420 name for sorcerer, not for ANTLR. A file extended with .sor is
421 treated as sorcerer input. Because sorcerer can be called multiple
422 times, -sor option can be repeated. Any files and classes (one class
423 per group) after each -sor makes one tree parser.
424
425 Not implemented:
426
427 1. Generate dependencies for user C/C++ files.
428 2. Support for -sor in C mode.
429
430 I have left the old genmk program in the directory as genmk_old.c.
431
432 #277. (Changed in MR23) Change in macro for failed semantic predicates
433
434 In the past, a semantic predicate that failed generated a call to
435 the macro zzfailed_pred:
436
437 #ifndef zzfailed_pred
438 #define zzfailed_pred(_p) \
439 if (guessing) { \
440 zzGUESS_FAIL; \
441 } else { \
442 something(_p) \
443 }
444 #endif
445
446 If a user wished to use the failed action option for semantic predicates:
447
448 rule : <<my_predicate>>? [my_fail_action] A
449 | ...
450
451
452 the code for my_fail_action would have to contain logic for handling
453 the guess part of the zzfailed_pred macro. The user should not have
454 to be aware of the guess logic in writing the fail action.
455
456 The zzfailed_pred has been rewritten to have three arguments:
457
458 arg 1: the stringized predicate of the semantic predicate
459 arg 2: 0 => there is no user-defined fail action
460 1 => there is a user-defined fail action
461 arg 3: the user-defined fail action (if defined)
462 otherwise a no-operation
463
464 The zzfailed_pred macro is now defined as:
465
466 #ifndef zzfailed_pred
467 #define zzfailed_pred(_p,_hasuseraction,_useraction) \
468 if (guessing) { \
469 zzGUESS_FAIL; \
470 } else { \
471 zzfailed_pred_action(_p,_hasuseraction,_useraction) \
472 }
473 #endif
474
475
476 With zzfailed_pred_action defined as:
477
478 #ifndef zzfailed_pred_action
479 #define zzfailed_pred_action(_p,_hasuseraction,_useraction) \
480 if (_hasuseraction) { _useraction } else { failedSemanticPredicate(_p); }
481 #endif
482
483 In C++ mode failedSemanticPredicate() is a virtual function.
484 In C mode the default action is a fprintf statement.
485
486 Suggested by Erwin Achermann (erwin.achermann switzerland.org).
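The three-argument form can be tried out in isolation. The sketch below copies the two macro definitions from this item and supplies stand-ins for everything else: guessing, zzGUESS_FAIL, failedSemanticPredicate and the events log are hypothetical replacements for what the pccts headers and the generated parser would provide.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-ins (assumptions) for parser state and support routines.
static bool guessing = false;
static std::vector<std::string> events;

#define zzGUESS_FAIL events.push_back("guess-fail")

static void failedSemanticPredicate(const char* pred) {
    events.push_back(std::string("default:") + pred);
}

#define zzfailed_pred_action(_p,_hasuseraction,_useraction) \
    if (_hasuseraction) { _useraction } else { failedSemanticPredicate(_p); }

#define zzfailed_pred(_p,_hasuseraction,_useraction) \
    if (guessing) { \
        zzGUESS_FAIL; \
    } else { \
        zzfailed_pred_action(_p,_hasuseraction,_useraction) \
    }

// No user-defined fail action: arg 2 is 0 and arg 3 is a no-operation.
void failWithoutUserAction() {
    zzfailed_pred("my_predicate", 0, ;)
}

// User-defined fail action: it contains no guess logic of its own.
void failWithUserAction() {
    zzfailed_pred("my_predicate", 1, events.push_back("user-action");)
}
```

The user action runs only when the parser is not guessing. While guessing, only the guess-fail path is taken.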
487
488 #276. (Changed in MR23) Addition of return value initialization syntax
489
490 In an attempt to reduce the problems caused by the PURIFY macro I have
491 added new syntax for initializing the return value of rules and the
492 antlr option "-nopurify".
493
494 A rule with a single return argument:
495
496 r1 > [Foo f = expr] :
497
498 now generates code that resembles:
499
500 Foo r1(void) {
501 Foo _retv = expr;
502 ...
503 }
504
505 A rule with more than one return argument:
506
507 r2 > [Foo f = expr1, Bar b = expr2 ] :
508
509 generates code that resembles:
510
511 struct _rv1 {
512 Foo f;
513 Bar b;
514 };
515
516 _rv1 r2(void) {
517 struct _rv1 _retv;
518 _retv.f = expr1;
519 _retv.b = expr2;
520 ...
521 }
522
523 C++ style comments appearing in the initialization list may cause problems.
524
525 #275. (Changed in MR23) Addition of -nopurify option to antlr
526
527 A long time ago the PURIFY macro was introduced to initialize
528 return value arguments and get rid of annoying messages from programs
529 that checked for uninitialized variables.
530
531 This has caused significant annoyance for C++ users that had
532 classes with virtual functions or non-trivial constructors because
533 it would zero the object, including the pointer to the virtual
534 function table. This could be defeated by redefining
535 the PURIFY macro to be empty, but it was a constant surprise to
536 new C++ users of pccts.
537
538 I would like to remove it, but I fear that some existing programs
539 depend on it and would break. My temporary solution is to add
540 an antlr option -nopurify which disables generation of the PURIFY
541 macro call.
542
543 The PURIFY macro should be avoided in favor of the new syntax
544 for initializing return arguments described in item #276.
545
546 To avoid name clash, the PURIFY macro has been renamed PCCTS_PURIFY.
547
548 #274. (Changed in MR23) DLexer.cpp renamed to DLexer.h
549 (Changed in MR23) ATokPtr.cpp renamed to ATokPtrImpl.h
550
551 These two files had .cpp extensions but acted like .h files because
552 they were included in other files. This caused problems for many IDEs.
553 I have renamed them. The ATokPtrImpl.h was necessary because there was
554 already an ATokPtr.h.
555
556 #273. (Changed in MR23) Default win32 library changed to multi-threaded DLL
557
558 The model used for building the Win32 debug and release libraries has changed
559 to multi-threaded DLL.
560
561 To make this change in your MSVC 6 project:
562
563 Project -> Settings
564 Select the C++ tab in the right pane of the dialog box
565 Select "Category: Code Generation"
566 Under "Use run-time library" select one of the following:
567
568 Multi-threaded DLL
569 Debug Multi-threaded DLL
570
571 Suggested by Bill Menees (bill.menees gogallagher.com)
572
573 #272. (Changed in MR23) Failed semantic predicate reported via virtual function
574
575 In the past, a failed semantic predicate reported the problem via a
576 macro which used fprintf(). The macro now expands into a call on
577 the virtual function ANTLRParser::failedSemanticPredicate().
578
579 #271. (Changed in MR23) Warning for LT(i), LATEXT(i) in token match actions
580
581 A bug (or at least an oddity) is that a reference to LT(1), LA(1),
582 or LATEXT(1) in an action which immediately follows a token match
583 in a rule refers to the token matched, not the token which is in
584 the lookahead buffer. Consider:
585
586 r : abc <<action alpha>> D <<action beta>> E;
587
588 In this case LT(1) in action alpha will refer to the next token in
589 the lookahead buffer ("D"), but LT(1) in action beta will refer to
590 the token matched by D - the preceding token.
591
592 A warning has been added for users about this when an action
593 following a token match contains a reference to LT(1), LA(1), or LATEXT(1).
594
595 This behavior should be changed, but it appears in too many programs
596 now. Another problem, perhaps more significant, is that the obvious
597 fix (moving the consume() call to before the action) could change the
598 order in which input is requested and output appears in existing programs.
599
600 This problem was reported, along with a fix by Benjamin Mandel
601 (beny sd.co.il). However, I felt that changing the behavior was too
602 dangerous for existing code.
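The ordering that produces this oddity can be shown with a toy model in which the consume step for a token match happens after the action that follows it. This is a simplified sketch, not the generated parser: DemoParser and its members are hypothetical, and the rule abc is reduced to consuming a single token.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of: r : abc <<action alpha>> D <<action beta>> E;
struct DemoParser {
    std::vector<std::string> tokens;
    std::size_t pos;
    std::string seenByAlpha;   // what LT(1) looked like in action alpha
    std::string seenByBeta;    // what LT(1) looked like in action beta

    explicit DemoParser(const std::vector<std::string>& t) : tokens(t), pos(0) {}

    const std::string& LT1() const { return tokens[pos]; }
    bool match(const std::string& expected) const { return tokens[pos] == expected; }
    void consume() { ++pos; }

    // Rule reference: its token is fully consumed before the rule returns.
    void abc() { consume(); }

    bool r() {
        abc();
        seenByAlpha = LT1();          // next token in the lookahead buffer: D
        if (!match("D")) return false;
        seenByBeta = LT1();           // consume() not yet called: still D
        consume();
        if (!match("E")) return false;
        consume();
        return true;
    }
};
```

In this model both actions observe D, although only action alpha precedes the match of D.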
603
604 #270. (Changed in MR23) Removed static objects from PCCTSAST.cpp
605
606 There were some statically allocated objects in PCCTSAST.cpp
607 These were changed to non-static.
608
609 #269. (Changed in MR23) dlg output for initializing static array
610
611 The output from dlg contains a construct similar to the
612 following:
613
614 struct XXX {
615 static const int size;
616 static int array1[5];
617 };
618
619 const int XXX::size = 4;
620 int XXX::array1[size+1];
621
622
623 The problem is that although the expression "size+1" used in
624 the definition of array1 is equal to 5 (the expression used to
625 declare array), it is not considered equivalent by some compilers.
626
627 Reported with fix by Volker H. Simonis (simonis informatik.uni-tuebingen.de)
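One way to avoid the mismatch, shown here only as a sketch, is to define the array with the same literal bound that appears in the class declaration. Whether this is the form dlg now emits is an assumption. The helper array1Length is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct XXX {
    static const int size;
    static int array1[5];
};

const int XXX::size = 4;

// Same literal bound as the in-class declaration, instead of "size+1".
int XXX::array1[5];

// Hypothetical helper: number of elements in array1.
std::size_t array1Length() {
    return sizeof(XXX::array1) / sizeof(XXX::array1[0]);
}
```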
628
629 #268. (Changed in MR23) syn() routine output when k > 1
630
631 The syn() routine is supposed to print out the text of the
632 token causing the syntax error. It appears that it always
633 used the text from the first lookahead token rather than the
634 appropriate one. The appropriate one is computed by comparing
635 the token codes of lookahead token i (for i = 1 to k) with
636 the FIRST(i) set.
637
638 This has been corrected in ANTLRParser::syn().
639
640 Reported by Bill Menees (bill.menees gogallagher.com)
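The selection of the lookahead token might be sketched as below. This is an illustration of the comparison described above, not the code in ANTLRParser::syn(): the function name, the container types, and the fallback to token 1 when every token is acceptable are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// la[i-1] holds the token code of lookahead token i, and first[i-1] holds
// the FIRST(i) set. Returns the 1-based index of the first lookahead token
// whose code is not in the corresponding set, or 1 if none is found.
int offendingLookahead(const std::vector<int>& la,
                       const std::vector<std::set<int> >& first) {
    for (std::size_t i = 0; i < la.size() && i < first.size(); ++i) {
        if (first[i].count(la[i]) == 0) {
            return static_cast<int>(i) + 1;
        }
    }
    return 1;
}
```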
641
642 #267. (Changed in MR23) AST traversal functions client data argument
643
644 The AST traversal functions now take an extra (optional) parameter
645 which can point to client data:
646
647 preorder_action(void* pData = NULL)
648 preorder_before_action(void* pData = NULL)
649 preorder_after_action(void* pData = NULL)
650
651 **** Warning: this changes the AST signature. ***
652 **** Be sure to revise your AST functions of the same name ***
653
654 Bill Menees (bill.menees gogallagher.com)
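A minimal sketch of how the client data pointer might be used follows. DemoAST is a hypothetical stand-in with a simplified traversal. It is not the real AST base class, and only preorder_action is modelled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node class with an MR23-style traversal signature.
class DemoAST {
public:
    explicit DemoAST(int v) : value(v), down(NULL), right(NULL) {}
    virtual ~DemoAST() {}

    // The optional pointer lets the caller pass state without globals.
    virtual void preorder_action(void* pData = NULL) {
        if (pData != NULL) {
            *static_cast<int*>(pData) += value;   // accumulate into client data
        }
    }

    // Simplified walk: visit a node, then its children, then its siblings.
    void preorder(void* pData = NULL) {
        for (DemoAST* t = this; t != NULL; t = t->right) {
            t->preorder_action(pData);
            if (t->down != NULL) t->down->preorder(pData);
        }
    }

    int value;
    DemoAST* down;    // first child
    DemoAST* right;   // next sibling
};

// Sums the node values by passing the address of a local as client data.
int sumTree(DemoAST* root) {
    int total = 0;
    root->preorder(&total);
    return total;
}
```

Existing overrides that take no argument would no longer override the new signature, which is the reason for the warning above.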
655
656 #266. (Changed in MR23) virtual function printMessage()
657
658 Bill Menees (bill.menees gogallagher.com) has completed the
659 tedious task of replacing all calls to fprintf() with calls
660 to the virtual function printMessage(). For classes which
661 have a pointer to the parser it forwards the printMessage()
662 call to the parser's printMessage() routine.
663
664 This should make it significantly easier to redirect pccts
665 error and warning messages.
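Redirection through a virtual function might look like the sketch below. The classes are hypothetical stand-ins, and the printf-style signature used for printMessage here is an assumption for illustration. Check the pccts headers for the actual declaration.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical base class: by default messages go to the given FILE.
class DemoParserBase {
public:
    virtual ~DemoParserBase() {}

    virtual int printMessage(FILE* pFile, const char* pFormat, ...) {
        va_list args;
        va_start(args, pFormat);
        int n = std::vfprintf(pFile, pFormat, args);
        va_end(args);
        return n;
    }

    // Library code calls the virtual function instead of fprintf().
    void reportError(int line) {
        printMessage(stderr, "line %d: syntax error\n", line);
    }
};

// A subclass that collects messages in a string instead of printing them.
class CapturingParser : public DemoParserBase {
public:
    std::string captured;

    virtual int printMessage(FILE* /*pFile*/, const char* pFormat, ...) {
        char buf[256];
        va_list args;
        va_start(args, pFormat);
        int n = std::vsnprintf(buf, sizeof(buf), pFormat, args);
        va_end(args);
        captured += buf;
        return n;
    }
};
```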
666
667 #265. (Changed in MR23) Remove "labase++" in C++ mode
668
669 In C++ mode labase++ is called when a token is matched.
670 It appears that labase is not used in C++ mode at all, so
671 this code has been commented out.
672
673 #264. (Changed in MR23) Complete rewrite of ParserBlackBox.h
674
675 The parser black box (PBlackBox.h) was completely rewritten
676 by Chris Uzdavinis (chris atdesk.com) to improve its robustness.
677
678 #263. (Changed in MR23) -preamble and -preamble_first rescinded
679
680 Changes for item #253 have been rescinded.
681
682 #262. (Changed in MR23) Crash with -alpha option during traceback
683
684 Under some circumstances a -alpha traceback was started at the
685 "wrong" time. As a result, internal data structures were not
686 initialized.
687
688 Reported by Arpad Beszedes (beszedes inf.u-szeged.hu).
689
690 #261. (Changed in MR23) Defer token fetch for C++ mode
691
692 Item #216 has been revised to indicate that use of the defer fetch
693 option (ZZDEFER_FETCH) requires dlg option -i.
694
695 #260. (MR22) Raise default lex buffer size from 8,000 to 32,000 bytes.
696
697 ZZLEXBUFSIZE is the size (in bytes) of the buffer used by dlg
698 generated lexers. The default value has been raised to 32,000 and
699 the value used by antlr, dlg, and sorcerer has also been raised to
700 32,000.
701
702 #259. (MR22) Default function arguments in C++ mode.
703
704 If a rule is declared:
705
706 rr [int i = 0] : ....
707
708 then the declaration generated by pccts resembles:
709
710 void rr(int i = 0);
711
712 however, the definition must omit the default argument:
713
714 void rr(int i) {...}
715
716 In the past the default value was not omitted. In MR22
717 the generated code resembles:
718
719 void rr(int i /* = 0 */ ) {...}
720
721 Implemented by Volker H. Simonis (simonis informatik.uni-tuebingen.de)
722
723
724 Note: In MR23 this was changed so that nested C style comments
725 ("/* ... */") would not cause problems.
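The split between declaration and definition can be reproduced in a few lines. DemoParser is a hypothetical class, and the rule body is reduced to returning a value so that the default can be observed.

```cpp
#include <cassert>

// Hypothetical stand-in for a generated parser class.
class DemoParser {
public:
    int rr(int i = 0);            // the declaration carries the default value
};

// The definition repeats the default only inside a comment. Writing
// "int i = 0" here as well would be rejected by the compiler.
int DemoParser::rr(int i /* = 0 */) {
    return i + 1;
}
```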
726
727 #258. (MR22) Using a base class for your parser
728
729 In item #102 (MR10) the class statement was extended to allow one
730 to specify a base class other than ANTLRParser for the generated
731 parser. It turned out that this was less than useful because
732 the constructor still specified ANTLRParser as the base class.
733
734 The class statement now uses the first identifier appearing after
735 the ":" as the name of the base class. For example:
736
737 class MyParser : public FooParser {
738
739 Generates in MyParser.h:
740
741 class MyParser : public FooParser {
742
743 Generates in MyParser.cpp something that resembles:
744
745 MyParser::MyParser(ANTLRTokenBuffer *input) :
746 FooParser(input,1,0,0,4)
747 {
748 token_tbl = _token_tbl;
749 traceOptionValueDefault=1; // MR10 turn trace ON
750 }
751
752 The base class constructor must have a signature similar to
753 that of ANTLRParser.
754
755 #257. (MR21a) Removed dlg statement that -i has no effect in C++ mode.
756
757 This was incorrect.
758
759 #256. (MR21a) Malformed syntax graph causes crash after error message.
760
761 In the past, certain kinds of errors in the very first grammar
762 element could cause the construction of a malformed graph
763 representing the grammar. This would eventually result in a
764 fatal internal error. The code has been changed to be more
765 resistant to this particular error.
766
767 #255. (MR21a) ParserBlackBox(FILE* f)
768
769 This constructor set openByBlackBox to the wrong value.
770
771 Reported by Kees Bakker (kees_bakker tasking.nl).
772
773 #254. (MR21a) Reporting syntax error at end-of-file
774
775 When there was a syntax error at the end-of-file the syntax
776 error routine would substitute "<eof>" for the programmer's
777 end-of-file symbol. This substitution is now done only when
778 the programmer does not define his own end-of-file symbol
779 or the symbol begins with the character "@".
780
781 Reported by Kees Bakker (kees_bakker tasking.nl).
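The substitution rule can be stated as a small function. This is a sketch only: eofDisplayName is a hypothetical name, and representing "no end-of-file symbol defined" as an empty string is an assumption.

```cpp
#include <cassert>
#include <string>

// Returns the text to show for the end-of-file token in a syntax error.
// userEofName is the programmer's end-of-file symbol, or empty if none
// was defined.
std::string eofDisplayName(const std::string& userEofName) {
    if (userEofName.empty() || userEofName[0] == '@') {
        return "<eof>";
    }
    return userEofName;
}
```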
782
783 #253. (MR21) Generation of block preamble (-preamble and -preamble_first)
784
785 *** This change was rescinded by item #263 ***
786
787 The antlr option -preamble causes antlr to insert the code
788 BLOCK_PREAMBLE at the start of each rule and block. It does
789 not insert code before rules references, token references, or
790 actions. By properly defining the macro BLOCK_PREAMBLE the
791 user can generate code which is specific to the start of blocks.
792
793 The antlr option -preamble_first is similar, but inserts the
794 code BLOCK_PREAMBLE_FIRST(PreambleFirst_123) where the symbol
795 PreambleFirst_123 is equivalent to the first set defined by
796 the #FirstSetSymbol described in Item #248.
797
798 I have not investigated how these options interact with guess
799 mode (syntactic predicates).
800
801 #252. (MR21) Check for null pointer in trace routine
802
803 When some trace options are used when the parser is generated
804 without the trace enabled, the current rule name may be a
805 NULL pointer. A guard was added to check for this in
806 restoreState.
807
808 Reported by Douglas E. Forester (dougf projtech.com).
809
810 #251. (MR21) Changes to #define zzTRACE_RULES
811
812 The macro zzTRACE_RULES was being used to pass information to
813 AParser.h. If this preprocessor symbol was not properly
814 set the first time AParser.h was #included, the declaration
815 of zzTRACEdata would be omitted (it is used by the -gd option).
816 Subsequent #includes of AParser.h would be skipped because of
817 the #ifdef guard, so the declaration of zzTracePrevRuleName would
818 never be made. The result was that proper compilation was very
819 order dependent.
820
821 The declaration of zzTRACEdata was made unconditional and the
822 problem of removing unused declarations will be left to optimizers.
823
824 Diagnosed by Douglas E. Forester (dougf projtech.com).
825
826 #250. (MR21) Option for EXPERIMENTAL change to error sets for blocks
827
828 The antlr option -mrblkerr turns on an experimental feature
829 which is supposed to provide more accurate syntax error messages
830 for k=1, ck=1 grammars. When used with k>1 or ck>1 grammars the
831 behavior should be no worse than the current behavior.
832
833 There is no problem with the matching of elements or the computation
834 of prediction expressions in pccts. The task is only one of listing
835 the most appropriate tokens in the error message. The error sets used
836 in pccts error messages are approximations of the exact error set when
837 optional elements in (...)* or (...)+ are involved. While entirely
838 correct, the error messages are sometimes not 100% accurate.
839
840 There is also a minor philosophical issue. For example, suppose the
841 grammar expects the token to be an optional A followed by Z, and it
842 is X. X, of course, is neither A nor Z, so an error message is appropriate.
843 Is it appropriate to say "Expected Z" ? It is correct, it is accurate,
844 but it is not complete.
845
846 When k>1 or ck>1 the problem of providing the exactly correct
847 list of tokens for the syntax error messages ends up becoming
848 equivalent to evaluating the prediction expression for the
849 alternatives twice. However, for k=1 ck=1 grammars the prediction
850 expression can be computed easily and evaluated cheaply, so I
851 decided to try implementing it to satisfy a particular application.
852 This application uses the error set in an interactive command language
853 to provide prompts which list the alternatives available at that
854 point in the parser. The user can then enter additional tokens to
855 complete the command line. To do this required more accurate error
856 sets than previously provided by pccts.
857
858 In some cases the default pccts behavior may lead to more robust error
859 recovery or clearer error messages than having the exact set of tokens.
860 This is because (a) features like -ge allow the use of symbolic names for
861 certain sets of tokens, so having extra tokens may simply obscure things
862 and (b) the error set is used to resynchronize the parser, so a good
863 choice is sometimes more important than having the exact set.
864
865 Consider the following example:
866
867 Note: All example code has been abbreviated
868 to the absolute minimum in order to make the
869 examples concise.
870
871 star1 : (A)* Z;
872
873 The generated code resembles:
874
875 old                         new (with -mrblkerr)
876 -------------               --------------------
877 for (;;) {                  for (;;) {
878     match(A);                   match(A);
879 }                           }
880 match(Z);                   if (! A and ! Z) {
881                                 FAIL(...{A,Z}...);
882                             }
883                             match(Z);
884
885
886 With input X
887 old message: Found X, expected Z
888 new message: Found X, expected A, Z
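
The difference between the two messages for star1 can be sketched in
C++. This is only an illustration under assumptions: the Token enum,
tokenName(), and parseStar1() are hypothetical names, and the code
imitates the abbreviated pseudo-code above rather than the code antlr
actually generates.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token types for the rule  star1 : (A)* Z;
enum Token { A, Z, X, END_OF_INPUT };

static std::string tokenName(Token t)
{
    switch (t) {
    case A: return "A";
    case Z: return "Z";
    case X: return "X";
    default: return "<eof>";
    }
}

// Returns "ok" when the input matches (A)* Z, otherwise the error text.
// With mrblkerr set, the message also lists A, because looping back
// into the star block is a legal choice at the point of failure.
std::string parseStar1(const std::vector<Token>& input, bool mrblkerr)
{
    std::size_t i = 0;
    while (i < input.size() && input[i] == A) ++i;        // (A)*
    Token la = (i < input.size()) ? input[i] : END_OF_INPUT;
    if (la == Z) return "ok";                             // match(Z)
    std::string expected = mrblkerr ? "A, Z" : "Z";
    return "Found " + tokenName(la) + ", expected " + expected;
}
```

Calling parseStar1() with the single token X reproduces the old and
new messages listed above, depending on the mrblkerr flag.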
889
890 For the example:
891
892 star2 : (A|B)* Z;
893
894 old                             new (with -mrblkerr)
895 -------------                   --------------------
896 for (;;) {                      for (;;) {
897     if (!A and !B) break;           if (!A and !B) break;
898     if (...) {                      if (...) {
899         <same ...>                      <same ...>
900     }                               }
901     else {                          else {
902         FAIL(...{A,B,Z}...);            FAIL(...{A,B}...);
903     }                               }
904 }                               }
905 match(Z);                       if (! A and ! B and !Z) {
906                                     FAIL(...{A,B,Z}...);
907                                 }
908                                 match(Z);
909
910 With input X
911 old message: Found X, expected Z
912 new message: Found X, expected A, B, Z
913 With input A X
914 old message: Found X, expected Z
915 new message: Found X, expected A, B, Z
916
917 This includes the choice of looping back to the
918 star block.
919
920 The code for plus blocks:
921
922 plus1 : (A)+ Z;
923
924 The generated code resembles:
925
926 old                         new (with -mrblkerr)
927 -------------               --------------------
928 do {                        do {
929     match(A);                   match(A);
930 } while (A)                 } while (A)
931 match(Z);                   if (! A and ! Z) {
932                                 FAIL(...{A,Z}...);
933                             }
934                             match(Z);
935
936 With input A X
937 old message: Found X, expected Z
938 new message: Found X, expected A, Z
939
940 This includes the choice of looping back to the
941 plus block.
942
943 For the example:
944
945 plus2 : (A|B)+ Z;
946
947 old                             new (with -mrblkerr)
948 -------------                   --------------------
949 do {                            do {
950     if (A) {                        <same>
951         match(A);                   <same>
952     } else if (B) {                 <same>
953         match(B);                   <same>
954     } else {                        <same>
955         if (cnt > 1) break;         <same>
956         FAIL(...{A,B,Z}...);        FAIL(...{A,B}...);
957     }                               }
958     cnt++;                          <same>
959 }                               }
960
961 match(Z);                       if (! A and ! B and !Z) {
962                                     FAIL(...{A,B,Z}...);
963                                 }
964                                 match(Z);
965
966 With input X
967 old message: Found X, expected A, B, Z
968 new message: Found X, expected A, B
969 With input A X
970 old message: Found X, expected Z
971 new message: Found X, expected A, B, Z
972
973 This includes the choice of looping back to the
974 plus block.
975
976 #249. (MR21) Changes for DEC/VMS systems
977
978 Jean-François Piéronne (jfp altavista.net) has updated some
979 VMS related command files and fixed some minor problems related
980 to building pccts under the DEC/VMS operating system. For DEC/VMS
981 users the most important differences are:
982
983 a. Revised makefile.vms
984 b. Revised genMMS for generating VMS style makefiles.
985
986 #248. (MR21) Generate symbol for first set of an alternative
987
988 pccts can generate a symbol which represents the tokens which may
989 appear at the start of a block:
990
991 rr : #FirstSetSymbol(rr_FirstSet) ( Foo | Bar ) ;
992
993 This will generate the symbol rr_FirstSet of type SetWordType with
994 elements Foo and Bar set. The bits can be tested using code similar
995 to the following:
996
997 if (set_el(Foo, &rr_FirstSet)) { ...
998
999 This can be combined with the C array zztokens[] or the C++ routine
1000 tokenName() to get the print name of the token in the first set.
1001
1002 The size of the set is given by the newly added enum SET_SIZE, a
1003 protected member of the generated parser's class. The number of
1004 elements in the generated set will not be exactly equal to the
1005 value of SET_SIZE because of synthetic tokens created by #tokclass,
1006 #errclass, the -ge option, and meta-tokens such as epsilon, and
1007 end-of-file.
1008
1009 The #FirstSetSymbol must appear immediately before a block
1010 such as (...)+, (...)*, {...}, or (...). It may not appear
1011 immediately before a token, a rule reference, or an action. However,
1012 a token or rule reference can be enclosed in a (...) in order to
1013 make the use of #FirstSetSymbol legal.
1014
1015 rr_bad : #FirstSetSymbol(rr_bad_FirstSet) Foo; // Illegal
1016
1017 rr_ok : #FirstSetSymbol(rr_ok_FirstSet) (Foo); // Legal
1018
1019 Do not confuse FirstSetSymbol sets with the sets used for testing
1020 lookahead. The sets used for FirstSetSymbol have one element per bit,
1021 so the number of bytes is approximately the largest token number
1022 divided by 8. The sets used for testing lookahead store 8 lookahead
1023 sets per byte, so the length of the array is approximately the largest
1024 token number.
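
The two layouts can be contrasted with a small sketch. It is an
illustration under assumptions, not the pccts implementation:
firstSetAdd(), firstSetHas(), and lookaheadSetHas() are hypothetical
helpers, and the bit order within a byte is chosen arbitrarily.

```cpp
#include <cstddef>
#include <vector>

typedef unsigned char Byte;

// FirstSetSymbol-style layout: one bit per token, so a set covering
// tokens 0..maxToken needs roughly maxToken/8 bytes.
void firstSetAdd(std::vector<Byte>& set, unsigned token)
{
    std::size_t index = token / 8;
    if (index >= set.size()) set.resize(index + 1, 0);
    set[index] = static_cast<Byte>(set[index] | (1u << (token % 8)));
}

bool firstSetHas(const std::vector<Byte>& set, unsigned token)
{
    std::size_t index = token / 8;
    return index < set.size() && ((set[index] >> (token % 8)) & 1u) != 0;
}

// Lookahead-style layout: one byte per token. Bit n of that byte records
// whether the token belongs to set number n, so eight sets share one
// array whose length is roughly the largest token number.
bool lookaheadSetHas(const std::vector<Byte>& setwd,
                     unsigned token, unsigned setNumber)
{
    return token < setwd.size() && setNumber < 8 &&
           ((setwd[token] >> setNumber) & 1u) != 0;
}
```

With a largest token number of 17, the first layout needs 3 bytes for
one set, while the second needs an array of at least 18 bytes that can
hold eight sets at once.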
1025
1026 If there is demand, a similar routine for follow sets can be added.
1027
1028 #247. (MR21) Misleading error message on syntax error for optional elements.
1029
1030 ===================================================
1031 The behavior has been revised when parser exception
1032 handling is used. See Item #290
1033 ===================================================
1034
1035 Prior to MR21, tokens which were optional did not appear in syntax
1036 error messages if the block which immediately followed detected a
1037 syntax error.
1038
1039 Consider the following grammar which accepts Number, Word, and Other:
1040
1041 rr : {Number} Word;
1042
1043 For this rule the code resembles:
1044
1045 if (LA(1) == Number) {
1046 match(Number);
1047 consume();
1048 }
1049 match(Word);
1050
1051 Prior to MR21, the error message for input "$ a" would be:
1052
1053 line 1: syntax error at "$" missing Word
1054
1055 With MR21 the message will be:
1056
1057 line 1: syntax error at "$" expecting Word, Number.
1058
1059 The generated code resembles:
1060
1061 if ( (LA(1)==Number) ) {
1062 zzmatch(Number);
1063 consume();
1064 }
1065 else {
1066 if ( (LA(1)==Word) ) {
1067 /* nothing */
1068 }
1069 else {
1070 FAIL(... message for both Number and Word ...);
1071 }
1072 }
1073 match(Word);
1074
1075 The code generated for optional blocks in MR21 is slightly longer
1076 than the previous versions, but it should give better error messages.
1077
1078 The code generated for:
1079
1080 { a | b | c }
1081
1082 should now be *identical* to:
1083
1084 ( a | b | c | )
1085
1086 which was not the case prior to MR21.
1087
1088 Reported by Sue Marvin (sue siara.com).
1089
1090 #246. (Changed in MR21) Use of $(MAKE) for calls to make
1091
1092 Calls to make from the makefiles were replaced with $(MAKE)
1093 because of problems when using gmake.
1094
1095 Reported with fix by Sunil K. Vallamkonda (sunil siara.com).
1096
1097 #245. (Changed in MR21) Changes to genmk
1098
1099 The following command line options have been added to genmk:
1100
1101 -cfiles ...
1102
1103 To add a user's C or C++ files into makefile automatically.
1104 The list of files must be enclosed in apostrophes. This
1105 option may be specified multiple times.
1106
1107 -compiler ...
1108
1109 The name of the compiler to use for $(CCC) or $(CC). The
1110 default in C++ mode is "CC". The default in C mode is "cc".
1111
1112 -pccts_path ...
1113
1114 The value for $(PCCTS), the pccts directory. The default
1115 is /usr/local/pccts.
1116
1117 Contributed by Tomasz Babczynski (t.babczynski ict.pwr.wroc.pl).
1118
1119 #244. (Changed in MR21) Rename variable "not" in antlr.g
1120
1121 When antlr.g is compiled with a C++ compiler, a variable named
1122 "not" causes problems. Reported by Sinan Karasu
1123 (sinan.karasu boeing.com).
1124
1125 #243. (Changed in MR21) Replace recursion with iteration in zzfree_ast
1126
1127 Another refinement to zzfree_ast in ast.c to limit recursion.
1128
1129 NAKAJIMA Mutsuki (muc isr.co.jp).
1130
1131
1132 #242. (Changed in MR21) LineInfoFormatStr
1133
1134 Added an #ifndef/#endif around LineInfoFormatStr in pcctscfg.h.
1135
1136 #241. (Changed in MR21) Changed macro PURIFY to a no-op
1137
1138 ***********************
1139 *** NOT IMPLEMENTED ***
1140 ***********************
1141
1142 The PURIFY macro was changed to a no-op because it was causing
1143 problems when passing C++ objects.
1144
1145 The old definition:
1146
1147 #define PURIFY(r,s) memset((char *) &(r),'\0',(s));
1148
1149 The new definition:
1150
1151 #define PURIFY(r,s) /* nothing */
1152 #endif
1153
1154 #240. (Changed in MR21) sorcerer/h/sorcerer.h _MATCH and _MATCHRANGE
1155
1156 Added test for NULL token pointer.
1157
1158 Suggested by Peter Keller (keller ebi.ac.uk)
1159
1160 #239. (Changed in MR21) C++ mode AParser::traceGuessFail
1161
1162 If tracing is turned on when the code has been generated
1163 without trace code, a failed guess generates a trace report
1164 even though there are no other trace reports. This change
1165 makes the behavior consistent with other parts of the
1166 trace system.
1167
1168 Reported by David Wigg (wiggjd sbu.ac.uk).
1169
1170 #238. (Changed in MR21) Namespace version #include files
1171
1172 Changed reference from CStdio to cstdio (and other
1173 #include file names) in the namespace version of pccts.
1174 Should have known better.
1175
1176 #237. (Changed in MR21) ParserBlackBox(FILE*)
1177
1178 In the past, ParserBlackBox would close the FILE in the dtor
1179 even though it was not opened by ParserBlackBox. The problem
1180 is that there were two constructors, one which accepted a file
1181 name and did an fopen, the other which accepted a FILE and did
1182 not do an fopen. There is now an extra member variable which
1183 remembers whether ParserBlackBox did the open or not.
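
The ownership flag described above can be sketched as follows. This is
a minimal illustration, not the ParserBlackBox source: the class name
FileHolder and its members are hypothetical.

```cpp
#include <cstdio>

// Hypothetical holder that closes the FILE only if it opened it itself.
class FileHolder {
public:
    // Opens the file by name, so this object owns the FILE.
    explicit FileHolder(const char* path)
        : file_(std::fopen(path, "r")), openedHere_(true) {}

    // Borrows a FILE that the caller opened; the caller must close it.
    explicit FileHolder(std::FILE* file)
        : file_(file), openedHere_(false) {}

    ~FileHolder()
    {
        if (openedHere_ && file_ != nullptr) std::fclose(file_);
    }

    std::FILE* get() const { return file_; }
    bool openedHere() const { return openedHere_; }

private:
    FileHolder(const FileHolder&);              // not copyable
    FileHolder& operator=(const FileHolder&);   // not assignable

    std::FILE* file_;
    bool openedHere_;
};
```

A FileHolder built from stdout, for example, leaves stdout open when
it goes out of scope, which is the behavior the fix is after.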
1184
1185 Suggested by Mike Percy (mpercy scires.com).
1186
1187 #236. (Changed in MR21) tmake now reports down pointer problem
1188
1189 When ASTBase::tmake attempts to update the down pointer of
1190 an AST it checks to see if the down pointer is NULL. If it
1191 is not NULL it does not do the update and returns NULL.
1192 An attempt to update the down pointer is almost always a
1193 result of a user error. This can lead to difficult to find
1194 problems during tree construction.
1195
1196 With this change, the routine calls a virtual function
1197 reportOverwriteOfDownPointer() which calls panic to
1198 report the problem. Users who want the old behavior can
1199 redefine the virtual function in their AST class.
1200
1201 Suggested by Sinan Karasu (sinan.karasu boeing.com)
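
The hook pattern can be sketched as below. The classes DemoAST and
QuietAST and the method attachChild() are hypothetical stand-ins for
ASTBase and tmake; only the name reportOverwriteOfDownPointer() comes
from the notes above. The notes say the real default calls panic; the
sketch merely counts the reports so that it stays testable.

```cpp
class DemoAST {
public:
    DemoAST() : overwriteReports_(0), down_(nullptr) {}
    virtual ~DemoAST() {}

    // Attaches a child unless one is already present. On a conflict the
    // tree is left unchanged and the virtual hook is called.
    bool attachChild(DemoAST* child)
    {
        if (down_ != nullptr) {
            reportOverwriteOfDownPointer();
            return false;
        }
        down_ = child;
        return true;
    }

    DemoAST* down() const { return down_; }
    int overwriteReports() const { return overwriteReports_; }

protected:
    // Default hook for this sketch: count the problem.
    virtual void reportOverwriteOfDownPointer() { ++overwriteReports_; }
    int overwriteReports_;

private:
    DemoAST* down_;
};

// A user class that restores the old, silent behavior.
class QuietAST : public DemoAST {
protected:
    virtual void reportOverwriteOfDownPointer() { /* deliberately silent */ }
};
```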
1202
1203 #235. (Changed in MR21) Made ANTLRParser::resynch() virtual
1204
1205 Suggested by Jerry Evans (jerry swsl.co.uk).
1206
1207 #234. (Changed in MR21) Implicit int for function return value
1208
1209 ATokenBuffer::bufferSize() did not specify a type for the
1210 return value.
1211
1212 Reported by Hai Vo-Ba (hai fc.hp.com).
1213
1214 #233. (Changed in MR20) Converted to MSVC 6.0
1215
1216 Due to external circumstances I have had to convert to MSVC 6.0.
1217 The MSVC 5.0 project files (.dsw and .dsp) have been retained as
1218 xxx50.dsp and xxx50.dsw. The MSVC 6.0 files are named xxx60.dsp
1219 and xxx60.dsw (where xxx is related to the directory/project).
1220
1221 #232. (Changed in MR20) Make setwd bit vectors protected in parser.h
1222
1223 The access for the setwd array in the parser header was not
1224 specified. As a result, it would depend on the code which
1225 preceded it. In MR20 it will always have access "protected".
1226
1227 Reported by Piotr Eljasiak (eljasiak zt.gdansk.tpsa.pl).
1228
1229 #231. (Changed in MR20) Error in token buffer debug code.
1230
1231 When token buffer debugging is selected via the pre-processor
1232 symbol DEBUG_TOKENBUFFER there is an erroneous check in
1233 AParser.cpp:
1234
1235 #ifdef DEBUG_TOKENBUFFER
1236 if (i >= inputTokens->bufferSize() ||
1237 inputTokens->minTokens() < LLk ) /* MR20 Was "<=" */
1238 ...
1239 #endif
1240
1241 Reported by David Wigg (wiggjd sbu.ac.uk).
1242
1243 #230. (Changed in MR20) Fixed problem with #define for -gd option
1244
1245 There was an error in setting zzTRACE_RULES for the -gd (trace) option.
1246
1247 Reported by Gary Funck (gary intrepid.com).
1248
1249 #229. (Changed in MR20) Additional "const" for literals
1250
1251 "const" was added to the token name literal table.
1252 "const" was added to some panic() and similar routines.
1253
1254 #228. (Changed in MR20) dlg crashes on "()"
1255
1256 The following token definition will cause DLG to crash.
1257
1258 #token "()"
1259
1260 When there is a syntax error in a regular expression
1261 many of the dlg routines return a structure which has
1262 null pointers. When this is accessed by callers it
1263 generates the crash.
1264
1265 I have attempted to fix the more common cases.
1266
1267 Reported by Mengue Olivier (dolmen bigfoot.com).
1268
1269 #227. (Changed in MR20) Array overwrite
1270
1271 Steveh Hand (sassth unx.sas.com) reported a problem which
1272 was traced to a temporary array which was not properly
1273 resized for deeply nested blocks. This has been fixed.
1274
1275 #226. (Changed in MR20) -pedantic conformance
1276
1277 G. Hobbelt (i_a mbh.org) and THM made many, many minor
1278 changes to create prototypes for all the functions and
1279 bring antlr, dlg, and sorcerer into conformance with
1280 the gcc -pedantic option.
1281
1282 This may require users to add pccts/h/pcctscfg.h to some
1283 files or makefiles in order to have __USE_PROTOS defined.
1284
1285 #225. (Changed in MR20) AST stack adjustment in C mode
1286
1287 The fix in #214 for AST stack adjustment in C mode missed
1288 some cases.
1289
1290 Reported with fix by Ger Hobbelt (i_a mbh.org).
1291
1292 #224. (Changed in MR20) LL(1) and LL(2) with #pragma approx
1293
1294 This may set a record for the oldest, most trivial, lexical
1295 error in pccts. The regular expressions for LL(1) and LL(2)
1296 lacked an escape for the left and right parentheses.
1297
1298 Reported by Ger Hobbelt (i_a mbh.org).
1299
1300 #223. (Changed in MR20) Addition of IBM_VISUAL_AGE directory
1301
1302 Build files for antlr, dlg, and sorcerer under IBM Visual Age
1303 have been contributed by Anton Sergeev (ags mlc.ru). They have
1304 been placed in the pccts/IBM_VISUAL_AGE directory.
1305
1306 #222. (Changed in MR20) Replace __STDC__ with __USE_PROTOS
1307
1308 Most occurrences of __STDC__ replaced with __USE_PROTOS due to
1309 complaints from several users.
1310
1311 #221. (Changed in MR20) Added #include for DLexerBase.h to PBlackBox.
1312
1313 Added #include for DLexerBase.h to PBlackBox.
1314
1315 #220. (Changed in MR19) strcat arguments reversed in #pred parse
1316
1317 The arguments to strcat are reversed when creating a print
1318 name for a hash table entry for use with #pred feature.
1319
1320 Problem diagnosed and fix reported by Scott Harrington
1321 (seh4 ix.netcom.com).
1322
1323 #219. (Changed in MR19) C Mode routine zzfree_ast
1324
1325 Changes to reduce use of recursion for AST trees with only right
1326 links or only left links in the C mode routine zzfree_ast.
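
The general idea of trading recursion for iteration can be sketched as
follows. This is not the zzfree_ast code: the Ast struct and freeAst()
are hypothetical, and the sketch assumes the usual child-sibling layout
with "right" and "down" links. It walks sibling chains in a loop and
recurses only for children, so a long chain of right links no longer
consumes stack.

```cpp
#include <cstddef>

// Hypothetical child-sibling tree node.
struct Ast {
    Ast* right;   // next sibling
    Ast* down;    // first child
};

// Frees the node, its siblings, and all of their children.
// Returns the number of nodes released.
std::size_t freeAst(Ast* tree)
{
    std::size_t freed = 0;
    while (tree != nullptr) {
        Ast* next = tree->right;        // remember the sibling before deleting
        freed += freeAst(tree->down);   // recursion depth follows tree depth only
        delete tree;
        ++freed;
        tree = next;
    }
    return freed;
}
```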
1327
1328 Implemented by SAKAI Kiyotaka (ksakai isr.co.jp).
1329
1330 #218. (Changed in MR19) Changes to support unsigned char in C mode
1331
1332 Changes to antlr.h and err.h to fix omissions in use of zzchar_t
1333
1334 Implemented by SAKAI Kiyotaka (ksakai isr.co.jp).
1335
1336 #217. (Changed in MR19) Error message when dlg -i and -CC options selected
1337
1338 *** This change was rescinded by item #257 ***
1339
1340 The parsers generated by pccts in C++ mode are not able to support the
1341 interactive lexer option (except, perhaps, when using the deferred fetch
1342 parser option described in Item #216).
1343
1344 DLG now warns when both -i and -CC are selected.
1345
1346 This warning was suggested by David Venditti (07751870267-0001 t-online.de).
1347
1348 #216. (Changed in MR19) Defer token fetch for C++ mode
1349
1350 Implemented by Volker H. Simonis (simonis informatik.uni-tuebingen.de)
1351
1352 Normally, pccts keeps the lookahead token buffer completely filled.
1353 This requires max(k,ck) tokens of lookahead. For some applications
1354 this can cause deadlock problems. For example, there may be cases
1355 when the parser can't tell when the input has been completely consumed
1356 until the parse is complete, but the parse can't be completed because
1357 the input routines are waiting for additional tokens to fill the
1358 lookahead buffer.
1359
1360 When the ANTLRParser class is built with the pre-processor option
1361 ZZDEFER_FETCH defined, the fetch of new tokens by consume() is deferred
1362 until LA(i) or LT(i) is called.
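
The effect of deferring the fetch can be sketched with a toy lookahead
buffer. This is an illustration under assumptions, not the ANTLRParser
implementation: CountingSource and DeferredLookahead are hypothetical
classes, and tokens are plain ints.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical token source: hands out 1, 2, 3, ... and counts fetches.
struct CountingSource {
    int next;
    int fetches;
    CountingSource() : next(1), fetches(0) {}
    int getToken() { ++fetches; return next++; }
};

class DeferredLookahead {
public:
    explicit DeferredLookahead(CountingSource& source)
        : source_(source), pendingConsumes_(0) {}

    // consume() never reads from the source. It drops a buffered token
    // or, if none is buffered, records that one must be skipped later.
    void consume()
    {
        if (!buffer_.empty()) buffer_.pop_front();
        else ++pendingConsumes_;
    }

    // LA(i), with i >= 1, is the only place where tokens are fetched.
    int LA(std::size_t i)
    {
        for (; pendingConsumes_ > 0; --pendingConsumes_) source_.getToken();
        while (buffer_.size() < i) buffer_.push_back(source_.getToken());
        return buffer_[i - 1];
    }

private:
    CountingSource& source_;
    std::deque<int> buffer_;
    std::size_t pendingConsumes_;
};
```

In this sketch, constructing the buffer and calling consume() cause no
reads at all; the source is touched only when LA(i) needs a token,
which is what avoids waiting on input that the parser never examines.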
1363
1364 To test whether this option has been built into the ANTLRParser class
1365 use "isDeferFetchEnabled()".
1366
1367 Using the -gd trace option with the default tracein() and traceout()
1368 routines will defeat the effort to defer the fetch because the
1369 trace routines print out information about the lookahead token at
1370 the start of the rule.
1371
1372 Because the tracein and traceout routines are virtual it is
1373 easy to redefine them in your parser:
1374
1375 class MyParser {
1376 <<
1377 virtual void tracein(ANTLRChar * ruleName)
1378 { fprintf(stderr,"Entering: %s\n", ruleName); }
1379 virtual void traceout(ANTLRChar * ruleName)
1380 { fprintf(stderr,"Leaving: %s\n", ruleName); }
1381 >>
1382
1383 The originals for those routines are in pccts/h/AParser.cpp
1384
1385 This requires use of the dlg option -i (interactive lexer).
1386
1387 This is implemented only for C++ mode.
1388
1389 This is experimental. The interaction with guess mode (syntactic
1390 predicates) is not known.
1391
1392 #215. (Changed in MR19) Addition of reset() to DLGLexerBase
1393
1394 There was no obvious way to reset the lexer for reuse. The
1395 reset() method now does this.
1396
1397 Suggested by David Venditti (07751870267-0001 t-online.de).
1398
1399 #214. (Changed in MR19) C mode: Adjust AST stack pointer at exit
1400
1401 In C mode the AST stack pointer needs to be reset if there will
1402 be multiple calls to the ANTLRx macros.
1403
1404 Reported with fix by Paul D. Smith (psmith baynetworks.com).
1405
1406 #213. (Changed in MR18) Fatal error with -mrhoistk (k>1 hoisting)
1407
1408 When rearranging code I forgot to un-comment a critical line of
1409 code that handles hoisting of predicates with k>1 lookahead. This
1410 is now fixed.
1411
1412 Reported by Reinier van den Born (reinier vnet.ibm.com).
1413
1414 #212. (Changed in MR17) Mac related changes by Kenji Tanaka
1415
1416 Kenji Tanaka (kentar osa.att.ne.jp) has made a number of changes for
1417 Macintosh users.
1418
1419 a. The following Macintosh MPW files aid in installing pccts on Mac:
1420
1421 pccts/MPW_Read_Me
1422
1423 pccts/install68K.mpw
1424 pccts/installPPC.mpw
1425
1426 pccts/antlr/antlr.r
1427 pccts/antlr/antlr68K.make
1428 pccts/antlr/antlrPPC.make
1429
1430 pccts/dlg/dlg.r
1431 pccts/dlg/dlg68K.make
1432 pccts/dlg/dlgPPC.make
1433
1434 pccts/sorcerer/sor.r
1435 pccts/sorcerer/sor68K.make
1436 pccts/sorcerer/sorPPC.make
1437
1438 They completely replace the previous Mac installation files.
1439
1440 b. The most significant is a change in the MAC_FILE_CREATOR symbol
1441 in pcctscfg.h:
1442
1443 old: #define MAC_FILE_CREATOR 'MMCC' /* Metrowerks C/C++ Text files */
1444 new: #define MAC_FILE_CREATOR 'CWIE' /* Metrowerks C/C++ Text files */
1445
1446 c. Added calls to special_fopen_actions() where necessary.
1447
1448 #211. (Changed in MR16a) C++ style comment in dlg
1449
1450 This has been fixed.
1451
1452 #210. (Changed in MR16a) Sor accepts \r\n, \r, or \n for end-of-line
1453
1454 A user requested that Sorcerer be changed to accept other forms
1455 of end-of-line.
1456
1457 #209. (Changed in MR16) Name of files changed.
1458
1459 Old: CHANGES_FROM_1.33
1460 New: CHANGES_FROM_133.txt
1461
1462 Old: KNOWN_PROBLEMS
1463 New: KNOWN_PROBLEMS.txt
1464
1465 #208. (Changed in MR16) Change in use of pccts #include files
1466
1467 There were problems with MS DevStudio when mixing Sorcerer and
1468 PCCTS in the same source file. The problem is caused by the
1469 redefinition of setjmp in the MS header file setjmp.h. In
1470 setjmp.h the pre-processor symbol setjmp was redefined to be
1471 _setjmp. A later effort to execute #include <setjmp.h> resulted
1472 in an effort to #include <_setjmp.h>. I'm not sure whether this
1473 is a bug or a feature. In any case, I decided to fix it by
1474 avoiding the use of pre-processor symbols in #include statements
1475 altogether. This has the added benefit of making pre-compiled
1476 headers work again.
1477
1478 I've replaced statements:
1479
1480 old: #include PCCTS_SETJMP_H
1481 new: #include "pccts_setjmp.h"
1482
1483 Where pccts_setjmp.h contains:
1484
1485 #ifndef __PCCTS_SETJMP_H__
1486 #define __PCCTS_SETJMP_H__
1487
1488 #ifdef PCCTS_USE_NAMESPACE_STD
1489 #include <Csetjmp>
1490 #else
1491 #include <setjmp.h>
1492 #endif
1493
1494 #endif
1495
1496 A similar change has been made for other standard header files
1497 required by pccts and sorcerer: stdlib.h, stdarg.h, stdio.h, etc.
1498
1499 Reported by Jeff Vincent (JVincent novell.com) and Dale Davis
1500 (DalDavis spectrace.com).
1501
1502 #207. (Changed in MR16) dlg reports an invalid range for: [\0x00-\0xff]
1503
1504 -----------------------------------------------------------------
1505 Note from MR23: This fix does not work. I am investigating why.
1506 -----------------------------------------------------------------
1507
1508 dlg will report that this is an invalid range.
1509
1510 Diagnosed by Piotr Eljasiak (eljasiak no-spam.zt.gdansk.tpsa.pl):
1511
1512 I think this problem is not specific to unsigned chars
1513 because dlg reports no error for the range [\0x00-\0xfe].
1514
1515 I've found that information on range is kept in field
1516 letter (unsigned char) of Attrib struct. Unfortunately
1517 the letter value internally is for some reasons increased
1518 by 1, so \0xff is represented here as 0.
1519
1520 That's why dlg complains about the range [\0x00-\0xff] in
1521 dlg_p.g:
1522
1523 if ($$.letter > $2.letter) {
1524 error("invalid range ", zzline);
1525 }
1526
1527 The fix is:
1528
1529 if ($$.letter > $2.letter && 255 != $2.letter) {
1530 error("invalid range ", zzline);
1531 }
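
The wrap-around in the diagnosis can be reproduced in isolation. The
sketch below assumes 8-bit unsigned chars; storedLetter() and
rangeRejected() are hypothetical names that imitate the description,
not code taken from dlg.

```cpp
// Imitates the internal representation described above: the character
// code is stored incremented by 1 in an unsigned char.
unsigned char storedLetter(unsigned char c)
{
    return static_cast<unsigned char>(c + 1);   // 0xff wraps around to 0
}

// Imitates the original range check, which compares the stored values.
bool rangeRejected(unsigned char first, unsigned char last)
{
    return storedLetter(first) > storedLetter(last);
}
```

Here rangeRejected(0x00, 0xff) is true because the upper bound is
stored as 0, while rangeRejected(0x00, 0xfe) is false, matching the
behavior reported for the two ranges.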
1532
1533 #206. (Changed in MR16) Free zzFAILtext in ANTLRParser destructor
1534
1535 The ANTLRParser destructor now frees zzFAILtext.
1536
1537 Problem and fix reported by Manfred Kogler (km cast.uni-linz.ac.at).
1538
1539 #205. (Changed in MR16) DLGStringReset argument now const
1540
1541 Changed: void DLGStringReset(DLGChar *s) {...}
1542 To: void DLGStringReset(const DLGChar *s) {...}
1543
1544 Suggested by Dale Davis (daldavis spectrace.com)
1545
1546 #204. (Changed in MR15a) Change __WATCOM__ to __WATCOMC__ in pcctscfg.h
1547
1548 Reported by Oleg Dashevskii (olegdash my-dejanews.com).
1549
1550 #203. (Changed in MR15) Addition of sorcerer to distribution kit
1551
1552 I have finally caved in to popular demand. The pccts 1.33mr15
1553 kit will include sorcerer. The separate sorcerer kit will be
1554 discontinued.
1555
1556 #202. (Changed in MR15) Organization of MS Dev Studio Projects in Kit
1557
1558 Previously there was one workspace that contained projects for
1559 all three parts of pccts: antlr, dlg, and sorcerer. Now each
1560 part (and directory) has its own workspace/project and there
1561 is an additional workspace/project to build a library from the
1562 .cpp files in the pccts/h directory.
1563
1564 The library build will create pccts_debug.lib or pccts_release.lib
1565 according to the configuration selected.
1566
1567 If you don't want to build pccts 1.33MR15 you can download a
1568 ready-to-run kit for win32 from http://www.polhode.com/win32.zip.
1569 The ready-to-run for win32 includes executables, a pre-built static
1570 library for the .cpp files in the pccts/h directory, and a sample
1571 application.
1572
1573 You will need to define the environment variable PCCTS to point to
1574 the root of the pccts directory hierarchy.
1575
1576 #201. (Changed in MR15) Several fixes by K.J. Cummings (cummings peritus.com)
1577
1578 Generation of SETJMP rather than SETJMP_H in gen.c.
1579
1580 (Sor B19) Declaration of ref_vars_inits for ref_var_inits in
1581 pccts/sorcerer/sorcerer.h.
1582
1583 #200. (Changed in MR15) Remove operator=() in AToken.h
1584
1585 User reported that Watcom couldn't handle use of
1586 explicit operator =(). Replaced with equivalent
1587 using cast operator.
1588
1589 #199. (Changed in MR15) Don't allow use of empty #tokclass
1590
1591 Change antlr.g to disallow empty #tokclass sets.
1592
1593 Reported by Manfred Kogler (km cast.uni-linz.ac.at).
1594
1595 #198. Revised ANSI C grammar due to efforts by Manuel Kessler
1596
1597 Manuel Kessler (mlkessler cip.physik.uni-wuerzburg.de)
1598
1599 Allow trailing ... in function parameter lists.
1600 Add bit fields.
1601 Allow old-style function declarations.
1602 Support cv-qualified pointers.
1603 Better checking of combinations of type specifiers.
1604 Release of memory for local symbols on scope exit.
1605 Allow input file name on command line as well as by redirection.
1606
1607 and other miscellaneous tweaks.
1608
1609 This is not part of the pccts distribution kit. It must be
1610 downloaded separately from:
1611
1612 http://www.polhode.com/ansi_mr15.zip
1613
1614 #197. (Changed in MR14) Resetting the lookahead buffer of the parser
1615
1616 Explanation and fix by Sinan Karasu (sinan.karasu boeing.com)
1617
1618 Consider the code used to prime the lookahead buffer LA(i)
1619 of the parser when init() is called:
1620
1621 void
1622 ANTLRParser::
1623 prime_lookahead()
1624 {
1625 int i;
1626 for(i=1;i<=LLk; i++) consume();
1627 dirty=0;
1628 //lap = 0; // MR14 - Sinan Karasu (sinan.karasu boeing.com)
1629 //labase = 0; // MR14
1630 labase=lap; // MR14
1631 }
1632
1633 When the parser is instantiated, lap=0,labase=0 is set.
1634
1635 The "for" loop runs LLk times. In consume(), lap = lap +1 (mod LLk) is
1636 computed. Therefore, lap(before the loop) == lap (after the loop).
1637
1638 Now the only problem comes in when one does an init() of the parser
1639 after an Eof has been seen. At that time, lap could be non zero.
1640 Assume it was lap==1. Now we do a prime_lookahead(). If LLk is 2,
1641 then
1642
1643 consume()
1644 {
1645 NLA = inputTokens->getToken()->getType();
1646 dirty--;
1647 lap = (lap+1)&(LLk-1);
1648 }
1649
1650 or expanding NLA,
1651
1652 token_type[lap&(LLk-1)] = inputTokens->getToken()->getType();
1653 dirty--;
1654 lap = (lap+1)&(LLk-1);
1655
1656 so now we prime locations 1 and 0 (the index wraps modulo LLk). In
1657 prime_lookahead it used to set lap=0 and labase=0. Now, the next token
1658 will be read from location 0, NOT 1 as it should have been.
1659
1660 This was never caught before, because if a parser is just instantiated,
1661 then lap and labase are 0, the offending assignment lines are
1662 basically no-ops, since the for loop wraps around back to 0.
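
The scenario can be replayed with a toy model of the circular buffer.
This is a sketch under assumptions: DemoLookahead is a hypothetical
struct, tokens are plain ints starting at 10, and LA(i) is assumed to
index token_type relative to labase. Only the names lap, labase, LLk,
token_type, consume, and prime_lookahead come from the text above.

```cpp
struct DemoLookahead {
    enum { LLk = 2 };          // a power of 2, as the masking requires
    int token_type[LLk];
    unsigned lap;
    unsigned labase;
    int nextToken;             // stand-in for the input token stream

    // startLap models where lap happened to be when init() is called.
    explicit DemoLookahead(unsigned startLap)
        : lap(startLap), labase(startLap), nextToken(10)
    {
        token_type[0] = token_type[1] = 0;
    }

    void consume()
    {
        token_type[lap & (LLk - 1)] = nextToken++;
        lap = (lap + 1) & (LLk - 1);
    }

    // mr14 == false replays the old behavior (lap = 0, labase = 0);
    // mr14 == true applies the fix (labase = lap).
    void prime_lookahead(bool mr14)
    {
        for (int i = 1; i <= LLk; i++) consume();
        if (mr14) labase = lap;
        else { lap = 0; labase = 0; }
    }

    // Assumed definition of LA(i) for this sketch.
    int LA(unsigned i) const
    {
        return token_type[(labase + i - 1) & (LLk - 1)];
    }
};
```

Starting from lap == 1, the old code makes LA(1) return the second
token that was read, while the MR14 version returns the first; starting
from lap == 0 both versions agree, which is why the problem went
unnoticed.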
1663
1664 #196. (Changed in MR14) Problems with "(alpha)? beta" guess
1665
1666 Consider the following syntactic predicate in a grammar
1667 with 2 tokens of lookahead (k=2 or ck=2):
1668
1669 rule : ( alpha )? beta ;
1670 alpha : S t ;
1671 t : T U
1672 | T
1673 ;
1674 beta : S t Z ;
1675
1676 When antlr computes the prediction expression with one token
1677 of lookahead for alts 1 and 2 of rule t it finds an ambiguity.
1678
1679 Because the grammar has a lookahead of 2 it tries to compute
1680 two tokens of lookahead for alts 1 and 2 of t. Alt 1 clearly
1681 has a lookahead of (T U). Alt 2 is one token long so antlr
1682 tries to compute the follow set of alt 2, which means finding
1683 the things which can follow rule t in the context of (alpha)?.
1684 This cannot be computed, because alpha is only part of a rule,
1685 and antlr can't tell what part of beta is matched by alpha and
1686 what part remains to be matched. Thus it is impossible for antlr
1687 to properly determine the follow set of rule t.
1688
1689 Prior to 1.33MR14, the follow of (alpha)? was computed as
1690 FIRST(beta) as a result of the internal representation of
1691 guess blocks.
1692
1693 With MR14 the follow set will be the empty set for that context.
1694
1695 Normally, one expects a rule appearing in a guess block to also
1696 appear elsewhere. When the follow context for this other use
1697 is "ored" with the empty set, the context from the other use
1698 results, and a reasonable follow context results. However if
1699 there is *no* other use of the rule, or it is used in a different
1700 manner then the follow context will be inaccurate - it was
1701 inaccurate even before MR14, but it will be inaccurate in a
1702 different way.
1703
1704 For the example given earlier, a reasonable way to rewrite the
1705 grammar:
1706
1707 rule : ( alpha )? beta ;
1708 alpha : S t ;
1709 t : T U
1710 | T
1711 ;
1712 beta : alpha Z ;
1713
1714 If there are no other uses of the rule appearing in the guess
1715 block it will generate a test for EOF - a workaround for
1716 representing a null set in the lookahead tests.
1717
1718 If you encounter such a problem you can use the -alpha option
1719 to get additional information:
1720
1721 line 2: error: not possible to compute follow set for alpha
1722 in an "(alpha)? beta" block.
1723
1724 With the antlr -alpha command line option the following information
1725 is inserted into the generated file:
1726
1727 #if 0
1728
1729 Trace of references leading to attempt to compute the follow set of
1730 alpha in an "(alpha)? beta" block. It is not possible for antlr to
1731 compute this follow set because it is not known what part of beta has
1732 already been matched by alpha and what part remains to be matched.
1733
1734 Rules which make use of the incorrect follow set will also be incorrect
1735
1736 1 #token T alpha/2 line 7 brief.g
1737 2 end alpha alpha/3 line 8 brief.g
1738 2 end (...)? block at start/1 line 2 brief.g
1739
1740 #endif
1741
1742 At the moment, with the -alpha option selected the program marks
1743 any rules which appear in the trace back chain (above) as rules with
1744 possible problems computing follow set.
1745
1746 Reported by Greg Knapen (gregory.knapen bell.ca).
1747
1748 #195. (Changed in MR14) #line directive not at column 1
1749
1750 Under certain circumstances a predicate test could generate
1751 a #line directive which was not at column 1.
1752
1753 Reported with fix by David Kågedal (davidk lysator.liu.se)
1754 (http://www.lysator.liu.se/~davidk/).
1755
1756 #194. (Changed in MR14) (C Mode only) Demand lookahead with #tokclass
1757
1758 In C mode with the demand lookahead option there is a bug in the
1759 code which handles matches for #tokclass (zzsetmatch and
1760 zzsetmatch_wsig).
1761
1762 The bug causes the lookahead pointer to get out of synchronization
1763 with the current token pointer.
1764
1765 The problem was reported with a fix by Ger Hobbelt (hobbelt axa.nl).
1766
1767 #193. (Changed in MR14) Use of PCCTS_USE_NAMESPACE_STD
1768
1769 The pcctscfg.h now contains the following definitions:
1770
1771 #ifdef PCCTS_USE_NAMESPACE_STD
1772 #define PCCTS_STDIO_H <Cstdio>
1773 #define PCCTS_STDLIB_H <Cstdlib>
1774 #define PCCTS_STDARG_H <Cstdarg>
1775 #define PCCTS_SETJMP_H <Csetjmp>
1776 #define PCCTS_STRING_H <Cstring>
1777 #define PCCTS_ASSERT_H <Cassert>
1778 #define PCCTS_ISTREAM_H <istream>
1779 #define PCCTS_IOSTREAM_H <iostream>
1780 #define PCCTS_NAMESPACE_STD namespace std {}; using namespace std;
1781 #else
1782 #define PCCTS_STDIO_H <stdio.h>
1783 #define PCCTS_STDLIB_H <stdlib.h>
1784 #define PCCTS_STDARG_H <stdarg.h>
1785 #define PCCTS_SETJMP_H <setjmp.h>
1786 #define PCCTS_STRING_H <string.h>
1787 #define PCCTS_ASSERT_H <assert.h>
1788 #define PCCTS_ISTREAM_H <istream.h>
1789 #define PCCTS_IOSTREAM_H <iostream.h>
1790 #define PCCTS_NAMESPACE_STD
1791 #endif
1792
1793 The runtime support in pccts/h uses these pre-processor symbols
1794 consistently.
1795
1796 Also, antlr and dlg have been changed to generate code which uses
1797 these pre-processor symbols rather than having the names of the
1798 #include files hard-coded in the generated code.
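
A rough sketch of how client code might use these symbols. The macro
definitions are repeated locally only so that the fragment stands alone
(a real build would take them from pcctscfg.h), and tokenTextLength is a
hypothetical helper that is not part of PCCTS:

```cpp
// Sketch only: local copies of a few symbols from the list above.
#define PCCTS_USE_NAMESPACE_STD

#ifdef PCCTS_USE_NAMESPACE_STD
#define PCCTS_STDIO_H  <cstdio>
#define PCCTS_STRING_H <cstring>
#define PCCTS_NAMESPACE_STD namespace std {}; using namespace std;
#else
#define PCCTS_STDIO_H  <stdio.h>
#define PCCTS_STRING_H <string.h>
#define PCCTS_NAMESPACE_STD
#endif

// The header names are never written out directly.
#include PCCTS_STDIO_H
#include PCCTS_STRING_H

PCCTS_NAMESPACE_STD

// Hypothetical helper: the unqualified call to strlen resolves whether
// or not PCCTS_USE_NAMESPACE_STD is defined.
inline int tokenTextLength(const char *text) {
    return static_cast<int>(strlen(text));
}
```

Removing the first #define switches the same source to the classic
".h" headers without any other change.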
1799
1800 This required the addition of "#include pcctscfg.h" to a number of
1801 files in pccts/h.
1802
1803 It appears that this sometimes causes problems for MSVC 5 in
1804 combination with the "automatic" option for pre-compiled headers.
1805 In such cases disable the "automatic" pre-compiled headers option.
1806
1807 Suggested by Hubert Holin (Hubert.Holin Bigfoot.com).
1808
1809 #192. (Changed in MR14) Change setText() to accept "const ANTLRChar *"
1810
1811 Changed ANTLRToken::setText(ANTLRChar *) to setText(const ANTLRChar *).
1812 This allows literal strings to be used to initialize tokens. Since
1813 the usual token implementation (ANTLRCommonToken) makes a copy of the
1814 input string, this was an unnecessary limitation.
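
The effect of the new signature can be sketched as follows. LiteralToken
is a hypothetical stand-in for a token class that copies its text; it is
not the PCCTS ANTLRCommonToken, and the typedef is repeated only to keep
the fragment self-contained:

```cpp
#include <string>

typedef char ANTLRChar;   // local typedef for this sketch only

// Hypothetical token class that keeps its own copy of the text.
class LiteralToken {
public:
    // A pointer to const lets callers pass string literals directly.
    void setText(const ANTLRChar *s) { text_ = s; }   // copies the characters
    const ANTLRChar *getText() const { return text_.c_str(); }
private:
    std::string text_;
};
```

With a non-const parameter, a call such as tok.setText("begin") would
need a cast or would draw a compiler diagnostic.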
1815
1816 Suggested by Bob McWhirter (bob netwrench.com).
1817
1818 #191. (Changed in MR14) HP/UX aCC compiler compatibility problem
1819
1820 Needed to explicitly declare zzINF_DEF_TOKEN_BUFFER_SIZE and
1821 zzINF_BUFFER_TOKEN_CHUNK_SIZE as ints in pccts/h/AParser.cpp.
1822
1823 Reported by David Cook (dcook bmc.com).
1824
1825 #190. (Changed in MR14) IBM OS/2 CSet compiler compatibility problem
1826
1827 Name conflict with "_cs" in pccts/h/ATokenBuffer.cpp
1828
1829 Reported by David Cook (dcook bmc.com).
1830
1831 #189. (Changed in MR14) -gxt switch in C mode
1832
1833 The -gxt switch in C mode didn't work because of incorrect
1834 initialization.
1835
1836 Reported by Sinan Karasu (sinan boeing.com).
1837
1838 #188. (Changed in MR14) Added pccts/h/DLG_stream_input.h
1839
1840 This is a DLG stream class based on C++ istreams.
1841
1842 Contributed by Hubert Holin (Hubert.Holin Bigfoot.com).
1843
1844 #187. (Changed in MR14) Rename config.h to pcctscfg.h
1845
1846 The PCCTS configuration file has been renamed from config.h to
1847 pcctscfg.h. The problem with the original name is that it led
1848 to name collisions when pccts parsers were combined with other
1849 software.
1850
1851 All of the runtime support routines in pccts/h/* have been
1852 changed to use the new name. Existing software can continue
1853 to use pccts/h/config.h. The content of pccts/h/config.h is
1854 now just: #include "pcctscfg.h"
1855
1856 I don't have a record of the user who suggested this.
1857
1858 #186. (Changed in MR14) Pre-processor symbol DllExportPCCTS class modifier
1859
1860 Classes in the C++ runtime support routines are now declared:
1861
1862 class DllExportPCCTS className ....
1863
1864 By default, the pre-processor symbol is defined as the empty
1865 string. This is for use by MSVC++ users to create DLL classes.
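
A minimal sketch of the pattern, assuming the symbol has not been
defined elsewhere. SketchBase is a hypothetical class used only for
illustration:

```cpp
// Default: the modifier expands to nothing, so the declaration below is
// an ordinary class declaration.
#ifndef DllExportPCCTS
#define DllExportPCCTS
#endif

// Hypothetical class declared in the style described above.
class DllExportPCCTS SketchBase {
public:
    int answer() const { return 1; }
};
```

An MSVC user building a DLL could define the symbol before including
the headers, for example as __declspec(dllexport).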
1866
1867 Suggested by Manfred Kogler (km cast.uni-linz.ac.at).
1868
1869 #185. (Changed in MR14) Option to not use PCCTS_AST base class for ASTBase
1870
1871 Normally, the ASTBase class is derived from PCCTS_AST which contains
1872 functions useful to Sorcerer. If these are not necessary then the
1873 user can define the pre-processor symbol "PCCTS_NOT_USING_SOR" which
1874 will cause the ASTBase class to replace references to PCCTS_AST with
1875 references to ASTBase where necessary.
1876
1877 The class ASTDoublyLinkedBase will contain a pure virtual function
1878 shallowCopy() that was formerly defined in class PCCTS_AST.
1879
1880 Suggested by Bob McWhirter (bob netwrench.com).
1881
1882 #184. (Changed in MR14) Grammars with no tokens generate invalid tokens.h
1883
1884 Reported by Hubert Holin (Hubert.Holin bigfoot.com).
1885
1886 #183. (Changed in MR14) -f to specify file with names of grammar files
1887
1888 In DEC/VMS it is difficult to specify very long command lines.
1889 The -f option allows one to place the names of the grammar files
1890 in a data file in order to bypass limitations of the DEC/VMS
1891 command language interpreter.
1892
1893 Addition supplied by Bernard Giroud (b_giroud decus.ch).
1894
1895 #182. (Changed in MR14) Output directory option for DEC/VMS
1896
1897 Fix some problems with the -o option under DEC/VMS.
1898
1899 Fix supplied by Bernard Giroud (b_giroud decus.ch).
1900
1901 #181. (Changed in MR14) Allow chars > 127 in DLGStringInput::nextChar()
1902
1903 Changed DLGStringInput to cast the character using (unsigned char)
1904 so that languages with character codes greater than 127 work
1905 without changes.
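
The reason for the cast can be sketched as follows. On platforms where
plain char is signed (this is implementation-defined), a character code
above 127 becomes negative when widened to int. Both function names are
hypothetical:

```cpp
// Without the cast the result may be negative for codes above 127.
inline int nextCharNoCast(const char *p) {
    return *p;
}

// With the cast the result is always in the range 0..255.
inline int nextCharWithCast(const char *p) {
    return static_cast<unsigned char>(*p);
}
```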
1906
1907 Suggested by Manfred Kogler (km cast.uni-linz.ac.at).
1908
1909 #180. (Added in MR14) ANTLRParser::getEofToken()
1910
1911 Added "ANTLRToken ANTLRParser::getEofToken() const" to match the
1912 setEofToken routine.
1913
1914 Requested by Manfred Kogler (km cast.uni-linz.ac.at).
1915
1916 #179. (Fixed in MR14) Memory leak for BufFileInput subclass of DLGInputStream
1917
1918 The BufFileInput class described in Item #142 neglected to release
1919 the allocated buffer when an instance was destroyed.
1920
1921 Reported by Manfred Kogler (km cast.uni-linz.ac.at).
1922
1923 #178. (Fixed in MR14) Bug in "(alpha)? beta" guess blocks first sets
1924
1925 In 1.33 vanilla, and all maintenance releases prior to MR14
1926 there is a bug in the handling of guess blocks which use the
1927 "long" form:
1928
1929 (alpha)? beta
1930
1931 inside a (...)*, (...)+, or {...} block.
1932
1933 This problem does *not* apply to the case where beta is omitted
1934 or when the syntactic predicate is on the leading edge of an
1935 alternative.
1936
1937 The problem is that both alpha and beta are stored in the
1938 syntax diagram, and that some analysis routines would fail
1939 to skip the alpha portion when it was not on the leading edge.
1940 Consider the following grammar with -ck 2:
1941
1942 r : ( (A)? B )* C D
1943
1944 | A B /* forces -ck 2 computation for old antlr */
1945 /* reports ambig for alts 1 & 2 */
1946
1947 | B C /* forces -ck 2 computation for new antlr */
1948 /* reports ambig for alts 1 & 3 */
1949 ;
1950
1951 The prediction expression for the first alternative should be
1952 LA(1)={B C} LA(2)={B C D}, but previous versions of antlr
1953 would compute the prediction expression as LA(1)={A C} LA(2)={B D}
1954
1955 Reported by Arpad Beszedes (beszedes inf.u-szeged.hu) who provided
1956 a very clear example of the problem and identified the probable cause.
1957
1958 #177. (Changed in MR14) #tokdefs and #token with regular expression
1959
1960 In MR13 the change described by Item #162 caused an existing
1961 feature of antlr to fail. Prior to the change it was possible
1962 to give regular expression definitions and actions to tokens
1963 which were defined via the #tokdefs directive.
1964
1965 This now works again.
1966
1967 Reported by Manfred Kogler (km cast.uni-linz.ac.at).
1968
1969 #176. (Changed in MR14) Support for #line in antlr source code
1970
1971 Note: this was implemented by Arpad Beszedes (beszedes inf.u-szeged.hu).
1972
1973 In 1.33MR14 it is possible for a pre-processor to generate #line
1974 directives in the antlr source and have those line numbers and file
1975 names used in antlr error messages and in the #line directives
1976 generated by antlr.
1977
1978 The #line directive may appear in the following forms:
1979
1980 #line ll "sss" xx xx ...
1981
1982 where ll represents a line number, "sss" represents the name of a file
1983 enclosed in quotation marks, and xx are arbitrary integers.
1984
1985 The following form (without "line") is not supported at the moment:
1986
1987 # ll "sss" xx xx ...
1988
1989 The result:
1990
1991 zzline
1992
1993 is replaced with ll from the # or #line directive
1994
1995 FileStr[CurFile]
1996
1997 is updated with the contents of the string (if any)
1998 following the line number
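
The accepted form can be illustrated with a small parser. This is a
sketch under stated assumptions, not the code used inside antlr:
LineInfo and parseLineDirective are hypothetical names, and trailing
integers after the file name are simply ignored.

```cpp
#include <cstdio>
#include <string>

// Hypothetical result type: line number and optional file name.
struct LineInfo {
    bool ok;            // true if a "#line ll" prefix was recognized
    int line;           // the ll value
    std::string file;   // the "sss" value, empty if absent
};

// Recognizes:  #line ll "sss" xx xx ...
// Rejects the short form without "line":  # ll "sss" ...
inline LineInfo parseLineDirective(const char *text) {
    LineInfo info = { false, 0, "" };
    char name[256] = "";
    int n = std::sscanf(text, " #line %d \"%255[^\"]\"", &info.line, name);
    info.ok = (n >= 1);
    if (n == 2) info.file = name;
    return info;
}
```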
1999
2000 Note
2001 ----
2002 The file-name string following the line number can be a complete
2003 name with a directory-path. Antlr generates the output files from
2004 the input file name (by replacing the extension from the file-name
2005 with .c or .cpp).
2006
2007 If the input file (or the file-name from the line-info) contains
2008 a path:
2009
2010 "../grammar.g"
2011
2012 the generated source code will be placed in "../grammar.cpp" (i.e.
2013 in the parent directory). This is inconvenient in some cases
2014 (even the -o switch can not be used) so the path information is
2015 removed from the #line directive. Thus, if the line-info was
2016
2017 #line 2 "../grammar.g"
2018
2019 then the current file-name will become "grammar.g"
2020
2021 In this way, the source code generated from the grammar file
2022 will always be in the current directory, except when the -o switch
2023 is used.
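
The path removal described in the note can be sketched as below.
stripDirectory is a hypothetical helper, and treating both "/" and "\"
as separators is an assumption of this sketch rather than a statement
about antlr's own code:

```cpp
#include <string>

// Keep only the part of the name after the last directory separator.
inline std::string stripDirectory(const std::string &name) {
    std::string::size_type pos = name.find_last_of("/\\");
    return pos == std::string::npos ? name : name.substr(pos + 1);
}
```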
2024
2025 #175. (Changed in MR14) Bug when guess block appears at start of (...)*
2026
2027 In 1.33 vanilla and all maintenance releases prior to 1.33MR14
2028 there is a bug when a guess block appears at the start of a (...)*.
2029 Consider the following k=1 (ck=1) grammar:
2030
2031 rule :
2032 ( (STAR)? ZIP )* ID ;
2033
2034 Prior to 1.33MR14, the generated code resembled:
2035
2036 ...
2037 zzGUESS_BLOCK
2038 while ( 1 ) {
2039 if ( ! LA(1)==STAR) break;
2040 zzGUESS
2041 if ( !zzrv ) {
2042 zzmatch(STAR);
2043 zzCONSUME;
2044 zzGUESS_DONE
2045 zzmatch(ZIP);
2046 zzCONSUME;
2047 ...
2048
2049 Note that the routine uses STAR for the prediction expression
2050 rather than ZIP. With 1.33MR14 the generated code resembles:
2051
2052 ...
2053 while ( 1 ) {
2054 if ( ! LA(1)==ZIP) break;
2055 ...
2056
2057 This problem existed only with (...)* blocks and was caused
2058 by the slightly more complicated graph which represents (...)*
2059 blocks. This caused the analysis routine to compute the first
2060 set for the alpha part of the "(alpha)? beta" rather than the
2061 beta part.
2062
2063 Both (...)+ and {...} blocks handled the guess block correctly.
2064
2065 Reported by Arpad Beszedes (beszedes inf.u-szeged.hu) who provided
2066 a very clear example of the problem and identified the probable cause.
2067
2068 #174. (Changed in MR14) Bug when action precedes syntactic predicate
2069
2070 In 1.33 vanilla, and all maintenance releases prior to MR14,
2071 there was a bug when a syntactic predicate was immediately
2072 preceded by an action. Consider the following -ck 2 grammar:
2073
2074 rule :
2075 <<int i;>>
2076 (alpha)? beta C
2077 | A B
2078 ;
2079
2080 alpha : A ;
2081 beta : A B;
2082
2083 Prior to MR14, the code generated for the first alternative
2084 resembled:
2085
2086 ...
2087 zzGUESS
2088 if ( !zzrv && LA(1)==A && LA(2)==A) {
2089 alpha();
2090 zzGUESS_DONE
2091 beta();
2092 zzmatch(C);
2093 zzCONSUME;
2094 } else {
2095 ...
2096
2097 The prediction expression (i.e. LA(1)==A && LA(2)==A) is clearly
2098 wrong because LA(2) should be matched to B (first[2] of beta is {B}).
2099
2100 With 1.33MR14 the prediction expression is:
2101
2102 ...
2103 if ( !zzrv && LA(1)==A && LA(2)==B) {
2104 alpha();
2105 zzGUESS_DONE
2106 beta();
2107 zzmatch(C);
2108 zzCONSUME;
2109 } else {
2110 ...
2111
2112 This affects only grammars in which alpha is shorter than
2113 max(k,ck) and there is an action immediately preceding
2114 the syntactic predicate.
2115
2116 This problem was reported by Arpad Beszedes
2117 (beszedes inf.u-szeged.hu) who provided a very clear example
2118 of the problem and identified the presence of the init-action
2119 as the likely culprit.
2120
2121 #173. (Changed in MR13a) -glms for Microsoft style filenames with -gl
2122
2123 With the -gl option antlr generates #line directives using the
2124 exact name of the input files specified on the command line.
2125 An oddity of the Microsoft C and C++ compilers is that they
2126 don't accept file names in #line directives containing "\"
2127 even though these are names from the native file system.
2128
2129 With -glms option, the "\" in file names appearing in #line
2130 directives is replaced with a "/" in order to conform to
2131 Microsoft compiler requirements.
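
The substitution amounts to the following. toForwardSlashes is a
hypothetical name for a sketch of the idea, not the routine used by
antlr:

```cpp
#include <algorithm>
#include <string>

// Replace every backslash in a file name with a forward slash.
inline std::string toForwardSlashes(std::string name) {
    std::replace(name.begin(), name.end(), '\\', '/');
    return name;
}
```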
2132
2133 Reported by Erwin Achermann (erwin.achermann switzerland.org).
2134
2135 #172. (Changed in MR13) \r\n in antlr source counted as one line
2136
2137 Some MS software uses \r\n to indicate a new line. Antlr
2138 now recognizes this in counting lines.
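
One way to count line breaks so that \r\n is counted once is sketched
below. countLineBreaks is a hypothetical function, and counting a lone
\r as a line break is an assumption of this sketch that the entry above
does not confirm:

```cpp
// Count line breaks, treating the pair "\r\n" as a single break.
inline int countLineBreaks(const char *s) {
    int lines = 0;
    for (; *s != '\0'; ++s) {
        if (*s == '\r' && s[1] == '\n') ++s;    // skip the CR of a CR LF pair
        if (*s == '\n' || *s == '\r') ++lines;  // LF, or a lone CR
    }
    return lines;
}
```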
2139
2140 Reported by Edward L. Hepler (elh ece.vill.edu).
2141
2142 #171. (Changed in MR13) #tokclass L..U now allowed
2143
2144 The following is now allowed:
2145
2146 #tokclass ABC { A..B C }
2147
2148 Reported by Dave Watola (dwatola amtsun.jpl.nasa.gov)
2149
2150 #170. (Changed in MR13) Suppression for predicates with lookahead depth >1
2151
2152 In MR12 the capability for suppression of predicates with lookahead
2153 depth=1 was introduced. With MR13 this has been extended to
2154 predicates with lookahead depth > 1 and released for use by users
2155 on an experimental basis.
2156
2157 Consider the following grammar with -ck 2 and the predicate in rule
2158 "a" with depth 2:
2159
2160 r1 : (ab)* "@"
2161 ;
2162
2163 ab : a
2164 | b
2165 ;
2166
2167 a : (A B)? => <<p(LATEXT(2))>>? A B C
2168 ;
2169
2170 b : A B C
2171 ;
2172
2173 Normally, the predicate would be hoisted into rule r1 in order to
2174 determine whether to call rule "ab". However it should *not* be
2175 hoisted because, even if p is false, there is a valid alternative
2176 in rule b. With "-mrhoistk on" the predicate will be suppressed.
2177
2178 If "-info p" command line option is present the following information
2179 will appear in the generated code:
2180
2181 while ( (LA(1)==A)
2182 #if 0
2183
2184 Part (or all) of predicate with depth > 1 suppressed by alternative
2185 without predicate
2186
2187 pred << p(LATEXT(2))>>?
2188 depth=k=2 ("=>" guard) rule a line 8 t1.g
2189 tree context:
2190 (root = A
2191 B
2192 )
2193
2194 The token sequence which is suppressed: ( A B )
2195 The sequence of references which generate that sequence of tokens:
2196
2197 1 to ab r1/1 line 1 t1.g
2198 2 ab ab/1 line 4 t1.g
2199 3 to b ab/2 line 5 t1.g
2200 4 b b/1 line 11 t1.g
2201 5 #token A b/1 line 11 t1.g
2202 6 #token B b/1 line 11 t1.g
2203
2204 #endif
2205
2206 A slightly more complicated example:
2207
2208 r1 : (ab)* "@"
2209 ;
2210
2211 ab : a
2212 | b
2213 ;
2214
2215 a : (A B)? => <<p(LATEXT(2))>>? (A B | D E)
2216 ;
2217
2218 b : <<q(LATEXT(2))>>? D E
2219 ;
2220
2221
2222 In this case, the sequence (D E) in rule "a" which lies behind
2223 the guard is used to suppress the predicate with context (D E)
2224 in rule b.
2225
2226 while ( (LA(1)==A || LA(1)==D)
2227 #if 0
2228
2229 Part (or all) of predicate with depth > 1 suppressed by alternative
2230 without predicate
2231
2232 pred << q(LATEXT(2))>>?
2233 depth=k=2 rule b line 11 t2.g
2234 tree context:
2235 (root = D
2236 E
2237 )
2238
2239 The token sequence which is suppressed: ( D E )
2240 The sequence of references which generate that sequence of tokens:
2241
2242 1 to ab r1/1 line 1 t2.g
2243 2 ab ab/1 line 4 t2.g
2244 3 to a ab/1 line 4 t2.g
2245 4 a a/1 line 8 t2.g
2246 5 #token D a/1 line 8 t2.g
2247 6 #token E a/1 line 8 t2.g
2248
2249 #endif
2250 &&
2251 #if 0
2252
2253 pred << p(LATEXT(2))>>?
2254 depth=k=2 ("=>" guard) rule a line 8 t2.g
2255 tree context:
2256 (root = A
2257 B
2258 )
2259
2260 #endif
2261
2262 (! ( LA(1)==A && LA(2)==B ) || p(LATEXT(2)) ) {
2263 ab();
2264 ...
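
The last test in the generated code above has the shape "not in the
guard context, or the predicate holds". A sketch of that logic, with
hypothetical names, is:

```cpp
// contextMatches stands for ( LA(1)==A && LA(2)==B ),
// predicateValue stands for p(LATEXT(2)).
// The predicate can only veto input that matches the guard context.
inline bool guardedPredicate(bool contextMatches, bool predicateValue) {
    return !contextMatches || predicateValue;
}
```

When the lookahead does not match the guard, the predicate is
irrelevant, because || short-circuits, and the test succeeds.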
2265
2266 #169. (Changed in MR13) Predicate test optimization for depth=1 predicates
2267
2268 When MR12 generated a test of a predicate which had depth 1
2269 it would use the depth >1 routines, resulting in correct but
2270 inefficient behavior. In MR13, a bit test is used.
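
A generic bit test of set membership looks like the sketch below. The
names and the layout of the set (one bit per token number, eight bits
per word) are assumptions for illustration. The entry above does not
describe how antlr lays out its sets:

```cpp
typedef unsigned char BitWord;   // hypothetical word type for this sketch

// True if the bit for the given token number is set.
inline bool inTokenSet(const BitWord *set, int token) {
    return (set[token / 8] & (1u << (token % 8))) != 0;
}
```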
2271
2272 #168. (Changed in MR13) Token expressions in context guards
2273
2274 The token expressions appearing in context guards such as:
2275
2276 (A B)? => <<test(LT(1))>>? someRule
2277
2278 are computed during an early phase of antlr processing. As
2279 a result, prior to MR13, complex expressions such as:
2280
2281 ~B
2282 L..U
2283 ~L..U
2284 TokClassName
2285 ~TokClassName
2286
2287 were not computed properly. This resulted in incorrect
2288 context being computed for such expressions.
2289
2290 In MR13 these context guards are verified for proper semantics
2291 in the initial phase and then re-evaluated after complex token
2292 expressions have been computed in order to produce the correct
2293 behavior.
2294
2295 Reported by Arpad Beszedes (beszedes inf.u-szeged.hu).
2296
2297 #167. (Changed in MR13) ~L..U
2298
2299 Prior to MR13, the complement of a token range was
2300 not properly computed.
2301
2302 #166. (Changed in MR13) token expression L..U
2303
2304 The token U was represented as an unsigned char, restricting
2305 the use of L..U to cases where U was assigned a token number
2306 less than 256. This is corrected in MR13.
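
The effect of storing a token number in an unsigned char can be seen in
this sketch, which assumes 8-bit chars. storeAsUnsignedChar is a
hypothetical function:

```cpp
// A value of 256 or more does not survive the narrowing conversion.
inline int storeAsUnsignedChar(int tokenNumber) {
    unsigned char narrow = static_cast<unsigned char>(tokenNumber);
    return narrow;   // tokenNumber modulo 256 when char has 8 bits
}
```

For example, token number 300 would be stored as 44.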
2307
2308 #165. (Changed in MR13) option -newAST
2309
2310 To create ASTs from an ANTLRTokenPtr antlr usually calls
2311 "new AST(ANTLRTokenPtr)". This option generates a call
2312 to "newAST(ANTLRTokenPtr)" instead. This allows a user
2313 to define a parser member function to create an AST object.
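
The kind of member function this makes possible is sketched below. All
of the types are hypothetical stand-ins; they are not the PCCTS AST,
ANTLRTokenPtr or parser classes:

```cpp
// Hypothetical token and AST types for this sketch.
struct TokSketch { int type; };

struct SketchAST {
    int type;
    explicit SketchAST(const TokSketch &t) : type(t.type) {}
};

// Hypothetical parser with a newAST member function. Because creation
// goes through the parser, it can count nodes, use a pool, and so on.
class SketchParser {
public:
    SketchParser() : created_(0) {}
    SketchAST *newAST(const TokSketch &t) {
        ++created_;
        return new SketchAST(t);
    }
    int created() const { return created_; }
private:
    int created_;
};
```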
2314
2315 Similar changes for ASTBase::tmake and ASTBase::link were not
2316 thought necessary since they do not create AST objects, only
2317 use existing ones.
2318
2319 #164. (Changed in MR13) Unused variable _astp
2320
2321 For many compilations, we have lived with warnings about
2322 the unused variable _astp. It turns out that this variable
2323 can *never* be used because the code which references it was
2324 commented out.
2325
2326 This investigation was sparked by a note from Erwin Achermann
2327 (erwin.achermann switzerland.org).
2328
2329 #163. (Changed in MR13) Incorrect makefiles for testcpp examples
2330
2331 All the examples in pccts/testcpp/* had incorrect definitions
2332 in the makefiles for the symbol "CCC". Instead of CCC=CC they
2333 had CC=$(CCC).
2334
2335 There was an additional problem in testcpp/1/test.g due to the
2336 change in ANTLRToken::getText() to a const member function
2337 (Item #137).
2338
2339 Reported by Maurice Mass (maas cuci.nl).
2340
2341 #162. (Changed in MR13) Combining #token with #tokdefs
2342
2343 When it became possible to change the print-name of a
2344 #token (Item #148) it became useful to give a #token
2345 statement whose only purpose was to give a print name
2346 to the #token. Prior to this change this could not be
2347 combined with the #tokdefs feature.
2348
2349 #161. (Changed in MR13) Switch -gxt inhibits generation of tokens.h
2350
2351 #160. (Changed in MR13) Omissions in list of names for remap.h
2352
2353 When a user selects the -gp option antlr creates a list
2354 of macros in remap.h to rename some of the standard
2355 antlr routines from zzXXX to userprefixXXX.
2356
2357 There were a number of omissions from the remap.h name
2358 list related to the new trace facility. This was reported,
2359 along with a fix, by Bernie Solomon (bernard ug.eds.com).
2360
2361 #159. (Changed in MR13) Violations of classic C rules
2362
2363 There were a number of violations of classic C style in
2364 the distribution kit. This was reported, along with fixes,
2365 by Bernie Solomon (bernard ug.eds.com).
2366
2367 #158. (Changed in MR13) #header causes problem for pre-processors
2368
2369 A user who runs the C pre-processor on antlr source suggested
2370 that another syntax be allowed. With MR13, directives
2371 such as #header, #pragma, etc. may be written as "\#header",
2372 "\#pragma", etc. For escaping pre-processor directives inside
2373 a #header use something like the following:
2374
2375 \#header
2376 <<
2377 \#include <stdio.h>
2378 >>
2379
2380 #157. (Fixed in MR13) empty error sets for rules with infinite recursion
2381
2382 When the first set for a rule cannot be computed due to infinite
2383 left recursion and it is the only alternative for a block then
2384 the error set for the block would be empty. This would result
2385 in a fatal error.
2386
2387 Reported by Darin Creason (creason genedax.com)
2388
2389 #156. (Changed in MR13) DLGLexerBase::getToken() now public
2390
2391 #155. (Changed in MR13) Context behind predicates can suppress
2392
2393 With -mrhoist enabled the context behind a guarded predicate can
2394 be used to suppress other predicates. Consider the following grammar:
2395
2396 r0 : (r1)+;
2397
2398 r1 : rp
2399 | rq
2400 ;
2401 rp : <<p LATEXT(1)>>? B ;
2402 rq : (A)? => <<q LATEXT(1)>>? (A|B);
2403
2404 In earlier versions both predicates "p" and "q" would be hoisted into
2405 rule r0. With MR12c predicate p is suppressed because the context which
2406 follows predicate q includes "B" which can "cover" predicate "p". In
2407 other words, in trying to decide in r0 whether to call r1, it doesn't
2408 really matter whether p is false or true because, either way, there is
2409 a valid choice within r1.
2410
2411 #154. (Changed in MR13) Making hoist suppression explicit using <<nohoist>>
2412
2413 A common error, even among experienced pccts users, is to code
2414 an init-action to inhibit hoisting rather than a leading action.
2415 An init-action does not inhibit hoisting.
2416
2417 This was coded:
2418
2419 rule1 : <<;>> rule2
2420
2421 This is what was meant:
2422
2423 rule1 : <<;>> <<;>> rule2
2424
2425 With MR13, the user can code:
2426
2427 rule1 : <<;>> <<nohoist>> rule2
2428
2429 The following will give an error message:
2430
2431 rule1 : <<nohoist>> rule2
2432
2433 If the <<nohoist>> appears as an init-action rather than a leading
2434 action an error message is issued. The meaning of an init-action
2435 containing "nohoist" is unclear: does it apply to just one
2436 alternative or to all alternatives?
2437
2438
2439
2440
2441
2442
2443
2444
2445 -------------------------------------------------------
2446 Note: Items #153 to #1 are now in a separate file named
2447 CHANGES_FROM_133_BEFORE_MR13.txt
2448 -------------------------------------------------------