Tools/CCode/Source/Pccts/CHANGES_FROM_133.txt

   1 =======================================================================
   2 List of Implemented Fixes and Changes for Maintenance Releases of PCCTS
   3 =======================================================================
   4
   5                                DISCLAIMER
   6
   7  The software and these notes are provided "as is".  They may include
   8  typographical or technical errors and their authors disclaim all
   9  liability of any kind or nature for damages due to error, fault,
  10  defect, or deficiency regardless of cause.  All warranties of any
  11  kind, either express or implied, including, but not limited to, the
  12  implied warranties of merchantability and fitness for a particular
  13  purpose are disclaimed.
  14
  15
  16         -------------------------------------------------------
  17         Note:  Items #153 to #1 are now in a separate file named
  18                 CHANGES_FROM_133_BEFORE_MR13.txt
  19         -------------------------------------------------------
  20
  21 #312. (Changed in MR33) Bug caused by change #299.
  22
  23         In change #299 a warning message was suppressed when there was
  24         no LT(1) in a semantic predicate and max(k,ck) was 1.  As a side
  25         effect of that change, the default predicate depth for the
  26         semantic predicate was left as 0 rather than set to 1.
  27
  28         This manifested as an error at line #1559 of mrhost.c
  29
  30         Reported by Peter Dulimov.
  31
  32 #311. (Changed in MR33) Added sorcer/lib to Makefile.
  33
  34     Reported by Dale Martin.
  35
  36 #310. (Changed in MR32) In C mode zzastPush was spelled zzastpush in one case.
  37
  38     Reported by Jean-Claude Durand
  39
  40 #309. (Changed in MR32) Renamed baseName because of VMS name conflict
  41
  42     Renamed baseName to pcctsBaseName to avoid library name conflict with
  43     VMS library routine.  Reported by Jean-François PIÉRONNE.
  44
  45 #308. (Changed in MR32) Used "template" as name of formal in C routine
  46
  47         In astlib.h routine ast_scan a formal was named "template".  This caused
  48         problems when the C code was compiled with a C++ compiler.  Reported by
  49         Sabyasachi Dey.
  50
  51 #307. (Changed in MR31) Compiler dependent bug in function prototype generation
  52
  53     The code which generated function prototypes contained a bug which
  54     was compiler/optimization dependent.  Under some circumstances an
  55     extra character would be included in portions of a function prototype.
  56
  57     Reported by David Cook.
  58
  59 #306. (Changed in MR30) Validating predicate following a token
  60
  61     A validating predicate which immediately followed a token match
  62     consumed the token after the predicate rather than before.  Prior
  63     to this fix (in the following example) isValidTimeScaleValue() in
  64     the predicate would test the text for TIMESCALE rather than for
  65     NUMBER:
  66
  67                 time_scale :
  68                 TIMESCALE
  69                 <<isValidTimeScaleValue(LT(1)->getText())>>?
  70                 ts:NUMBER
  71                 ( us:MICROSECOND << tVal = ...>>
  72                 | ns:NANOSECOND << tVal = ...  >>
  73                 )
  74
  75         Reported by Adalbert Perbandt.
  76
  77 #305. (Changed in MR30) Alternatives with guess blocks inside (...)* blocks.
  78
  79         In MR14 change #175 fixed a bug in the prediction expressions for guess
  80         blocks which were of the form (alpha)? beta.  Unfortunately, this
  81         resulted in a new bug as exemplified by the example below, which computed
  82         the first set for r as {B} rather than {B C}:
  83
  84                                         r : ( (A)? B
  85                                             | C
  86                                                 )*
  87
  88     This example doesn't make any sense as A is not a prefix of B, but it
  89     illustrates the problem.  This bug did not appear for:
  90
  91                                 r : ( (A)?
  92                                     | C
  93                                     )*
  94
  95         because it does not use the (alpha)? beta form.
  96
  97         Item #175 fixed an asymmetry in ambiguity messages for the following
  98         constructs which appear to have identical ambiguities (between repeating
  99         the loop vs. exiting the loop).  MR30 retains this fix, but the implementation
 100         is slightly different.
 101
 102                   r_star : ( (A B)? )* A ;
 103                   r_plus : ( (A B)? )+ A ;
 104
 105     Reported by Arpad Beszedes (beszedes inf.u-szeged.hu).
 106
 107 #304. (Changed in MR30) Crash when mismatch between output value counts.
 108
 109         For a rule such as:
 110
 111                 r1 : r2>[i,j];
 112                 r2 >[int i, int j] : A;
 113
 114         If there were extra actuals for the reference to rule r2 from rule r1
 115         then antlr would crash.  This bug was introduced by change #276.
 116
 117         Reported by Sinan Karasu.
 118
 119 #303. (Changed in MR30) DLGLexerBase::replchar
 120
 121         DLGLexerBase::replchar and the C mode routine zzreplchar did not work
 122         properly when the new character was 0.
 123
 124     Reported with fix by Philippe Laporte
 125
 126 #302. (Changed in MR28) Fix significant problems in initial release of MR27.
 127
 128 #301. (Changed in MR27) Default tab stops set to 2 spaces.
 129
 130     To have antlr generate true tabs rather than spaces, use "antlr -tab 0".
 131     To generate 4 spaces per tab stop use "antlr -tab 4"
 132
 133 #300. (Changed in MR27)
 134
 135         Consider the following methods of constructing an AST from ID:
 136
 137         rule1!
 138                 : id:ID << #0 = #[id]; >> ;
 139
 140         rule2!
 141                 : id:ID << #0 = #id; >> ;
 142
 143         rule3
 144                 : ID ;
 145
 146         rule4
 147                 : id:ID << #0 = #id; >> ;
 148
 149     For rule2, the AST corresponding to id would always be NULL.  This
 150     is because the user explicitly suppressed AST construction using the
 151     "!" operator on the rule.  In MR27 the use of an AST expression
 152     such as #id overrides the "!" operator and forces construction of
 153     the AST.
 154
 155     This fix does not apply to C mode ASTs when the ASTs are referenced
 156     using numbers rather than symbols.
 157
 158         For C mode, this requires that the (optional) function/macro zzmk_ast
 159         be defined.  This function copies information from an attribute into
 160         a previously allocated AST.
 161
 162     Reported by Jan Langer (jan langernetz.de)
 163
 164 #299. (Changed in MR27) Don't warn if k=1 and semantic predicate missing LT(i)
 165
 166     If a semantic predicate has no reference to LT(i) (or LATEXT(i) in C
 167     mode) then pccts doesn't know how many lookahead tokens to use for context.
 168     However, if max(k,ck) is 1 then there is really only one choice and
 169     the warning is unnecessary.
 170
 171 #298. (Changed in MR27) Removed "register" for lastpos in dlgauto.c zzgettok
 172
 173 #297. (Changed in MR27) Incorrect prototypes when used with classic C
 174
 175     There were a number of errors in function headers when antlr was
 176     built with compilers that do not have __STDC__ or __cplusplus set.
 177
 178     The functions which have variable length argument lists now use
 179     PCCTS_USE_STDARG rather than __USE_PROTOTYPES__ to determine
 180     whether to use stdargs or varargs.
 181
 182 #296. (Changed in MR27) Complex return types in rules.
 183
 184     The following return type was not properly handled when
 185     unpacking a struct containing multiple return values:
 186
 187       rule > [int i, IIR_Bool (IIR_Decl::*constraint)()] : ...
 188
 189     Instead of using "constraint", the program got lost and used
 190     an empty string.
 191
 192     Reported by P.A. Wilsey.
 193
 194 #295. (Changed in MR27) Extra ";" following zzGUESS_DONE sometimes.
 195
 196     Certain constructs with guess blocks in MR23 led to extra ";"
 197     preceding the "else" clause of an "if".
 198
 199     Reported by P.A. Wilsey.
 200
 201 #294. (Changed in MR27) Infinite loop in antlr for nested blocks
 202
 203     An oversight in detecting an empty alternative sometimes led
 204     to an infinite loop in antlr when it encountered a rule with
 205     nested blocks and guess blocks.
 206
 207     Reported by P.A. Wilsey.
 208
 209 #293. (Changed in MR27) Sorcerer optimization of _t->type()
 210
 211     Sorcerer generated code may contain many calls to _t->type() in a
 212     single statement.  This change introduces a temporary variable
 213     to eliminate unnecessary function calls.
 214
 215     Change implemented by Tom Molteno (tim videoscript.com).
 216
 217 #292. (Changed in MR27)
 218
 219     WARNING:  Item #267 changes the signature of methods in the AST class.
 220
 221     **** Be sure to revise your AST functions of the same name  ***
 222
 223 #291. (Changed in MR24)
 224
 225     Fix to serious code generation error in MR23 for (...)+ block.
 226
 227 #290. (Changed in MR23)
 228
 229     Item #247 describes a change in the way {...} blocks handled
 230     an error.  Consider:
 231
 232             r1 : {A} b ;
 233             b  : B;
 234
 235                 with input "C".
 236
 237     Prior to change #247, the error would resemble "expected B -
 238     found C".  This is correct but incomplete, and therefore
 239     misleading.  In #247 it was changed to "expected A, B - found
 240     C".  This was fine, except for users of parser exception
 241     handling because the exception was generated in the epilogue
 242     for {...} block rather than in rule b.  This made it difficult
 243     for users of parser exception handling because B was not
 244     expected in that context. Those not using parser exception
 245     handling didn't notice the difference.
 246
 247     The current change restores the behavior prior to #247 when
 248     parser exceptions are present, but retains the revised behavior
 249     otherwise.  This change should be visible only when exceptions
 250     are in use and only for {...} blocks and sub-blocks of the form
 251     (something|something | something | epsilon) where epsilon represents
 252     an empty production and it is the last alternative of a sub-block.
 253     In contrast, (something | epsilon | something) should generate the
 254     same code as before, even when exceptions are used.
 255
 256     Reported by Philippe Laporte (philippe at transvirtual.com).
 257
 258 #289. (Changed in MR23) Bug in matching complement of a #tokclass
 259
 260     Prior to MR23 when a #tokclass was matched in both its complemented form
 261     and uncomplemented form, the bit set generated for its first use was used
 262     for both cases.  However, the prediction expression was correctly computed
 263     in both cases.  This meant that the second case would never be matched
 264     because, for the second appearance, the prediction expression and the
 265     set to be matched would be complements of each other.
 266
 267     Consider:
 268
 269                 #token A "a"
 270                 #token B "b"
 271                 #token C "c"
 272                 #tokclass AB {A B}
 273
 274                 r1 : AB    /* alt 1x */
 275                    | ~AB   /* alt 1y */
 276                    ;
 277
 278     Prior to MR23, this resulted in alternative 1y being unreachable.  Had it
 279     been written:
 280
 281                 r2 : ~AB  /* alt 2x */
 282                    | AB   /* alt 2y */
 283
 284     then alternative 2y would have become unreachable.
 285
 286     This bug was only for the case of complemented #tokclass.  For complemented
 287     #token the proper code was generated.
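
         The effect described in this item can be modelled with plain bit
         sets.  This is only a sketch: the Token enum, TokenSet, tokclassAB()
         and reachable() are names invented for the illustration and are
         not part of antlr.  An alternative is treated as reachable when
         some token passes the prediction test and is also in the set used
         for the match.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical token codes standing in for #token A, B and C.
enum Token { A, B, C, TokenCount };
typedef std::bitset<TokenCount> TokenSet;

// The set for "#tokclass AB {A B}".
TokenSet tokclassAB() {
    TokenSet s;
    s.set(A);
    s.set(B);
    return s;
}

// True if at least one token passes prediction and is in the match set.
bool reachable(const TokenSet& predict, const TokenSet& match) {
    return (predict & match).any();
}
```

         Predicting with ~AB while matching against the AB set (the
         situation before MR23) leaves no acceptable token, whereas using
         ~AB for both makes the alternative reachable through C.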
 288
 289 #288. (Changed in MR23) #errclass not restricted to choice points
 290
 291     The #errclass directive is supposed to allow a programmer to define
 292     print strings which should appear in syntax error messages as a replacement
 293     for some combinations of tokens. For instance:
 294
 295             #errclass Operator {PLUS MINUS TIMES DIVIDE}
 296
 297     If a syntax message includes all four of these tokens, and there is no
 298     "better" choice of error class, the word "Operator" will be used rather
 299     than a list of the four token names.
 300
 301     Prior to MR23 the #errclass definitions were used only at choice points
 302     (which call the FAIL macro). In other cases where there was no choice
 303     (e.g. where a single token or token class was matched) the #errclass
 304     information was not used.
 305
 306     With MR23 the #errclass declarations are used for syntax error messages
 307     when matching a #tokclass, a wildcard (i.e. "*"), or the complement of a
 308     #token or #tokclass (e.g. ~Operator).
 309
 310     Please note that #errclass may now be defined using #tokclass names
 311     (see Item #284).
 312
 313     Reported by Philip A. Wilsey.
 314
 315 #287. (Changed in MR23) Print name for #tokclass
 316
 317     Item #148 describes how to give a print name to a #token so that, for
 318     example, #token ID could have the expression "identifier" in syntax
 319     error messages.  This has been extended to #tokclass:
 320
 321             #token ID("identifier")  "[a-zA-Z]+"
 322             #tokclass Primitive("primitive type")
 323                                     {INT, FLOAT, CHAR, FLOAT, DOUBLE, BOOL}
 324
 325     This is really a cosmetic change, since #tokclass names do not appear
 326     in any error messages.
 327
 328 #286. (Changed in MR23) Makefile change to use of cd
 329
 330     In cases where a pccts subdirectory name matched a directory identified
 331     in a $CDPATH environment variable the build would fail.  All makefile
 332     cd commands have been changed from "cd xyz" to "cd ./xyz" in order
 333     to avoid this problem.
 334
 335 #285. (Changed in MR23) Check for null pointers in some dlg structures
 336
 337     An invalid regular expression can cause dlg to build an invalid
 338     structure to represent the regular expression even while it issues
 339     error messages.  Additional pointer checks were added.
 340
 341     Reported by Robert Sherry.
 342
 343 #284. (Changed in MR23) Allow #tokclass in #errclass definitions
 344
 345     Previously, a #tokclass reference in the definition of an
 346     #errclass was not handled properly. Instead of being expanded
 347     into the set of tokens represented by the #tokclass it was
 348     treated somewhat like an #errclass.  However, in a later phase
 349     when all #errclass were expanded into the corresponding tokens
 350     the #tokclass reference was not expanded (because it wasn't an
 351     #errclass).  In effect the reference was ignored.
 352
 353     This has been fixed.
 354
 355     Problem reported by Mike Dimmick (mike dimmick.demon.co.uk).
 356
 357 #283. (Changed in MR23) Option -tmake invokes parser's tmake
 358
 359     When the string #(...) appears in an action antlr replaces it with
 360     a call to ASTBase::tmake(...) to construct an AST.  It is sometimes
 361     useful to change the tmake routine so that it has access to information
 362     in the parser - something which is not possible with a static method
 363     in an application where there may be multiple parsers active.
 364
 365     The antlr option -tmake replaces the call to ASTBase::tmake with a call
 366     to a user supplied tmake routine.
 367
 368 #282. (Changed in MR23) Initialization error for DBG_REFCOUNTTOKEN
 369
 370     When the pre-processor symbol DBG_REFCOUNTTOKEN is defined
 371     incorrect code is generated to initialize ANTLRRefCountToken::ctor and
 372     dtor.
 373
 374     Fix reported by Sven Kuehn (sven sevenkuehn.de).
 375
 376 #281. (Changed in MR23) Addition of -noctor option for Sorcerer
 377
 378     Added a -noctor option to suppress generation of the blank ctor
 379     for users who wish to define their own ctor.
 380
 381     Contributed by Jan Langer (jan langernetz.de).
 382
 383 #280. (Changed in MR23) Syntax error message for EOF token
 384
 385     The EOF token now receives special treatment in syntax error messages
 386     because there is no text matched by the eof token.  The token name
 387     of the eof token is used unless it is "@" - in which case the string
 388     "<eof>" is used.
 389
 390     Problem reported by Erwin Achermann (erwin.achermann switzerland.org).
 391
 392 #279. (Changed in MR23) Exception groups
 393
 394     There was a bug in the way that exception groups were attached to
 395     alternatives which caused problems when there was a block contained
 396     in an alternative.  For instance, in the following rule:
 397
 398         statement : IF S { ELSE S }
 399                         exception ....
 400         ;
 401
 402     the exception would be attached to the {...} block instead of the
 403     entire alternative because it was attached, in error, to the last
 404     alternative instead of the last OPEN alternative.
 405
 406     Reported by Ty Mordane (tymordane hotmail.com).
 407
 408 #278. (Changed in MR23) makefile changes
 409
 410     Contributed by Tomasz Babczynski (faster lab05-7.ict.pwr.wroc.pl).
 411
 412     The -cfile option is not absolutely needed: when extension of
 413     source file is one of the well-known C/C++ extensions it is
 414     treated as C/C++ source
 415
 416     The gnu make defines the CXX variable as the default C++ compiler
 417     name, so I added a line to copy this (if defined) to the CCC var.
 418
 419     Added a -sor option: after it any -class command defines the class
 420     name for sorcerer, not for ANTLR.  A file extended with .sor is
 421     treated as sorcerer input.  Because sorcerer can be called multiple
 422     times, -sor option can be repeated.  Any files and classes (one class
 423     per group) after each -sor makes one tree parser.
 424
 425     Not implemented:
 426
 427         1. Generate dependences for user c/c++ files.
 428         2. Support for -sor in C mode.
 429
 430     I have left the old genmk program in the directory as genmk_old.c.
 431
 432 #277. (Changed in MR23) Change in macro for failed semantic predicates
 433
 434     In the past, a semantic predicate that failed generated a call to
 435     the macro zzfailed_pred:
 436
 437         #ifndef zzfailed_pred
 438         #define zzfailed_pred(_p) \
 439           if (guessing) { \
 440             zzGUESS_FAIL; \
 441           } else { \
 442             something(_p) \
 443           }
 444         #endif
 445
 446     If a user wished to use the failed action option for semantic predicates:
 447
 448         rule : <<my_predicate>>? [my_fail_action] A
 449              | ...
 450
 451
 452     the code for my_fail_action would have to contain logic for handling
 453     the guess part of the zzfailed_pred macro.  The user should not have
 454     to be aware of the guess logic in writing the fail action.
 455
 456     The zzfailed_pred has been rewritten to have three arguments:
 457
 458             arg 1: the stringized predicate of the semantic predicate
 459             arg 2: 0 => there is no user-defined fail action
 460                    1 => there is a user-defined fail action
 461             arg 3: the user-defined fail action (if defined)
 462                    otherwise a no-operation
 463
 464     The zzfailed_pred macro is now defined as:
 465
 466         #ifndef zzfailed_pred
 467         #define zzfailed_pred(_p,_hasuseraction,_useraction) \
 468           if (guessing) { \
 469             zzGUESS_FAIL; \
 470           } else { \
 471             zzfailed_pred_action(_p,_hasuseraction,_useraction) \
 472           }
 473         #endif
 474
 475
 476     With zzfailed_pred_action defined as:
 477
 478         #ifndef zzfailed_pred_action
 479         #define zzfailed_pred_action(_p,_hasuseraction,_useraction) \
 480             if (_hasuseraction) { _useraction } else { failedSemanticPredicate(_p); }
 481         #endif
 482
 483     In C++ mode failedSemanticPredicate() is a virtual function.
 484     In C mode the default action is a fprintf statement.
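
         The two macros can be exercised outside pccts with a small sketch.
         Only the macro bodies follow the text above; the guessing flag, the
         counting version of zzGUESS_FAIL, the events log, the free function
         failedSemanticPredicate() and the two wrapper functions are
         stand-ins assumed for the illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-ins for parser state (assumptions, not pccts code).
static bool guessing = false;
static int guessFailures = 0;
static std::vector<std::string> events;

// Hypothetical stub: here it only counts how often a guess failed.
#define zzGUESS_FAIL (++guessFailures)

// Stand-in for the default action taken when no user action exists.
static void failedSemanticPredicate(const char* text) {
    events.push_back(std::string("default: ") + text);
}

#define zzfailed_pred_action(_p,_hasuseraction,_useraction) \
    if (_hasuseraction) { _useraction } else { failedSemanticPredicate(_p); }

#define zzfailed_pred(_p,_hasuseraction,_useraction) \
    if (guessing) { \
        zzGUESS_FAIL; \
    } else { \
        zzfailed_pred_action(_p,_hasuseraction,_useraction) \
    }

// No user-defined fail action: arg 2 is 0 and arg 3 is a no-operation.
void failWithDefault() {
    zzfailed_pred("my_predicate", 0, ;)
}

// User-defined fail action: arg 2 is 1 and arg 3 is the action itself.
void failWithUserAction() {
    zzfailed_pred("my_predicate", 1, events.push_back("user action");)
}
```

         The point of the split is visible in failWithUserAction(): the
         user action contains no guess logic, yet it is skipped whenever
         guessing is set.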
 485
 486     Suggested by Erwin Achermann (erwin.achermann switzerland.org).
 487
 488 #276. (Changed in MR23) Addition of return value initialization syntax
 489
 490     In an attempt to reduce the problems caused by the PURIFY macro I have
 491     added new syntax for initializing the return value of rules and the
 492     antlr option "-nopurify".
 493
 494     A rule with a single return argument:
 495
 496         r1 > [Foo f = expr] :
 497
 498     now generates code that resembles:
 499
 500         Foo r1(void) {
 501           Foo _retv = expr;
 502           ...
 503         }
 504
 505     A rule with more than one return argument:
 506
 507         r2 > [Foo f = expr1, Bar b = expr2 ] :
 508
 509     generates code that resembles:
 510
 511         struct _rv1 {
 512             Foo f;
 513             Bar b;
 514         };
 515
 516         _rv1 r2(void) {
 517           struct _rv1 _retv;
 518           _retv.f = expr1;
 519           _retv.b = expr2;
 520           ...
 521         }
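
         A compilable sketch of the two shapes shown above follows.  Foo,
         Bar and the initial values are assumptions made for the example,
         and the generated names _rv1 and _retv are written as Rv1 and retv
         because names that begin with an underscore are reserved at global
         scope in C++.  Foo is given a virtual function to stand for the
         kind of class that does not survive being zeroed.

```cpp
#include <cassert>
#include <string>

// Hypothetical return types; a real grammar would supply its own.
struct Foo {
    int value;
    Foo(int v = 0) : value(v) {}
    virtual ~Foo() {}
};
struct Bar {
    std::string name;
};

// Shape of the code for:  r1 > [Foo f = Foo(7)] :
Foo r1() {
    Foo retv = Foo(7);      // initialized by the expression from the grammar
    return retv;
}

// Shape of the code for:  r2 > [Foo f = Foo(1), Bar b = Bar()] :
struct Rv1 {
    Foo f;
    Bar b;
};

Rv1 r2() {
    struct Rv1 retv;        // members are default constructed first
    retv.f = Foo(1);        // then assigned from the initializers
    retv.b = Bar();
    return retv;
}
```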
 522
 523     C++ style comments appearing in the initialization list may cause problems.
 524
 525 #275. (Changed in MR23) Addition of -nopurify option to antlr
 526
 527     A long time ago the PURIFY macro was introduced to initialize
 528     return value arguments and get rid of annoying messages from programs
 529     that checked for uninitialized variables.
 530
 531     This has caused significant annoyance for C++ users that had
 532     classes with virtual functions or non-trivial constructors because
 533     it would zero the object, including the pointer to the virtual
 534     function table.  This could be defeated by redefining
 535     the PURIFY macro to be empty, but it was a constant surprise to
 536     new C++ users of pccts.
 537
 538     I would like to remove it, but I fear that some existing programs
 539     depend on it and would break.  My temporary solution is to add
 540     an antlr option -nopurify which disables generation of the PURIFY
 541     macro call.
 542
 543     The PURIFY macro should be avoided in favor of the new syntax
 544     for initializing return arguments described in item #276.
 545
 546     To avoid name clash, the PURIFY macro has been renamed PCCTS_PURIFY.
 547
 548 #274. (Changed in MR23) DLexer.cpp renamed to DLexer.h
 549       (Changed in MR23) ATokPtr.cpp renamed to ATokPtrImpl.h
 550
 551     These two files had .cpp extensions but acted like .h files because
 552     they were included in other files. This caused problems for many IDEs.
 553     I have renamed them.  The ATokPtrImpl.h was necessary because there was
 554     already an ATokPtr.h.
 555
 556 #273. (Changed in MR23) Default win32 library changed to multi-threaded DLL
 557
 558     The model used for building the Win32 debug and release libraries has changed
 559     to multi-threaded DLL.
 560
 561     To make this change in your MSVC 6 project:
 562
 563         Project -> Settings
 564         Select the C++ tab in the right pane of the dialog box
 565         Select "Category: Code Generation"
 566         Under "Use run-time library" select one of the following:
 567
 568             Multi-threaded DLL
 569             Debug Multi-threaded DLL
 570
 571     Suggested by Bill Menees (bill.menees gogallagher.com)
 572
 573 #272. (Changed in MR23) Failed semantic predicate reported via virtual function
 574
 575     In the past, a failed semantic predicate reported the problem via a
 576     macro which used fprintf().  The macro now expands into a call on
 577     the virtual function ANTLRParser::failedSemanticPredicate().
 578
 579 #271. (Changed in MR23) Warning for LT(i), LATEXT(i) in token match actions
 580
 581     A bug (or at least an oddity) is that a reference to LT(1), LA(1),
 582     or LATEXT(1) in an action which immediately follows a token match
 583     in a rule refers to the token matched, not the token which is in
 584     the lookahead buffer.  Consider:
 585
 586         r : abc <<action alpha>> D <<action beta>> E;
 587
 588     In this case LT(1) in action alpha will refer to the next token in
 589     the lookahead buffer ("D"), but LT(1) in action beta will refer to
 590     the token matched by D - the preceding token.
 591
 592     A warning has been added for users about this when an action
 593     following a token match contains a reference to LT(1), LA(1), or LATEXT(1).
 594
 595     This behavior should be changed, but it appears in too many programs
 596     now.  Another problem, perhaps more significant, is that the obvious
 597     fix (moving the consume() call to before the action) could change the
 598     order in which input is requested and output appears in existing programs.
 599
 600     This problem was reported, along with a fix by Benjamin Mandel
 601     (beny sd.co.il).  However, I felt that changing the behavior was too
 602     dangerous for existing code.
 603
 604 #270. (Changed in MR23) Removed static objects from PCCTSAST.cpp
 605
 606     There were some statically allocated objects in PCCTSAST.cpp.
 607     These were changed to non-static.
 608
 609 #269. (Changed in MR23) dlg output for initializing static array
 610
 611     The output from dlg contains a construct similar to the
 612     following:
 613
 614         struct XXX {
 615           static const int size;
 616           static int array1[5];
 617         };
 618
 619         const int XXX::size = 4;
 620         int XXX::array1[size+1];
 621
 622
 623     The problem is that although the expression "size+1" used in
 624     the definition of array1 is equal to 5 (the expression used to
 625     declare array1), it is not considered equivalent by some compilers.
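
         These notes do not say how the fix was made.  One way to sidestep
         the mismatch, shown here purely as an assumption, is to give the
         array bound a single name and use that same name in both the
         declaration and the definition.  The enumerator array1_len is
         invented for this sketch.

```cpp
#include <cassert>

struct XXX {
    enum { array1_len = 5 };            // hypothetical: one name for the bound
    static const int size;
    static int array1[array1_len];      // declaration uses the name
};

const int XXX::size = 4;
int XXX::array1[XXX::array1_len];       // definition uses the same name
```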
 626
 627     Reported with fix by Volker H. Simonis (simonis informatik.uni-tuebingen.de)
 628
 629 #268. (Changed in MR23) syn() routine output when k > 1
 630
 631     The syn() routine is supposed to print out the text of the
 632     token causing the syntax error.  It appears that it always
 633     used the text from the first lookahead token rather than the
 634     appropriate one.  The appropriate one is computed by comparing
 635     the token codes of lookahead token i (for i = 1 to k) with
 636     the FIRST(i) set.
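
         The selection rule can be sketched as follows.  This is not the
         code in ANTLRParser::syn(); the function name, the use of
         std::vector and std::set, and the convention of returning 0 when
         every token is acceptable are all assumptions made for the
         illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Returns the 1-based index i of the first lookahead token whose code is
// not in the corresponding FIRST(i) set, or 0 if every token is acceptable.
std::size_t offendingLookahead(const std::vector<int>& lookahead,
                               const std::vector<std::set<int> >& first) {
    for (std::size_t i = 0; i < lookahead.size() && i < first.size(); ++i) {
        if (first[i].count(lookahead[i]) == 0) {
            return i + 1;
        }
    }
    return 0;
}
```

         With k = 2, a first token that is in FIRST(1) and a second token
         that is not in FIRST(2), the function reports index 2, so the text
         of the second lookahead token would be the one to print.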
 637
 638     This has been corrected in ANTLRParser::syn().
 639
 640     Reported by Bill Menees (bill.menees gogallagher.com)
 641
 642 #267. (Changed in MR23) AST traversal functions client data argument
 643
 644     The AST traversal functions now take an extra (optional) parameter
 645     which can point to client data:
 646
 647         preorder_action(void* pData = NULL)
 648         preorder_before_action(void* pData = NULL)
 649         preorder_after_action(void* pData = NULL)
 650
 651     ****       Warning: this changes the AST signature.         ***
 652     **** Be sure to revise your AST functions of the same name  ***
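
         A sketch of how the client data argument might be used follows.
         NodeBase is a stand-in written for this example and is not the
         pccts AST base class; only the name preorder_action and its
         "void* pData = NULL" parameter are taken from the list above.
         Counter, CountingNode and countNodes() are likewise invented.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an AST base class with child/sibling links.
class NodeBase {
public:
    NodeBase() : down(NULL), right(NULL) {}
    virtual ~NodeBase() {}
    virtual void preorder_action(void* pData = NULL) { (void)pData; }
    // Visit this node, its children, then its right siblings.
    void preorder(void* pData = NULL) {
        for (NodeBase* n = this; n != NULL; n = n->right) {
            n->preorder_action(pData);
            if (n->down != NULL) n->down->preorder(pData);
        }
    }
    NodeBase* down;
    NodeBase* right;
};

// Client data passed through the traversal.
struct Counter {
    int visited;
};

class CountingNode : public NodeBase {
public:
    virtual void preorder_action(void* pData = NULL) {
        if (pData != NULL) static_cast<Counter*>(pData)->visited++;
    }
};

// Builds a root with two children and counts the nodes visited.
int countNodes() {
    CountingNode root, child1, child2;
    root.down = &child1;
    child1.right = &child2;
    Counter c = { 0 };
    root.preorder(&c);
    return c.visited;
}
```

         An override written with the old signature (no parameter) would
         no longer override the base function, which is why existing AST
         classes need to be revised.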
 653
 654     Bill Menees (bill.menees gogallagher.com)
 655
 656 #266. (Changed in MR23) virtual function printMessage()
 657
 658     Bill Menees (bill.menees gogallagher.com) has completed the
 659     tedious task of replacing all calls to fprintf() with calls
 660     to the virtual function printMessage().  For classes which
 661     have a pointer to the parser it forwards the printMessage()
 662     call to the parser's printMessage() routine.
 663
 664     This should make it significantly easier to redirect pccts
 665     error and warning messages.
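
         Redirection through a virtual function might look like the sketch
         below.  MessageSource is a stand-in class, and the fprintf-style
         signature given to printMessage() is an assumption made for the
         example; check the pccts headers for the real declaration.
         reportError() and CapturingSource are also invented.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a class that reports through printMessage().
class MessageSource {
public:
    virtual ~MessageSource() {}
    // Assumed signature: behaves like fprintf by default.
    virtual int printMessage(FILE* pFile, const char* pFormat, ...) {
        va_list args;
        va_start(args, pFormat);
        int n = std::vfprintf(pFile, pFormat, args);
        va_end(args);
        return n;
    }
    void reportError(int line) {
        printMessage(stderr, "line %d: syntax error\n", line);
    }
};

// Override that collects messages in a string instead of writing to a file.
class CapturingSource : public MessageSource {
public:
    std::string captured;
    virtual int printMessage(FILE*, const char* pFormat, ...) {
        char buf[256];
        va_list args;
        va_start(args, pFormat);
        int n = std::vsnprintf(buf, sizeof buf, pFormat, args);
        va_end(args);
        captured += buf;
        return n;
    }
};
```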
 666
 667 #265. (Changed in MR23) Remove "labase++" in C++ mode
 668
 669     In C++ mode labase++ is called when a token is matched.
 670     It appears that labase is not used in C++ mode at all, so
 671     this code has been commented out.
 672
 673 #264. (Changed in MR23) Complete rewrite of ParserBlackBox.h
 674
 675     The parser black box (PBlackBox.h) was completely rewritten
 676     by Chris Uzdavinis (chris atdesk.com) to improve its robustness.
 677
 678 #263. (Changed in MR23) -preamble and -preamble_first rescinded
 679
 680     Changes for item #253 have been rescinded.
 681
 682 #262. (Changed in MR23) Crash with -alpha option during traceback
 683
 684     Under some circumstances a -alpha traceback was started at the
 685     "wrong" time.  As a result, internal data structures were not
 686     initialized.
 687
 688     Reported by Arpad Beszedes (beszedes inf.u-szeged.hu).
 689
 690 #261. (Changed in MR23) Defer token fetch for C++ mode
 691
 692     Item #216 has been revised to indicate that use of the defer fetch
 693     option (ZZDEFER_FETCH) requires dlg option -i.
 694
 695 #260. (MR22) Raise default lex buffer size from 8,000 to 32,000 bytes.
 696
 697     ZZLEXBUFSIZE is the size (in bytes) of the buffer used by dlg
 698     generated lexers.  The default value has been raised to 32,000 and
 699     the value used by antlr, dlg, and sorcerer has also been raised to
 700     32,000.
 701
 702 #259. (MR22) Default function arguments in C++ mode.
 703
 704     If a rule is declared:
 705
 706             rr [int i = 0] : ....
 707
 708     then the declaration generated by pccts resembles:
 709
 710             void rr(int i = 0);
 711
 712     however, the definition must omit the default argument:
 713
 714             void rr(int i) {...}
 715
 716     In the past the default value was not omitted.  In MR22
 717     the generated code resembles:
 718
 719             void rr(int i /* = 0 */ ) {...}
 720
 721     Implemented by Volker H. Simonis (simonis informatik.uni-tuebingen.de)
 722
 723
 724     Note: In MR23 this was changed so that nested C style comments
 725     ("/* ... */") would not cause problems.
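
         The rule is ordinary C++ and can be checked with a small sketch.
         The class name MyParser and the int return value are assumptions
         made so that the result can be observed; the code generated by
         pccts returns void as shown above.

```cpp
#include <cassert>

class MyParser {
public:
    int rr(int i = 0);                  // declaration carries the default
};

// The definition must not repeat the default, so it is commented out.
int MyParser::rr(int i /* = 0 */ ) {
    return i + 1;
}
```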
 726
 727 #258. (MR22)  Using a base class for your parser
 728
 729     In item #102 (MR10) the class statement was extended to allow one
 730     to specify a base class other than ANTLRParser for the generated
 731     parser.  It turned out that this was less than useful because
 732     the constructor still specified ANTLRParser as the base class.
 733
 734     The class statement now uses the first identifier appearing after
 735     the ":" as the name of the base class.  For example:
 736
 737         class MyParser : public FooParser {
 738
 739     Generates in MyParser.h:
 740
 741             class MyParser : public FooParser {
 742
 743     Generates in MyParser.cpp something that resembles:
 744
 745             MyParser::MyParser(ANTLRTokenBuffer *input) :
 746                                          FooParser(input,1,0,0,4)
 747             {
 748                 token_tbl = _token_tbl;
 749                 traceOptionValueDefault=1;    // MR10 turn trace ON
 750             }
 751
 752     The base class constructor must have a signature similar to
 753     that of ANTLRParser.
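
         The constructor chain can be sketched with stand-in classes.
         TokenBufferLike and BaseParserLike are placeholders for
         ANTLRTokenBuffer and ANTLRParser, and the parameter names arg3 to
         arg5 are placeholders too, since these notes show only the call
         FooParser(input,1,0,0,4) and not what each argument means.

```cpp
#include <cassert>

// Hypothetical stand-in for ANTLRTokenBuffer.
struct TokenBufferLike {};

// Hypothetical stand-in for ANTLRParser with a five-argument constructor.
class BaseParserLike {
public:
    BaseParserLike(TokenBufferLike* input, int k, int arg3, int arg4, int arg5)
        : input_(input), k_(k) {
        (void)arg3; (void)arg4; (void)arg5;
    }
    virtual ~BaseParserLike() {}
    int k() const { return k_; }
    TokenBufferLike* input() const { return input_; }
private:
    TokenBufferLike* input_;
    int k_;
};

// User-written base class: its constructor accepts the same five
// arguments and forwards them.
class FooParser : public BaseParserLike {
public:
    FooParser(TokenBufferLike* input, int k, int arg3, int arg4, int arg5)
        : BaseParserLike(input, k, arg3, arg4, arg5), fooReady(true) {}
    bool fooReady;
};

// Shape of the generated parser: it names FooParser in its initializer.
class MyParser : public FooParser {
public:
    explicit MyParser(TokenBufferLike* input) : FooParser(input, 1, 0, 0, 4) {}
};
```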
 754
 755 #257. (MR21a) Removed dlg statement that -i has no effect in C++ mode.
 756
 757     This was incorrect.
 758
 759 #256. (MR21a) Malformed syntax graph causes crash after error message.
 760
 761     In the past, certain kinds of errors in the very first grammar
 762     element could cause the construction of a malformed graph
 763     representing the grammar.  This would eventually result in a
 764     fatal internal error.  The code has been changed to be more
 765     resistant to this particular error.
 766
 767 #255. (MR21a) ParserBlackBox(FILE* f)
 768
 769     This constructor set openByBlackBox to the wrong value.
 770
 771     Reported by Kees Bakker (kees_bakker tasking.nl).
 772
 773 #254. (MR21a) Reporting syntax error at end-of-file
 774
 775     When there was a syntax error at the end-of-file the syntax
 776     error routine would substitute "<eof>" for the programmer's
 777     end-of-file symbol.  This substitution is now done only when
 778     the programmer does not define his own end-of-file symbol
 779     or the symbol begins with the character "@".
 780
 781     Reported by Kees Bakker (kees_bakker tasking.nl).
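
         The substitution rule above can be sketched as a small function.
         This is an illustration only: the function name is hypothetical
         and an empty string is assumed to mean "no end-of-file symbol
         defined by the programmer".

```cpp
#include <cassert>
#include <string>

// Returns the text to show for the end-of-file token in a syntax
// error message.  An empty userEofName is assumed to mean that the
// programmer did not define an end-of-file symbol.
std::string eofDisplayName(const std::string &userEofName) {
    if (userEofName.empty() || userEofName[0] == '@') {
        return "<eof>";          // fall back to the generic name
    }
    return userEofName;          // keep the programmer's own symbol
}
```

         For example, eofDisplayName("Eof") keeps "Eof", while
         eofDisplayName("@") and eofDisplayName("") both give "<eof>".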
 782
 783 #253. (MR21) Generation of block preamble (-preamble and -preamble_first)
 784
 785         *** This change was rescinded by item #263 ***
 786
 787     The antlr option -preamble causes antlr to insert the code
 788     BLOCK_PREAMBLE at the start of each rule and block.  It does
 789     not insert code before rule references, token references, or
 790     actions.  By properly defining the macro BLOCK_PREAMBLE the
 791     user can generate code which is specific to the start of blocks.
 792
 793     The antlr option -preamble_first is similar, but inserts the
 794     code BLOCK_PREAMBLE_FIRST(PreambleFirst_123) where the symbol
 795     PreambleFirst_123 is equivalent to the first set defined by
 796     the #FirstSetSymbol described in Item #248.
 797
 798     I have not investigated how these options interact with guess
 799     mode (syntactic predicates).
 800
 801 #252. (MR21) Check for null pointer in trace routine
 802
 803     When some trace options are used when the parser is generated
 804     without the trace enabled, the current rule name may be a
 805     NULL pointer.  A guard was added to check for this in
 806     restoreState.
 807
 808     Reported by Douglas E. Forester (dougf projtech.com).
 809
 810 #251. (MR21) Changes to #define zzTRACE_RULES
 811
 812     The macro zzTRACE_RULES was being used to pass information to
 813     AParser.h.  If this preprocessor symbol was not properly
 814     set the first time AParser.h was #included, the declaration
 815     of zzTRACEdata would be omitted (it is used by the -gd option).
 816     Subsequent #includes of AParser.h would be skipped because of
 817     the #ifdef guard, so the declaration of zzTracePrevRuleName would
 818     never be made.  The result was that proper compilation was very
 819     order dependent.
 820
 821     The declaration of zzTRACEdata was made unconditional and the
 822     problem of removing unused declarations will be left to optimizers.
 823
 824     Diagnosed by Douglas E. Forester (dougf projtech.com).
 825
 826 #250. (MR21) Option for EXPERIMENTAL change to error sets for blocks
 827
 828     The antlr option -mrblkerr turns on an experimental feature
 829     which is supposed to provide more accurate syntax error messages
 830     for k=1, ck=1 grammars.  When used with k>1 or ck>1 grammars the
 831     behavior should be no worse than the current behavior.
 832
 833     There is no problem with the matching of elements or the computation
 834     of prediction expressions in pccts.  The task is only one of listing
 835     the most appropriate tokens in the error message.  The error sets used
 836     in pccts error messages are approximations of the exact error set when
 837     optional elements in (...)* or (...)+ are involved.  While entirely
 838     correct, the error messages are sometimes not 100% accurate.
 839
 840     There is also a minor philosophical issue.  For example, suppose the
 841     grammar expects the token to be an optional A followed by Z, and it
 842     is X.  X, of course, is neither A nor Z, so an error message is appropriate.
 843     Is it appropriate to say "Expected Z" ?  It is correct, it is accurate,
 844     but it is not complete.
 845
 846     When k>1 or ck>1 the problem of providing the exactly correct
 847     list of tokens for the syntax error messages ends up becoming
 848     equivalent to evaluating the prediction expression for the
 849     alternatives twice. However, for k=1 ck=1 grammars the prediction
 850     expression can be computed easily and evaluated cheaply, so I
 851     decided to try implementing it to satisfy a particular application.
 852     This application uses the error set in an interactive command language
 853     to provide prompts which list the alternatives available at that
 854     point in the parser.  The user can then enter additional tokens to
 855     complete the command line.  To do this required more accurate error
 856     sets than previously provided by pccts.
 857
 858     In some cases the default pccts behavior may lead to more robust error
 859     recovery or clearer error messages than having the exact set of tokens.
 860     This is because (a) features like -ge allow the use of symbolic names for
 861     certain sets of tokens, so having extra tokens may simply obscure things
 862     and (b) the error set is used to resynchronize the parser, so a good
 863     choice is sometimes more important than having the exact set.
 864
 865     Consider the following example:
 866
 867             Note:  All example code has been abbreviated
 868             to the absolute minimum in order to make the
 869             examples concise.
 870
 871         star1 : (A)* Z;
 872
 873     The generated code resembles:
 874
 875            old                new (with -mrblkerr)
 876         -------------         --------------------
 877         for (;;) {            for (;;) {
 878             match(A);           match(A);
 879         }                     }
 880         match(Z);             if (! A and ! Z) then
 881                                 FAIL(...{A,Z}...);
 882                               }
 883                               match(Z);
 884
 885
 886         With input X
 887             old message: Found X, expected Z
 888             new message: Found X, expected A, Z
 889
 890     For the example:
 891
 892         star2 : (A|B)* Z;
 893
 894            old                      new (with -mrblkerr)
 895         -------------               --------------------
 896         for (;;) {                  for (;;) {
 897           if (!A and !B) break;       if (!A and !B) break;
 898           if (...) {                  if (...) {
 899             <same ...>                  <same ...>
 900           }                           }
 901           else {                      else {
 902             FAIL(...{A,B,Z}...)         FAIL(...{A,B}...);
 903           }                           }
 904         }                           }
 905         match(Z);                   if (! A and ! B and !Z) then
 906                                         FAIL(...{A,B,Z}...);
 907                                     }
 908                                     match(Z);
 909
 910         With input X
 911             old message: Found X, expected Z
 912             new message: Found X, expected A, B, Z
 913         With input A X
 914             old message: Found X, expected Z
 915             new message: Found X, expected A, B, Z
 916
 917             This includes the choice of looping back to the
 918             star block.
 919
 920     The code for plus blocks:
 921
 922         plus1 : (A)+ Z;
 923
 924     The generated code resembles:
 925
 926            old                  new (with -mrblkerr)
 927         -------------           --------------------
 928         do {                    do {
 929           match(A);               match(A);
 930         } while (A)             } while (A)
 931         match(Z);               if (! A and ! Z) then
 932                                   FAIL(...{A,Z}...);
 933                                 }
 934                                 match(Z);
 935
 936         With input A X
 937             old message: Found X, expected Z
 938             new message: Found X, expected A, Z
 939
 940             This includes the choice of looping back to the
 941             plus block.
 942
 943     For the example:
 944
 945         plus2 : (A|B)+ Z;
 946
 947            old                    new (with -mrblkerr)
 948         -------------             --------------------
 949         do {                        do {
 950           if (A) {                    <same>
 951             match(A);                 <same>
 952           } else if (B) {             <same>
 953             match(B);                 <same>
 954           } else {                    <same>
 955             if (cnt > 1) break;       <same>
 956             FAIL(...{A,B,Z}...)         FAIL(...{A,B}...);
 957           }                           }
 958           cnt++;                      <same>
 959         }                           }
 960
 961         match(Z);                   if (! A and ! B and !Z) then
 962                                         FAIL(...{A,B,Z}...);
 963                                     }
 964                                     match(Z);
 965
 966         With input X
 967             old message: Found X, expected A, B, Z
 968             new message: Found X, expected A, B
 969         With input A X
 970             old message: Found X, expected Z
 971             new message: Found X, expected A, B, Z
 972
 973             This includes the choice of looping back to the
 974             plus block.
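
         The difference between the old and new error sets for star1 can
         be modeled in a few lines.  This is a sketch, not the code antlr
         generates: the token codes, the END sentinel, and the function
         name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum Token { A, Z, X, END };     // hypothetical token codes

// Models  star1 : (A)* Z ;  over an END-terminated token list.
// Returns the tokens that would be listed as "expected" at the first
// syntax error, or an empty string if the input is accepted.
std::string star1Expected(const std::vector<Token> &in, bool mrblkerr) {
    assert(!in.empty() && in.back() == END);
    std::size_t i = 0;
    while (in[i] == A) {
        ++i;                     // (A)* : consume every leading A
    }
    if (mrblkerr && in[i] != A && in[i] != Z) {
        return "A, Z";           // extra test on leaving the block
    }
    if (in[i] != Z) {
        return "Z";              // plain match(Z) failure
    }
    return "";
}
```

         For the input X the function returns "Z" when mrblkerr is false
         and "A, Z" when it is true, matching the messages listed for
         star1 above.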
 975
 976 #249. (MR21) Changes for DEC/VMS systems
 977
 978     Jean-François Piéronne (jfp altavista.net) has updated some
 979     VMS related command files and fixed some minor problems related
 980     to building pccts under the DEC/VMS operating system.  For DEC/VMS
 981     users the most important differences are:
 982
 983         a.  Revised makefile.vms
 984         b.  Revised genMMS for generating VMS style makefiles.
 985
 986 #248. (MR21) Generate symbol for first set of an alternative
 987
 988     pccts can generate a symbol which represents the tokens which may
 989     appear at the start of a block:
 990
 991         rr : #FirstSetSymbol(rr_FirstSet)  ( Foo | Bar ) ;
 992
 993     This will generate the symbol rr_FirstSet of type SetWordType with
 994     elements Foo and Bar set. The bits can be tested using code similar
 995     to the following:
 996
 997         if (set_el(Foo, &rr_FirstSet)) { ...
 998
 999     This can be combined with the C array zztokens[] or the C++ routine
1000     tokenName() to get the print name of the token in the first set.
1001
1002     The size of the set is given by the newly added enum SET_SIZE, a
1003     protected member of the generated parser's class.  The number of
1004     elements in the generated set will not be exactly equal to the
1005     value of SET_SIZE because of synthetic tokens created by #tokclass,
1006     #errclass, the -ge option, and meta-tokens such as epsilon and
1007     end-of-file.
1008
1009     The #FirstSetSymbol must appear immediately before a block
1010     such as (...)+, (...)*, {...}, or (...).  It may not appear
1011     immediately before a token, a rule reference, or an action.  However,
1012     a token or rule reference can be enclosed in a (...) in order to
1013     make the use of #FirstSetSymbol legal.
1014
1015             rr_bad : #FirstSetSymbol(rr_bad_FirstSet) Foo;   //  Illegal
1016
1017             rr_ok :  #FirstSetSymbol(rr_ok_FirstSet) (Foo);  //  Legal
1018
1019     Do not confuse FirstSetSymbol sets with the sets used for testing
1020     lookahead. The sets used for FirstSetSymbol have one element per bit,
1021     so the number of bytes is approximately the largest token number
1022     divided by 8.  The sets used for testing lookahead store 8 lookahead
1023     sets per byte, so the length of the array is approximately the largest
1024     token number.
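
         The size estimate for a one-bit-per-token set can be sketched as
         follows.  The helper names, the 8-bit word, and the bit order
         are assumptions made for illustration; they are not the pccts
         set routines.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned char SetWord;   // assumed: 8 bits, one token per bit

// Bytes needed to give every token number up to largestToken a bit.
std::size_t bytesForTokens(unsigned largestToken) {
    return largestToken / 8 + 1;
}

// Sets the bit that represents the given token number.
void addToken(std::vector<SetWord> &set, unsigned token) {
    assert(token / 8 < set.size());
    set[token / 8] |= static_cast<SetWord>(1u << (token % 8));
}

// Tests the bit that represents the given token number.
bool hasToken(const std::vector<SetWord> &set, unsigned token) {
    assert(token / 8 < set.size());
    return (set[token / 8] & (1u << (token % 8))) != 0;
}
```

         With a largest token number of 20 the set occupies 3 bytes,
         whereas a lookahead array with one byte per token number would
         need about 20.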
1025
1026     If there is demand, a similar routine for follow sets can be added.
1027
1028 #247. (MR21) Misleading error message on syntax error for optional elements.
1029
1030         ===================================================
1031         The behavior has been revised when parser exception
1032         handling is used.  See Item #290
1033         ===================================================
1034
1035     Prior to MR21, tokens which were optional did not appear in syntax
1036     error messages if the block which immediately followed detected a
1037     syntax error.
1038
1039     Consider the following grammar which accepts Number, Word, and Other:
1040
1041             rr : {Number} Word;
1042
1043     For this rule the code resembles:
1044
1045             if (LA(1) == Number) {
1046                 match(Number);
1047                 consume();
1048             }
1049             match(Word);
1050
1051     Prior to MR21, the error message for input "$ a" would be:
1052
1053             line 1: syntax error at "$" missing Word
1054
1055     With MR21 the message will be:
1056
1057             line 1: syntax error at "$" expecting Word, Number.
1058
1059     The generated code resembles:
1060
1061             if ( (LA(1)==Number) ) {
1062                 zzmatch(Number);
1063                 consume();
1064             }
1065             else {
1066                 if ( (LA(1)==Word) ) {
1067                     /* nothing */
1068                 }
1069                 else {
1070                     FAIL(... message for both Number and Word ...);
1071                 }
1072             }
1073             match(Word);
1074
1075     The code generated for optional blocks in MR21 is slightly longer
1076     than the previous versions, but it should give better error messages.
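
         The effect on the error message can be modeled with a small
         function.  This is a sketch under stated assumptions: the token
         codes and the function name are hypothetical, and only two
         tokens of input are examined.

```cpp
#include <cassert>
#include <string>

enum Tok { Number, Word, Other };    // hypothetical token codes

// Models  rr : {Number} Word ;  for the two-token input la1 la2.
// Returns the tokens listed as expected at the first syntax error,
// or an empty string if the input is accepted.
std::string rrExpected(Tok la1, Tok la2, bool mr21) {
    if (la1 == Number) {
        // The optional element was present; Word must follow.
        return la2 == Word ? "" : "Word";
    }
    if (la1 == Word) {
        return "";                   // optional element skipped
    }
    // Neither alternative applies.  MR21 also lists the optional token.
    return mr21 ? "Word, Number" : "Word";
}
```

         For an input that starts with an unexpected token the function
         returns "Word" in the old mode and "Word, Number" in the MR21
         mode.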
1077
1078     The code generated for:
1079
1080             { a | b | c }
1081
1082     should now be *identical* to:
1083
1084             ( a | b | c | )
1085
1086     which was not the case prior to MR21.
1087
1088     Reported by Sue Marvin (sue siara.com).
1089
1090 #246. (Changed in MR21) Use of $(MAKE) for calls to make
1091
1092     Calls to make from the makefiles were replaced with $(MAKE)
1093     because of problems when using gmake.
1094
1095     Reported with fix by Sunil K.Vallamkonda (sunil siara.com).
1096
1097 #245. (Changed in MR21) Changes to genmk
1098
1099     The following command line options have been added to genmk:
1100
1101         -cfiles ...
1102
1103             To add a user's C or C++ files into makefile automatically.
1104             The list of files must be enclosed in apostrophes.  This
1105             option may be specified multiple times.
1106
1107         -compiler ...
1108
1109             The name of the compiler to use for $(CCC) or $(CC).  The
1110             default in C++ mode is "CC".  The default in C mode is "cc".
1111
1112         -pccts_path ...
1113
1114             The value for $(PCCTS), the pccts directory.  The default
1115             is /usr/local/pccts.
1116
1117     Contributed by Tomasz Babczynski (t.babczynski ict.pwr.wroc.pl).
1118
1119 #244. (Changed in MR21) Rename variable "not" in antlr.g
1120
1121     When antlr.g is compiled with a C++ compiler, a variable named
1122     "not" causes problems.  Reported by Sinan Karasu
1123     (sinan.karasu boeing.com).
1124
1125 #243. (Changed in MR21) Replace recursion with iteration in zzfree_ast
1126
1127     Another refinement to zzfree_ast in ast.c to limit recursion.
1128
1129     NAKAJIMA Mutsuki (muc isr.co.jp).
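
         One way to limit recursion when freeing a child/sibling tree is
         to iterate across siblings and recurse only into children.  The
         sketch below illustrates that idea; it is not the code in ast.c,
         and the Node type and function name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical child/sibling node; the real AST type is user-defined.
struct Node {
    Node *down;                      // first child
    Node *right;                     // next sibling
    Node() : down(NULL), right(NULL) {}
};

// Frees the tree rooted at t and returns the number of nodes freed.
// Siblings are walked in a loop, so a long chain of right links
// does not deepen the call stack.
std::size_t freeTree(Node *t) {
    std::size_t freed = 0;
    while (t != NULL) {
        Node *next = t->right;       // remember sibling before delete
        freed += freeTree(t->down);  // recursion only for children
        delete t;
        ++freed;
        t = next;
    }
    return freed;
}
```

         The recursion depth is then bounded by the depth of the tree
         rather than by the total number of nodes.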
1130
1131
1132 #242.  (Changed in MR21) LineInfoFormatStr
1133
1134     Added an #ifndef/#endif around LineInfoFormatStr in pcctscfg.h.
1135
1136 #241. (Changed in MR21) Changed macro PURIFY to a no-op
1137
1138                 ***********************
1139                 *** NOT IMPLEMENTED ***
1140                 ***********************
1141
1142         The PURIFY macro was changed to a no-op because it was causing
1143         problems when passing C++ objects.
1144
1145         The old definition:
1146
1147             #define PURIFY(r,s)     memset((char *) &(r),'\0',(s));
1148
1149         The new definition:
1150
1151             #define PURIFY(r,s)     /* nothing */
1152
1153
1154 #240. (Changed in MR21) sorcerer/h/sorcerer.h _MATCH and _MATCHRANGE
1155
1156     Added test for NULL token pointer.
1157
1158     Suggested by Peter Keller (keller ebi.ac.uk)
1159
1160 #239. (Changed in MR21) C++ mode AParser::traceGuessFail
1161
1162     If tracing is turned on when the code has been generated
1163     without trace code, a failed guess generates a trace report
1164     even though there are no other trace reports.  This change
1165     makes the behavior consistent with other parts of the
1166     trace system.
1167
1168     Reported by David Wigg (wiggjd sbu.ac.uk).
1169
1170 #238. (Changed in MR21) Namespace version #include files
1171
1172     Changed reference from CStdio to cstdio (and other
1173     #include file names) in the namespace version of pccts.
1174     Should have known better.
1175
1176 #237. (Changed in MR21) ParserBlackBox(FILE*)
1177
1178     In the past, ParserBlackBox would close the FILE in the dtor
1179     even though it was not opened by ParserBlackBox.  The problem
1180     is that there were two constructors, one which accepted a file
1181     name and did an fopen, the other which accepted a FILE and did
1182     not do an fopen.  There is now an extra member variable which
1183     remembers whether ParserBlackBox did the open or not.
1184
1185     Suggested by Mike Percy (mpercy scires.com).
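
         The ownership flag described above can be sketched with a small
         wrapper class.  InputFile and its members are hypothetical names
         chosen for illustration; this is not the ParserBlackBox source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical wrapper showing the "did I open it?" flag.
class InputFile {
public:
    // Opens the file itself, so it is responsible for closing it.
    explicit InputFile(const char *name)
        : file_(std::fopen(name, "r")), openedHere_(file_ != NULL) {}

    // Borrows a FILE opened by the caller; never closes it.
    explicit InputFile(std::FILE *f)
        : file_(f), openedHere_(false) {}

    ~InputFile() {
        if (openedHere_ && file_ != NULL) {
            std::fclose(file_);
        }
    }

    bool ownsFile() const { return openedHere_; }
    std::FILE *get() const { return file_; }

private:
    InputFile(const InputFile &);             // not copyable
    InputFile &operator=(const InputFile &);  // not assignable
    std::FILE *file_;
    bool openedHere_;
};
```

         A FILE passed in by the caller, for example stdin, is left open
         when the wrapper is destroyed.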
1186
1187 #236. (Changed in MR21) tmake now reports down pointer problem
1188
1189     When ASTBase::tmake attempts to update the down pointer of
1190     an AST it checks to see if the down pointer is NULL.  If it
1191     is not NULL it does not do the update and returns NULL.
1192     An attempt to update the down pointer is almost always a
1193     result of a user error.  This can lead to difficult to find
1194     problems during tree construction.
1195
1196     With this change, the routine calls a virtual function
1197     reportOverwriteOfDownPointer() which calls panic to
1198     report the problem.  Users who want the old behavior can
1199     redefine the virtual function in their AST class.
1200
1201     Suggested by Sinan Karasu (sinan.karasu boeing.com)
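
         The override mechanism can be sketched as follows.  AstSketch,
         attachChild, and QuietAst are hypothetical names, and the fatal
         default stands in for the call to panic; only the hook name
         reportOverwriteOfDownPointer is taken from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical miniature of an AST node; not the real ASTBase.
class AstSketch {
public:
    AstSketch() : down_(NULL) {}
    virtual ~AstSketch() {}

    // Attaches a first child.  Refuses, and reports through the
    // virtual hook, if a child is already attached.
    bool attachChild(AstSketch *child) {
        if (down_ != NULL) {
            reportOverwriteOfDownPointer();
            return false;
        }
        down_ = child;
        return true;
    }

protected:
    // Default policy in this sketch: treat the overwrite as fatal.
    virtual void reportOverwriteOfDownPointer() {
        std::fprintf(stderr, "attempt to overwrite down pointer\n");
        std::abort();
    }

private:
    AstSketch *down_;
};

// A user class that restores the quiet behavior by overriding the hook.
class QuietAst : public AstSketch {
public:
    QuietAst() : overwriteAttempts(0) {}
    int overwriteAttempts;
protected:
    virtual void reportOverwriteOfDownPointer() { ++overwriteAttempts; }
};
```

         A second attachChild call on the same QuietAst node returns
         false and increments overwriteAttempts instead of aborting.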
1202
1203 #235. (Changed in MR21) Made ANTLRParser::resynch() virtual
1204
1205     Suggested by Jerry Evans (jerry swsl.co.uk).
1206
1207 #234. (Changed in MR21) Implicit int for function return value
1208
1209     ATokenBuffer:bufferSize() did not specify a type for the
1210     return value.
1211
1212     Reported by Hai Vo-Ba (hai fc.hp.com).
1213
1214 #233. (Changed in MR20) Converted to MSVC 6.0
1215
1216     Due to external circumstances I have had to convert to MSVC 6.0.
1217     The MSVC 5.0 project files (.dsw and .dsp) have been retained as
1218     xxx50.dsp and xxx50.dsw.  The MSVC 6.0 files are named xxx60.dsp
1219     and xxx60.dsw (where xxx is related to the directory/project).
1220
1221 #232. (Changed in MR20) Make setwd bit vectors protected in parser.h
1222
1223     The access for the setwd array in the parser header was not
1224     specified.  As a result, it would depend on the code which
1225     preceded it.  In MR20 it will always have access "protected".
1226
1227     Reported by Piotr Eljasiak (eljasiak zt.gdansk.tpsa.pl).
1228
1229 #231. (Changed in MR20) Error in token buffer debug code.
1230
1231     When token buffer debugging is selected via the pre-processor
1232     symbol DEBUG_TOKENBUFFER there is an erroneous check in
1233     AParser.cpp:
1234
1235         #ifdef DEBUG_TOKENBUFFER
1236             if (i >= inputTokens->bufferSize() ||
1237                 inputTokens->minTokens() < LLk )     /* MR20 Was "<=" */
1238         ...
1239         #endif
1240
1241     Reported by David Wigg (wiggjd sbu.ac.uk).
1242
1243 #230. (Changed in MR20) Fixed problem with #define for -gd option
1244
1245     There was an error in setting zzTRACE_RULES for the -gd (trace) option.
1246
1247     Reported by Gary Funck (gary intrepid.com).
1248
1249 #229. (Changed in MR20) Additional "const" for literals
1250
1251     "const" was added to the token name literal table.
1252     "const" was added to some panic() and similar routines.
1253
1254 #228. (Changed in MR20) dlg crashes on "()"
1255
1256     The following token definition will cause DLG to crash.
1257
1258         #token "()"
1259
1260     When there is a syntax error in a regular expression
1261     many of the dlg routines return a structure which has
1262     null pointers.  When this is accessed by callers it
1263     generates the crash.
1264
1265     I have attempted to fix the more common cases.
1266
1267     Reported by Mengue Olivier (dolmen bigfoot.com).
1268
1269 #227. (Changed in MR20) Array overwrite
1270
1271     Steveh Hand (sassth unx.sas.com) reported a problem which
1272     was traced to a temporary array which was not properly
1273     resized for deeply nested blocks.  This has been fixed.
1274
1275 #226. (Changed in MR20) -pedantic conformance
1276
1277     G. Hobbelt (i_a mbh.org) and THM made many, many minor
1278     changes to create prototypes for all the functions and
1279     bring antlr, dlg, and sorcerer into conformance with
1280     the gcc -pedantic option.
1281
1282     This may require users to add pccts/h/pcctscfg.h to some
1283     files or makefiles in order to have __USE_PROTOS defined.
1284
1285 #225. (Changed in MR20) AST stack adjustment in C mode
1286
1287     The fix in #214 for AST stack adjustment in C mode missed
1288     some cases.
1289
1290     Reported with fix by Ger Hobbelt (i_a mbh.org).
1291
1292 #224. (Changed in MR20) LL(1) and LL(2) with #pragma approx
1293
1294     This may take a record for the oldest, most trivial, lexical
1295     error in pccts.  The regular expressions for LL(1) and LL(2)
1296     lacked an escape for the left and right parenthesis.
1297
1298     Reported by Ger Hobbelt (i_a mbh.org).
1299
1300 #223. (Changed in MR20) Addition of IBM_VISUAL_AGE directory
1301
1302     Build files for antlr, dlg, and sorcerer under IBM Visual Age
1303     have been contributed by Anton Sergeev (ags mlc.ru).  They have
1304     been placed in the pccts/IBM_VISUAL_AGE directory.
1305
1306 #222. (Changed in MR20) Replace __STDC__ with __USE_PROTOS
1307
1308     Most occurrences of __STDC__ replaced with __USE_PROTOS due to
1309     complaints from several users.
1310
1311 #221. (Changed in MR20) Added #include for DLexerBase.h to PBlackBox.
1312
1313     Added #include for DLexerBase.h to PBlackBox.
1314
1315 #220. (Changed in MR19) strcat arguments reversed in #pred parse
1316
1317     The arguments to strcat are reversed when creating a print
1318     name for a hash table entry for use with #pred feature.
1319
1320     Problem diagnosed and fix reported by Scott Harrington
1321     (seh4 ix.netcom.com).
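
         As a reminder of the argument order, strcat takes the
         destination first and the source second.  The sketch below is
         illustrative only; the function name, the buffer size, and the
         "pred_" prefix used in the example are hypothetical and are not
         taken from the antlr source.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Builds "<prefix><name>" in a local buffer and returns a copy.
std::string makePrintName(const char *prefix, const char *name) {
    char buf[64];
    // Both parts plus the terminating NUL must fit in the buffer.
    assert(std::strlen(prefix) + std::strlen(name) < sizeof buf);
    std::strcpy(buf, prefix);        // start with the prefix
    std::strcat(buf, name);          // destination first, source second
    return std::string(buf);
}
```

         For example, makePrintName("pred_", "isType") yields
         "pred_isType".  Swapping the two strcat arguments would instead
         try to append the buffer to the name.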
1322
1323 #219. (Changed in MR19) C Mode routine zzfree_ast
1324
1325     Changes to reduce use of recursion for AST trees with only right
1326     links or only left links in the C mode routine zzfree_ast.
1327
1328     Implemented by SAKAI Kiyotaka (ksakai isr.co.jp).
1329
1330 #218. (Changed in MR19) Changes to support unsigned char in C mode
1331
1332     Changes to antlr.h and err.h to fix omissions in use of zzchar_t
1333
1334     Implemented by SAKAI Kiyotaka (ksakai isr.co.jp).
1335
1336 #217. (Changed in MR19) Error message when dlg -i and -CC options selected
1337
1338     *** This change was rescinded by item #257 ***
1339
1340     The parsers generated by pccts in C++ mode are not able to support the
1341     interactive lexer option (except, perhaps, when using the deferred fetch
1342     parser option; see Item #216).
1343
1344     DLG now warns when both -i and -CC are selected.
1345
1346     This warning was suggested by David Venditti (07751870267-0001 t-online.de).
1347
1348 #216. (Changed in MR19) Defer token fetch for C++ mode
1349
1350     Implemented by Volker H. Simonis (simonis informatik.uni-tuebingen.de)
1351
1352     Normally, pccts keeps the lookahead token buffer completely filled.
1353     This requires max(k,ck) tokens of lookahead.  For some applications
1354     this can cause deadlock problems.  For example, there may be cases
1355     when the parser can't tell when the input has been completely consumed
1356     until the parse is complete, but the parse can't be completed because
1357     the input routines are waiting for additional tokens to fill the
1358     lookahead buffer.
1359
1360     When the ANTLRParser class is built with the pre-processor option
1361     ZZDEFER_FETCH defined, the fetch of new tokens by consume() is deferred
1362     until LA(i) or LT(i) is called.
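
         The idea of deferring the fetch can be sketched with a small
         lookahead buffer.  CountingSource and DeferredLookahead are
         hypothetical classes written for this illustration; they do not
         reproduce the ANTLRParser implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical token source that counts how many tokens were pulled.
class CountingSource {
public:
    CountingSource() : next_(0) {}
    int nextToken() { return next_++; }      // tokens are 0, 1, 2, ...
    int fetched() const { return next_; }
private:
    int next_;
};

// Lookahead buffer that fetches lazily: consume() never reads from
// the source; only LA(i) does.
class DeferredLookahead {
public:
    explicit DeferredLookahead(CountingSource &src)
        : src_(src), pendingConsumes_(0) {}

    void consume() {
        if (!buf_.empty()) {
            buf_.pop_front();                // drop a buffered token
        } else {
            ++pendingConsumes_;              // remember to skip one later
        }
    }

    int LA(std::size_t i) {                  // i is 1-based
        assert(i >= 1);
        while (pendingConsumes_ > 0) {       // settle deferred consumes
            src_.nextToken();
            --pendingConsumes_;
        }
        while (buf_.size() < i) {            // fetch only what is asked for
            buf_.push_back(src_.nextToken());
        }
        return buf_[i - 1];
    }

private:
    CountingSource &src_;
    std::deque<int> buf_;
    std::size_t pendingConsumes_;
};
```

         In this sketch no token is read at construction, and a call to
         consume() leaves the count of fetched tokens unchanged until the
         next call to LA(i).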
1363
1364     To test whether this option has been built into the ANTLRParser class
1365     use "isDeferFetchEnabled()".
1366
1367     Using the -gd trace option with the default tracein() and traceout()
1368     routines will defeat the effort to defer the fetch because the
1369     trace routines print out information about the lookahead token at
1370     the start of the rule.
1371
1372     Because the tracein and traceout routines are virtual it is
1373     easy to redefine them in your parser:
1374
1375         class MyParser {
1376         <<
1377             virtual void tracein(ANTLRChar * ruleName)
1378                 { fprintf(stderr,"Entering: %s\n", ruleName); }
1379             virtual void traceout(ANTLRChar * ruleName)
1380                 { fprintf(stderr,"Leaving: %s\n", ruleName); }
1381         >>
1382
1383     The originals for those routines are in pccts/h/AParser.cpp.
1384
1385     This requires use of the dlg option -i (interactive lexer).
1386
1387     This is implemented only for C++ mode.
1388
1389     This is experimental.  The interaction with guess mode (syntactic
1390     predicates) is not known.
1391
1392 #215. (Changed in MR19) Addition of reset() to DLGLexerBase
1393
1394     There was no obvious way to reset the lexer for reuse.  The
1395     reset() method now does this.
1396
1397     Suggested by David Venditti (07751870267-0001 t-online.de).
1398
1399 #214. (Changed in MR19)  C mode: Adjust AST stack pointer at exit
1400
1401     In C mode the AST stack pointer needs to be reset if there will
1402     be multiple calls to the ANTLRx macros.
1403
1404     Reported with fix by Paul D. Smith (psmith baynetworks.com).
1405
1406 #213. (Changed in MR18)  Fatal error with -mrhoistk (k>1 hoisting)
1407
1408     When rearranging code I forgot to un-comment a critical line of
1409     code that handles hoisting of predicates with k>1 lookahead.  This
1410     is now fixed.
1411
1412     Reported by Reinier van den Born (reinier vnet.ibm.com).
1413
1414 #212. (Changed in MR17)  Mac related changes by Kenji Tanaka
1415
1416     Kenji Tanaka (kentar osa.att.ne.jp) has made a number of changes for
1417     Macintosh users.
1418
1419     a.  The following Macintosh MPW files aid in installing pccts on Mac:
1420
1421             pccts/MPW_Read_Me
1422
1423             pccts/install68K.mpw
1424             pccts/installPPC.mpw
1425
1426             pccts/antlr/antlr.r
1427             pccts/antlr/antlr68K.make
1428             pccts/antlr/antlrPPC.make
1429
1430             pccts/dlg/dlg.r
1431             pccts/dlg/dlg68K.make
1432             pccts/dlg/dlgPPC.make
1433
1434             pccts/sorcerer/sor.r
1435             pccts/sorcerer/sor68K.make
1436             pccts/sorcerer/sorPPC.make
1437
1438        They completely replace the previous Mac installation files.
1439
1440     b. The most significant is a change in the MAC_FILE_CREATOR symbol
1441        in pcctscfg.h:
1442
1443         old: #define MAC_FILE_CREATOR 'MMCC'   /* Metrowerks C/C++ Text files */
1444         new: #define MAC_FILE_CREATOR 'CWIE'   /* Metrowerks C/C++ Text files */
1445
1446     c.  Added calls to special_fopen_actions() where necessary.
1447
1448 #211. (Changed in MR16a)  C++ style comment in dlg
1449
1450     This has been fixed.
1451
1452 #210. (Changed in MR16a)  Sor accepts \r\n, \r, or \n for end-of-line
1453
1454     A user requested that Sorcerer be changed to accept other forms
1455     of end-of-line.
1456
1457 #209. (Changed in MR16) Name of files changed.
1458
1459         Old:  CHANGES_FROM_1.33
1460         New:  CHANGES_FROM_133.txt
1461
1462         Old:  KNOWN_PROBLEMS
1463         New:  KNOWN_PROBLEMS.txt
1464
1465 #208. (Changed in MR16) Change in use of pccts #include files
1466
1467     There were problems with MS DevStudio when mixing Sorcerer and
1468     PCCTS in the same source file.  The problem is caused by the
1469     redefinition of setjmp in the MS header file setjmp.h.  In
1470     setjmp.h the pre-processor symbol setjmp was redefined to be
1471     _setjmp.  A later effort to execute #include <setjmp.h> resulted
1472     in an effort to #include <_setjmp.h>.  I'm not sure whether this
1473     is a bug or a feature.  In any case, I decided to fix it by
1474     avoiding the use of pre-processor symbols in #include statements
1475     altogether.  This has the added benefit of making pre-compiled
1476     headers work again.
1477
1478     I've replaced statements:
1479
1480         old: #include PCCTS_SETJMP_H
1481         new: #include "pccts_setjmp.h"
1482
1483     Where pccts_setjmp.h contains:
1484
1485             #ifndef __PCCTS_SETJMP_H__
1486             #define __PCCTS_SETJMP_H__
1487
1488             #ifdef PCCTS_USE_NAMESPACE_STD
1489             #include <Csetjmp>
1490             #else
1491             #include <setjmp.h>
1492             #endif
1493
1494             #endif
1495
1496     A similar change has been made for other standard header files
1497     required by pccts and sorcerer: stdlib.h, stdarg.h, stdio.h, etc.
1498
1499     Reported by Jeff Vincent (JVincent novell.com) and Dale Davis
1500     (DalDavis spectrace.com).
1501
1502 #207. (Changed in MR16) dlg reports an invalid range for: [\0x00-\0xff]
1503
1504      -----------------------------------------------------------------
1505      Note from MR23:  This fix does not work.  I am investigating why.
1506      -----------------------------------------------------------------
1507
1508     dlg will report that this is an invalid range.
1509
1510     Diagnosed by Piotr Eljasiak (eljasiak no-spam.zt.gdansk.tpsa.pl):
1511
1512         I think this problem is not specific to unsigned chars
1513         because dlg reports no error for the range [\0x00-\0xfe].
1514
1515         I've found that information on range is kept in field
1516         letter (unsigned char) of Attrib struct. Unfortunately
1517         the letter value internally is for some reasons increased
1518         by 1, so \0xff is represented here as 0.
1519
1520         That's why dlg complains about the range [\0x00-\0xff] in
1521         dlg_p.g:
1522
1523         if ($$.letter > $2.letter) {
1524           error("invalid range  ", zzline);
1525         }
1526
1527     The fix is:
1528
1529         if ($$.letter > $2.letter && 255 != $2.letter) {
1530           error("invalid range  ", zzline);
1531         }
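
         The wraparound in the diagnosis above can be reproduced in
         isolation.  The sketch assumes an 8-bit unsigned char and models
         only the "value plus one" representation and the original range
         test; the function names are hypothetical and this is not the
         dlg source.

```cpp
#include <cassert>

// Models the internal representation described above: the stored
// letter is the character code plus one, kept in an unsigned char.
// With an 8-bit unsigned char, 0xff + 1 wraps around to 0.
unsigned char storedLetter(unsigned code) {
    return static_cast<unsigned char>(code + 1);
}

// Models the original range test: a range is rejected when the
// stored lower bound is greater than the stored upper bound.
bool rangeRejected(unsigned lo, unsigned hi) {
    return storedLetter(lo) > storedLetter(hi);
}
```

         Under these assumptions the range 0x00-0xff is rejected because
         its upper bound is stored as 0, while 0x00-0xfe is accepted.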
1532
1533 #206. (Changed in MR16) Free zzFAILtext in ANTLRParser destructor
1534
1535     The ANTLRParser destructor now frees zzFAILtext.
1536
1537     Problem and fix reported by Manfred Kogler (km cast.uni-linz.ac.at).
1538
1539 #205. (Changed in MR16) DLGStringReset argument now const
1540
1541     Changed: void DLGStringReset(DLGChar *s) {...}
1542     To:      void DLGStringReset(const DLGChar *s) {...}
1543
1544     Suggested by Dale Davis (daldavis spectrace.com)
1545
1546 #204. (Changed in MR15a) Change __WATCOM__ to __WATCOMC__ in pcctscfg.h
1547
1548     Reported by Oleg Dashevskii (olegdash my-dejanews.com).
1549
1550 #203. (Changed in MR15) Addition of sorcerer to distribution kit
1551
1552     I have finally caved in to popular demand.  The pccts 1.33mr15
1553     kit will include sorcerer.  The separate sorcerer kit will be
1554     discontinued.
1555
1556 #202. (Changed in MR15) Organization of MS Dev Studio Projects in Kit
1557
1558     Previously there was one workspace that contained projects for
1559     all three parts of pccts: antlr, dlg, and sorcerer.  Now each
1560     part (and directory) has its own workspace/project and there
1561     is an additional workspace/project to build a library from the
1562     .cpp files in the pccts/h directory.
1563
1564     The library build will create pccts_debug.lib or pccts_release.lib
1565     according to the configuration selected.
1566
1567     If you don't want to build pccts 1.33MR15 you can download a
1568     ready-to-run kit for win32 from http://www.polhode.com/win32.zip.
1569     The ready-to-run for win32 includes executables, a pre-built static
1570     library for the .cpp files in the pccts/h directory, and a sample
1571     application.
1572
1573     You will need to define the environment variable PCCTS to point to
1574     the root of the pccts directory hierarchy.
1575
1576 #201. (Changed in MR15) Several fixes by K.J. Cummings (cummings peritus.com)
1577
1578       Generation of SETJMP rather than SETJMP_H in gen.c.
1579
1580       (Sor B19) Declaration of ref_vars_inits for ref_var_inits in
1581       pccts/sorcerer/sorcerer.h.
1582
1583 #200. (Changed in MR15) Remove operator=() in AToken.h
1584
1585       User reported that WatCom couldn't handle use of
1586       explicit operator =().  Replace with equivalent
1587       using cast operator.
1588
1589 #199. (Changed in MR15) Don't allow use of empty #tokclass
1590
1591       Change antlr.g to disallow empty #tokclass sets.
1592
1593       Reported by Manfred Kogler (km cast.uni-linz.ac.at).
1594
1595 #198. Revised ANSI C grammar due to efforts by Manuel Kessler
1596
1597       Manuel Kessler (mlkessler cip.physik.uni-wuerzburg.de)
1598
1599           Allow trailing ... in function parameter lists.
1600           Add bit fields.
1601           Allow old-style function declarations.
1602           Support cv-qualified pointers.
1603           Better checking of combinations of type specifiers.
1604           Release of memory for local symbols on scope exit.
1605           Allow input file name on command line as well as by redirection.
1606
1607               and other miscellaneous tweaks.
1608
1609       This is not part of the pccts distribution kit. It must be
1610       downloaded separately from:
1611
1612             http://www.polhode.com/ansi_mr15.zip
1613
1614 #197. (Changed in MR14) Resetting the lookahead buffer of the parser
1615
1616       Explanation and fix by Sinan Karasu (sinan.karasu boeing.com)
1617
1618       Consider the code used to prime the lookahead buffer LA(i)
1619       of the parser when init() is called:
1620
1621         void
1622         ANTLRParser::
1623         prime_lookahead()
1624         {
1625             int i;
1626             for(i=1;i<=LLk; i++) consume();
1627             dirty=0;
1628             //lap = 0;      // MR14 - Sinan Karasu (sinan.karasu boeing.com)
1629             //labase = 0;   // MR14
1630             labase=lap;     // MR14
1631         }
1632
1633       When the parser is instantiated, lap=0,labase=0 is set.
1634
1635       The "for" loop runs LLk times. In consume(), lap = lap +1 (mod LLk) is
1636       computed.  Therefore, lap(before the loop) == lap (after the loop).
1637
1638       Now the only problem comes in when one does an init() of the parser
1639       after an Eof has been seen. At that time, lap could be non zero.
1640       Assume it was lap==1. Now we do a prime_lookahead(). If LLk is 2,
1641       then
1642
1643         consume()
1644         {
1645             NLA = inputTokens->getToken()->getType();
1646             dirty--;
1647             lap = (lap+1)&(LLk-1);
1648         }
1649
1650       or expanding NLA,
1651
1652         token_type[lap&(LLk-1)] = inputTokens->getToken()->getType();
1653         dirty--;
1654         lap = (lap+1)&(LLk-1);
1655
1656       so now we prime locations 1 and 0.  In prime_lookahead it used to set
1657       lap=0 and labase=0.  Now, the next token will be read from location 0,
1658       NOT 1 as it should have been.
1659
1660       This was never caught before, because if a parser is just instantiated,
1661       then lap and labase are 0, the offending assignment lines are
1662       basically no-ops, since the for loop wraps around back to 0.
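
           The effect can be shown with a small model of the circular buffer.
           This is a sketch under stated assumptions, not the code in
           AParser.cpp: the struct LookaheadModel, its nextToken counter, and
           the rule that LA(i) is read relative to labase are all
           simplifications introduced for this illustration.

```cpp
#include <cassert>

// Simplified, hypothetical model of the parser's circular lookahead buffer.
struct LookaheadModel {
    static const int LLk = 2;      // a power of two, so "& (LLk-1)" is mod LLk
    int token_type[LLk];
    int lap;                       // next slot to be written by consume()
    int labase;                    // slot assumed to hold LA(1)
    int nextToken;                 // stand-in for the token stream

    LookaheadModel() : lap(0), labase(0), nextToken(0) {
        for (int i = 0; i < LLk; i++) token_type[i] = -1;
    }

    void consume() {
        token_type[lap & (LLk - 1)] = nextToken++;
        lap = (lap + 1) & (LLk - 1);
    }

    int LA(int i) const { return token_type[(labase + i - 1) & (LLk - 1)]; }

    // Re-prime the buffer; resetToZero selects the pre-MR14 assignments.
    void prime_lookahead(bool resetToZero) {
        for (int i = 1; i <= LLk; i++) consume();
        if (resetToZero) { lap = 0; labase = 0; }   // pre-MR14
        else             { labase = lap; }          // MR14
    }
};

// Returns LA(1) after re-priming a parser whose lap was left at 1.
int firstTokenAfterReinit(bool resetToZero)
{
    LookaheadModel m;
    m.lap = 1;             // state left over from an earlier parse
    m.nextToken = 100;     // the new input starts with token 100
    m.prime_lookahead(resetToZero);
    return m.LA(1);
}
```

           In this model firstTokenAfterReinit(true) yields the second token
           of the new input, while firstTokenAfterReinit(false) yields the
           first one.  When lap starts at 0 both variants agree, which is why
           a freshly instantiated parser never showed the problem.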
1663
1664 #196. (Changed in MR14) Problems with "(alpha)? beta" guess
1665
1666     Consider the following syntactic predicate in a grammar
1667     with 2 tokens of lookahead (k=2 or ck=2):
1668
1669         rule  : ( alpha )? beta ;
1670         alpha : S t ;
1671         t     : T U
1672               | T
1673               ;
1674         beta  : S t Z ;
1675
1676     When antlr computes the prediction expression with one token
1677     of lookahead for alts 1 and 2 of rule t it finds an ambiguity.
1678
1679     Because the grammar has a lookahead of 2 it tries to compute
1680     two tokens of lookahead for alts 1 and 2 of t.  Alt 1 clearly
1681     has a lookahead of (T U).  Alt 2 is one token long so antlr
1682     tries to compute the follow set of alt 2, which means finding
1683     the things which can follow rule t in the context of (alpha)?.
1684     This cannot be computed, because alpha is only part of a rule,
1685     and antlr can't tell what part of beta is matched by alpha and
1686     what part remains to be matched.  Thus it is impossible for antlr
1687     to properly determine the follow set of rule t.
1688
1689     Prior to 1.33MR14, the follow of (alpha)? was computed as
1690     FIRST(beta) as a result of the internal representation of
1691     guess blocks.
1692
1693     With MR14 the follow set will be the empty set for that context.
1694
1695     Normally, one expects a rule appearing in a guess block to also
1696     appear elsewhere.  When the follow context for this other use
1697     is "ored" with the empty set, the context from the other use
1698     results, and a reasonable follow context results.  However if
1699     there is *no* other use of the rule, or it is used in a different
1700     manner then the follow context will be inaccurate - it was
1701     inaccurate even before MR14, but it will be inaccurate in a
1702     different way.
1703
1704     For the example given earlier, a reasonable way to rewrite the
1705     grammar:
1706
1707         rule  : ( alpha )? beta ;
1708         alpha : S t ;
1709         t     : T U
1710               | T
1711               ;
1712         beta  : alpha Z ;
1713
1714     If there are no other uses of the rule appearing in the guess
1715     block it will generate a test for EOF - a workaround for
1716     representing a null set in the lookahead tests.
1717
1718     If you encounter such a problem you can use the -alpha option
1719     to get additional information:
1720
1721     line 2: error: not possible to compute follow set for alpha
1722               in an "(alpha)? beta" block.
1723
1724     With the antlr -alpha command line option the following information
1725     is inserted into the generated file:
1726
1727     #if 0
1728
1729       Trace of references leading to attempt to compute the follow set of
1730       alpha in an "(alpha)? beta" block. It is not possible for antlr to
1731       compute this follow set because it is not known what part of beta has
1732       already been matched by alpha and what part remains to be matched.
1733
1734       Rules which make use of the incorrect follow set will also be incorrect
1735
1736          1 #token T              alpha/2   line 7     brief.g
1737          2 end alpha             alpha/3   line 8     brief.g
1738          2 end (...)? block at   start/1   line 2     brief.g
1739
1740     #endif
1741
1742     At the moment, with the -alpha option selected the program marks
1743     any rules which appear in the trace back chain (above) as rules with
1744     possible problems computing follow set.
1745
1746     Reported by Greg Knapen (gregory.knapen bell.ca).
1747
1748 #195. (Changed in MR14) #line directive not at column 1
1749
1750       Under certain circumstances a predicate test could generate
1751       a #line directive which was not at column 1.
1752
1753       Reported with fix by David Kågedal  (davidk lysator.liu.se)
1754       (http://www.lysator.liu.se/~davidk/).
1755
1756 #194. (Changed in MR14) (C Mode only) Demand lookahead with #tokclass
1757
1758       In C mode with the demand lookahead option there is a bug in the
1759       code which handles matches for #tokclass (zzsetmatch and
1760       zzsetmatch_wsig).
1761
1762       The bug causes the lookahead pointer to get out of synchronization
1763       with the current token pointer.
1764
1765       The problem was reported with a fix by Ger Hobbelt (hobbelt axa.nl).
1766
1767 #193. (Changed in MR14) Use of PCCTS_USE_NAMESPACE_STD
1768
1769       The pcctscfg.h now contains the following definitions:
1770
1771         #ifdef PCCTS_USE_NAMESPACE_STD
1772         #define PCCTS_STDIO_H     <cstdio>
1773         #define PCCTS_STDLIB_H    <cstdlib>
1774         #define PCCTS_STDARG_H    <cstdarg>
1775         #define PCCTS_SETJMP_H    <csetjmp>
1776         #define PCCTS_STRING_H    <cstring>
1777         #define PCCTS_ASSERT_H    <cassert>
1778         #define PCCTS_ISTREAM_H   <istream>
1779         #define PCCTS_IOSTREAM_H  <iostream>
1780         #define PCCTS_NAMESPACE_STD     namespace std {}; using namespace std;
1781         #else
1782         #define PCCTS_STDIO_H     <stdio.h>
1783         #define PCCTS_STDLIB_H    <stdlib.h>
1784         #define PCCTS_STDARG_H    <stdarg.h>
1785         #define PCCTS_SETJMP_H    <setjmp.h>
1786         #define PCCTS_STRING_H    <string.h>
1787         #define PCCTS_ASSERT_H    <assert.h>
1788         #define PCCTS_ISTREAM_H   <istream.h>
1789         #define PCCTS_IOSTREAM_H  <iostream.h>
1790         #define PCCTS_NAMESPACE_STD
1791         #endif
1792
1793       The runtime support in pccts/h uses these pre-processor symbols
1794       consistently.
1795
1796       Also, antlr and dlg have been changed to generate code which uses
1797       these pre-processor symbols rather than having the names of the
1798       #include files hard-coded in the generated code.
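
           As an illustration of how code can be written against these
           symbols, here is a reduced sketch.  It repeats only two of the
           definitions so that it stands alone; the helper tokenTextLength()
           is hypothetical and does not exist in pccts/h.

```cpp
// Reduced copy of the selection logic, for illustration only.
#define PCCTS_USE_NAMESPACE_STD

#ifdef PCCTS_USE_NAMESPACE_STD
#define PCCTS_STRING_H          <cstring>
#define PCCTS_NAMESPACE_STD     namespace std {}; using namespace std;
#else
#define PCCTS_STRING_H          <string.h>
#define PCCTS_NAMESPACE_STD
#endif

// The header name comes from the symbol rather than being hard-coded.
#include PCCTS_STRING_H
#include <cassert>

PCCTS_NAMESPACE_STD

// Hypothetical helper: length of a token's text.
int tokenTextLength(const char *text)
{
    return static_cast<int>(strlen(text));   // resolves in either mode
}
```

           Removing the #define of PCCTS_USE_NAMESPACE_STD switches the same
           source to the <string.h> form without any other change.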
1799
1800       This required the addition of "#include pcctscfg.h" to a number of
1801       files in pccts/h.
1802
1803       It appears that this sometimes causes problems for MSVC 5 in
1804       combination with the "automatic" option for pre-compiled headers.
1805       In such cases disable the "automatic" pre-compiled headers option.
1806
1807       Suggested by Hubert Holin (Hubert.Holin Bigfoot.com).
1808
1809 #192. (Changed in MR14) Change setText() to accept "const ANTLRChar *"
1810
1811       Changed ANTLRToken::setText(ANTLRChar *) to setText(const ANTLRChar *).
1812       This allows literal strings to be used to initialize tokens.  Since
1813       the usual token implementation (ANTLRCommonToken)  makes a copy of the
1814       input string, this was an unnecessary limitation.
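
           A minimal sketch of what the const parameter permits.  DemoToken
           is a hypothetical class, not ANTLRCommonToken, and the typedef of
           ANTLRChar as char is an assumption made for this example.

```cpp
#include <cassert>
#include <string>

typedef char ANTLRChar;   // assumption: ANTLRChar is a plain char here

// Hypothetical token class that, like the usual implementation described
// above, keeps its own copy of the text.
class DemoToken {
public:
    void setText(const ANTLRChar *s) { text_ = s; }   // copies the characters
    const ANTLRChar *getText() const { return text_.c_str(); }
private:
    std::string text_;
};

// Builds a token directly from a string literal.
std::string literalTokenText()
{
    DemoToken t;
    t.setText("begin");   // a literal is accepted because the parameter is const
    return t.getText();
}
```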
1815
1816       Suggested by Bob McWhirter (bob netwrench.com).
1817
1818 #191. (Changed in MR14) HP/UX aCC compiler compatibility problem
1819
1820       Needed to explicitly declare zzINF_DEF_TOKEN_BUFFER_SIZE and
1821       zzINF_BUFFER_TOKEN_CHUNK_SIZE as ints in pccts/h/AParser.cpp.
1822
1823       Reported by David Cook (dcook bmc.com).
1824
1825 #190. (Changed in MR14) IBM OS/2 CSet compiler compatibility problem
1826
1827       Name conflict with "_cs" in pccts/h/ATokenBuffer.cpp
1828
1829       Reported by David Cook (dcook bmc.com).
1830
1831 #189. (Changed in MR14) -gxt switch in C mode
1832
1833       The -gxt switch in C mode didn't work because of incorrect
1834       initialization.
1835
1836       Reported by Sinan Karasu (sinan boeing.com).
1837
1838 #188. (Changed in MR14) Added pccts/h/DLG_stream_input.h
1839
1840       This is a DLG stream class based on C++ istreams.
1841
1842       Contributed by Hubert Holin (Hubert.Holin Bigfoot.com).
1843
1844 #187. (Changed in MR14) Rename config.h to pcctscfg.h
1845
1846       The PCCTS configuration file has been renamed from config.h to
1847       pcctscfg.h.  The problem with the original name is that it led
1848       to name collisions when pccts parsers were combined with other
1849       software.
1850
1851       All of the runtime support routines in pccts/h/* have been
1852       changed to use the new name.  Existing software can continue
1853       to use pccts/h/config.h. The contents of pccts/h/config.h are
1854       now just '#include "pcctscfg.h"'.
1855
1856       I don't have a record of the user who suggested this.
1857
1858 #186. (Changed in MR14) Pre-processor symbol DllExportPCCTS class modifier
1859
1860       Classes in the C++ runtime support routines are now declared:
1861
1862         class DllExportPCCTS className ....
1863
1864       By default, the pre-processor symbol is defined as the empty
1865       string.  This is for use by MSVC++ users to create DLL classes.
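
           A sketch of the declaration style.  DemoRuntimeClass is a
           hypothetical class, and defining the symbol as
           __declspec(dllexport) for a DLL build is an assumption about how
           an MSVC++ user might choose to set it.

```cpp
#include <cassert>

// Default: the symbol expands to nothing.  A DLL build could define it
// before this point, for example as __declspec(dllexport) (assumption).
#ifndef DllExportPCCTS
#define DllExportPCCTS
#endif

// Hypothetical class declared in the same style as the runtime classes.
class DllExportPCCTS DemoRuntimeClass {
public:
    int answer() const { return 42; }
};
```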
1866
1867       Suggested by Manfred Kogler (km cast.uni-linz.ac.at).
1868
1869 #185. (Changed in MR14) Option to not use PCCTS_AST base class for ASTBase
1870
1871       Normally, the ASTBase class is derived from PCCTS_AST which contains
1872       functions useful to Sorcerer.  If these are not necessary then the
1873       user can define the pre-processor symbol "PCCTS_NOT_USING_SOR" which
1874       will cause the ASTBase class to replace references to PCCTS_AST with
1875       references to ASTBase where necessary.
1876
1877       The class ASTDoublyLinkedBase will contain a pure virtual function
1878       shallowCopy() that was formerly defined in class PCCTS_AST.
1879
1880       Suggested by Bob McWhirter (bob netwrench.com).
1881
1882 #184. (Changed in MR14) Grammars with no tokens generate invalid tokens.h
1883
1884       Reported by Hubert Holin (Hubert.Holin bigfoot.com).
1885
1886 #183. (Changed in MR14) -f to specify file with names of grammar files
1887
1888       In DEC/VMS it is difficult to specify very long command lines.
1889       The -f option allows one to place the names of the grammar files
1890       in a data file in order to bypass limitations of the DEC/VMS
1891       command language interpreter.
1892
1893       Addition supplied by Bernard Giroud (b_giroud decus.ch).
1894
1895 #182. (Changed in MR14) Output directory option for DEC/VMS
1896
1897       Fix some problems with the -o option under DEC/VMS.
1898
1899       Fix supplied by Bernard Giroud (b_giroud decus.ch).
1900
1901 #181. (Changed in MR14) Allow chars > 127 in DLGStringInput::nextChar()
1902
1903       Changed DLGStringInput to cast the character using (unsigned char)
1904       so that languages with character codes greater than 127 work
1905       without changes.
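
           The difference made by the cast can be sketched as follows.  Both
           functions are hypothetical stand-ins and are not the body of
           DLGStringInput::nextChar().

```cpp
#include <cassert>

// Without the cast: where plain char is signed, codes above 127 are
// returned as negative values.
int nextCharUncast(const char *p) { return *p; }

// With the cast: the code is always in the range 0..255
// (assuming an 8-bit char).
int nextCharCast(const char *p) { return (unsigned char) *p; }
```

           For 7-bit characters the two agree; for a byte such as 0xE9 only
           the cast version is guaranteed to return 233.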
1906
1907       Suggested by Manfred Kogler (km cast.uni-linz.ac.at).
1908
1909 #180. (Added in MR14) ANTLRParser::getEofToken()
1910
1911       Added "ANTLRToken ANTLRParser::getEofToken() const" to match the
1912       setEofToken routine.
1913
1914       Requested by Manfred Kogler (km cast.uni-linz.ac.at).
1915
1916 #179. (Fixed in MR14) Memory leak for BufFileInput subclass of DLGInputStream
1917
1918       The BufFileInput class described in Item #142 neglected to release
1919       the allocated buffer when an instance was destroyed.
1920
1921       Reported by Manfred Kogler (km cast.uni-linz.ac.at).
1922
1923 #178. (Fixed in MR14) Bug in "(alpha)? beta" guess blocks first sets
1924
1925       In 1.33 vanilla, and all maintenance releases prior to MR14
1926       there is a bug in the handling of guess blocks which use the
1927       "long" form:
1928
1929                   (alpha)? beta
1930
1931       inside a (...)*, (...)+, or {...} block.
1932
1933       This problem does *not* apply to the case where beta is omitted
1934       or when the syntactic predicate is on the leading edge of an
1935       alternative.
1936
1937       The problem is that both alpha and beta are stored in the
1938       syntax diagram, and that some analysis routines would fail
1939       to skip the alpha portion when it was not on the leading edge.
1940       Consider the following grammar with -ck 2:
1941
1942                 r : ( (A)? B )* C D
1943
1944                   | A B      /* forces -ck 2 computation for old antlr    */
1945                              /*              reports ambig for alts 1 & 2 */
1946
1947                   | B C      /* forces -ck 2 computation for new antlr    */
1948                              /*              reports ambig for alts 1 & 3 */
1949                   ;
1950
1951       The prediction expression for the first alternative should be
1952       LA(1)={B C} LA(2)={B C D}, but previous versions of antlr
1953       would compute the prediction expression as LA(1)={A C} LA(2)={B D}
1954
1955       Reported by Arpad Beszedes (beszedes inf.u-szeged.hu) who provided
1956       a very clear example of the problem and identified the probable cause.
1957
1958 #177. (Changed in MR14) #tokdefs and #token with regular expression
1959
1960       In MR13 the change described by Item #162 caused an existing
1961       feature of antlr to fail.  Prior to the change it was possible
1962       to give regular expression definitions and actions to tokens
1963       which were defined via the #tokdefs directive.
1964
1965       This now works again.
1966
1967       Reported by Manfred Kogler (km cast.uni-linz.ac.at).
1968
1969 #176. (Changed in MR14) Support for #line in antlr source code
1970
1971       Note: this was implemented by Arpad Beszedes (beszedes inf.u-szeged.hu).
1972
1973       In 1.33MR14 it is possible for a pre-processor to generate #line
1974       directives in the antlr source and have those line numbers and file
1975       names used in antlr error messages and in the #line directives
1976       generated by antlr.
1977
1978       The #line directive may appear in the following form:
1979
1980             #line ll "sss" xx xx ...
1981
1982       where ll represents a line number, "sss" represents the name of a file
1983       enclosed in quotation marks, and xx are arbitrary integers.
1984
1985       The following form (without "line") is not supported at the moment:
1986
1987             # ll "sss" xx xx ...
1988
1989       The result:
1990
1991         zzline
1992
1993             is replaced with ll from the # or #line directive
1994
1995         FileStr[CurFile]
1996
1997             is updated with the contents of the string (if any)
1998             following the line number
1999
2000       Note
2001       ----
2002       The file-name string following the line number can be a complete
2003       name with a directory-path. Antlr generates the output files from
2004       the input file name (by replacing the extension from the file-name
2005       with .c or .cpp).
2006
2007       If the input file (or the file-name from the line-info) contains
2008       a path:
2009
2010         "../grammar.g"
2011
2012       the generated source code will be placed in "../grammar.cpp" (i.e.
2013       in the parent directory).  This is inconvenient in some cases
2014       (even the -o switch cannot be used) so the path information is
2015       removed from the #line directive.  Thus, if the line-info was
2016
2017         #line 2 "../grammar.g"
2018
2019       then the current file-name will become "grammar.g"
2020
2021       In this way, the source code generated from the grammar file
2022       will always be placed in the current directory, except when the
2023       -o switch is used.
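
           The handling described in this item can be sketched as a small
           stand-alone parser.  This is not antlr's implementation: the
           LineInfo type and parseLineDirective() are hypothetical, and
           treating both "/" and "\" as path separators is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical result type; antlr itself updates zzline and FileStr[CurFile].
struct LineInfo {
    bool ok;            // true if the text was a recognized #line directive
    int line;           // the "ll" field
    std::string file;   // the "sss" field with any directory path removed
};

// Parses:  #line ll "sss" xx xx ...   (trailing integers are ignored)
LineInfo parseLineDirective(const std::string &text)
{
    LineInfo info;
    info.ok = false;
    info.line = 0;

    char name[256];
    name[0] = '\0';
    int fields = std::sscanf(text.c_str(), " #line %d \"%255[^\"]\"",
                             &info.line, name);
    if (fields < 1) return info;    // no "#line ll": not handled here
    info.ok = true;

    std::string file = (fields == 2) ? name : "";
    std::string::size_type slash = file.find_last_of("/\\");
    if (slash != std::string::npos) file = file.substr(slash + 1);
    info.file = file;
    return info;
}
```

           Given the directive from the example above, the sketch reports
           line 2 and the file name "grammar.g".  The unsupported form
           without "line" is rejected.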
2024
2025 #175. (Changed in MR14) Bug when guess block appears at start of (...)*
2026
2027       In 1.33 vanilla and all maintenance releases prior to 1.33MR14
2028       there is a bug when a guess block appears at the start of a (...)*.
2029       Consider the following k=1 (ck=1) grammar:
2030
2031             rule :
2032                   ( (STAR)? ZIP )* ID ;
2033
2034       Prior to 1.33MR14, the generated code resembled:
2035
2036         ...
2037         zzGUESS_BLOCK
2038         while ( 1 ) {
2039             if ( !(LA(1)==STAR) ) break;
2040             zzGUESS
2041             if ( !zzrv ) {
2042                 zzmatch(STAR);
2043                 zzCONSUME;
2044                 zzGUESS_DONE
2045                 zzmatch(ZIP);
2046                 zzCONSUME;
2047             ...
2048
2049       Note that the routine uses STAR for the prediction expression
2050       rather than ZIP.  With 1.33MR14 the generated code resembles:
2051
2052         ...
2053         while ( 1 ) {
2054             if ( !(LA(1)==ZIP) ) break;
2055         ...
2056
2057       This problem existed only with (...)* blocks and was caused
2058       by the slightly more complicated graph which represents (...)*
2059       blocks.  This caused the analysis routine to compute the first
2060       set for the alpha part of the "(alpha)? beta" rather than the
2061       beta part.
2062
2063       Both (...)+ and {...} blocks handled the guess block correctly.
2064
2065       Reported by Arpad Beszedes (beszedes inf.u-szeged.hu) who provided
2066       a very clear example of the problem and identified the probable cause.
2067
2068 #174. (Changed in MR14) Bug when action precedes syntactic predicate
2069
2070       In 1.33 vanilla, and all maintenance releases prior to MR14,
2071       there was a bug when a syntactic predicate was immediately
2072       preceded by an action.  Consider the following -ck 2 grammar:
2073
2074             rule :
2075                    <<int i;>>
2076                    (alpha)? beta C
2077                  | A B
2078                  ;
2079
2080             alpha : A ;
2081             beta  : A B;
2082
2083       Prior to MR14, the code generated for the first alternative
2084       resembled:
2085
2086         ...
2087         zzGUESS
2088         if ( !zzrv && LA(1)==A && LA(2)==A) {
2089             alpha();
2090             zzGUESS_DONE
2091             beta();
2092             zzmatch(C);
2093             zzCONSUME;
2094         } else {
2095         ...
2096
2097       The prediction expression (i.e. LA(1)==A && LA(2)==A) is clearly
2098       wrong because LA(2) should be matched to B (first[2] of beta is {B}).
2099
2100       With 1.33MR14 the prediction expression is:
2101
2102         ...
2103         if ( !zzrv && LA(1)==A && LA(2)==B) {
2104             alpha();
2105             zzGUESS_DONE
2106             beta();
2107             zzmatch(C);
2108             zzCONSUME;
2109         } else {
2110         ...
2111
2112       This affects only grammars in which alpha is shorter than
2113       max(k,ck) and there is an action immediately preceding
2114       the syntactic predicate.
2115
2116       This problem was reported by Arpad Beszedes
2117       (beszedes inf.u-szeged.hu) who provided a very clear example
2118       of the problem and identified the presence of the init-action
2119       as the likely culprit.
2120
2121 #173. (Changed in MR13a) -glms for Microsoft style filenames with -gl
2122
2123       With the -gl option antlr generates #line directives using the
2124       exact name of the input files specified on the command line.
2125       An oddity of the Microsoft C and C++ compilers is that they
2126       don't accept file names in #line directives containing "\"
2127       even though these are names from the native file system.
2128
2129       With -glms option, the "\" in file names appearing in #line
2130       directives is replaced with a "/" in order to conform to
2131       Microsoft compiler requirements.
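
           The substitution amounts to the following sketch.  The helper
           lineDirectiveName() is hypothetical and is not the routine used
           inside antlr.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the name as it would be written in a #line
// directive under -glms, with every "\" replaced by "/".
std::string lineDirectiveName(std::string fileName)
{
    for (std::string::size_type i = 0; i < fileName.size(); i++) {
        if (fileName[i] == '\\') fileName[i] = '/';
    }
    return fileName;
}
```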
2132
2133       Reported by Erwin Achermann (erwin.achermann switzerland.org).
2134
2135 #172. (Changed in MR13) \r\n in antlr source counted as one line
2136
2137       Some MS software uses \r\n to indicate a new line.  Antlr
2138       now recognizes this in counting lines.
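
           One way to count lines so that "\r\n" is a single new line is
           sketched below.  It is not antlr's lexer code, and treating a
           lone "\r" as a line terminator is an assumption of the sketch.

```cpp
#include <cassert>

// Counts line terminators, treating "\r\n" as a single new line.
int countNewlines(const char *s)
{
    int lines = 0;
    for (; *s != '\0'; s++) {
        if (*s == '\r') {
            if (s[1] == '\n') s++;     // swallow the "\n" of a "\r\n" pair
            lines++;
        } else if (*s == '\n') {
            lines++;
        }
    }
    return lines;
}
```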
2139
2140       Reported by Edward L. Hepler (elh ece.vill.edu).
2141
2142 #171. (Changed in MR13) #tokclass L..U now allowed
2143
2144       The following is now allowed:
2145
2146             #tokclass ABC { A..B C }
2147
2148       Reported by Dave Watola (dwatola amtsun.jpl.nasa.gov)
2149
2150 #170. (Changed in MR13) Suppression for predicates with lookahead depth >1
2151
2152       In MR12 the capability for suppression of predicates with lookahead
2153       depth=1 was introduced.  With MR13 this has been extended to
2154       predicates with lookahead depth > 1 and released for use by users
2155       on an experimental basis.
2156
2157       Consider the following grammar with -ck 2 and the predicate in rule
2158       "a" with depth 2:
2159
2160             r1  : (ab)* "@"
2161                 ;
2162
2163             ab  : a
2164                 | b
2165                 ;
2166
2167             a   : (A B)? => <<p(LATEXT(2))>>? A B C
2168                 ;
2169
2170             b   : A B C
2171                 ;
2172
2173       Normally, the predicate would be hoisted into rule r1 in order to
2174       determine whether to call rule "ab".  However it should *not* be
2175       hoisted because, even if p is false, there is a valid alternative
2176       in rule b.  With "-mrhoistk on" the predicate will be suppressed.
2177
2178       If "-info p" command line option is present the following information
2179       will appear in the generated code:
2180
2181                 while ( (LA(1)==A)
2182         #if 0
2183
2184         Part (or all) of predicate with depth > 1 suppressed by alternative
2185             without predicate
2186
2187         pred  <<  p(LATEXT(2))>>?
2188                   depth=k=2  ("=>" guard)  rule a  line 8  t1.g
2189           tree context:
2190             (root = A
2191                B
2192             )
2193
2194         The token sequence which is suppressed: ( A B )
2195         The sequence of references which generate that sequence of tokens:
2196
2197            1 to ab          r1/1       line 1     t1.g
2198            2 ab             ab/1       line 4     t1.g
2199            3 to b           ab/2       line 5     t1.g
2200            4 b              b/1        line 11    t1.g
2201            5 #token A       b/1        line 11    t1.g
2202            6 #token B       b/1        line 11    t1.g
2203
2204         #endif
2205
2206       A slightly more complicated example:
2207
2208             r1  : (ab)* "@"
2209                 ;
2210
2211             ab  : a
2212                 | b
2213                 ;
2214
2215             a   : (A B)? => <<p(LATEXT(2))>>? (A  B | D E)
2216                 ;
2217
2218             b   : <<q(LATEXT(2))>>? D E
2219                 ;
2220
2221
2222       In this case, the sequence (D E) in rule "a" which lies behind
2223       the guard is used to suppress the predicate with context (D E)
2224       in rule b.
2225
2226                 while ( (LA(1)==A || LA(1)==D)
2227             #if 0
2228
2229             Part (or all) of predicate with depth > 1 suppressed by alternative
2230                 without predicate
2231
2232             pred  <<  q(LATEXT(2))>>?
2233                               depth=k=2  rule b  line 11  t2.g
2234               tree context:
2235                 (root = D
2236                    E
2237                 )
2238
2239             The token sequence which is suppressed: ( D E )
2240             The sequence of references which generate that sequence of tokens:
2241
2242                1 to ab          r1/1       line 1     t2.g
2243                2 ab             ab/1       line 4     t2.g
2244                3 to a           ab/1       line 4     t2.g
2245                4 a              a/1        line 8     t2.g
2246                5 #token D       a/1        line 8     t2.g
2247                6 #token E       a/1        line 8     t2.g
2248
2249             #endif
2250             &&
2251             #if 0
2252
2253             pred  <<  p(LATEXT(2))>>?
2254                               depth=k=2  ("=>" guard)  rule a  line 8  t2.g
2255               tree context:
2256                 (root = A
2257                    B
2258                 )
2259
2260             #endif
2261
2262             (! ( LA(1)==A && LA(2)==B ) || p(LATEXT(2)) )  {
2263                 ab();
2264                 ...
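
           The shape of the final test in the listing above, a predicate
           that is consulted only when its "=>" guard context matches, can
           be sketched in isolation.  The token numbering, the call counter,
           and the simplified signature of p() (which takes a token here
           rather than LATEXT(2)) are assumptions made for this illustration.

```cpp
#include <cassert>

enum Tok { A = 1, B, D, E };      // hypothetical token numbering

// Hypothetical user predicate; counts calls so the guard's effect is visible.
int pCalls = 0;
bool p(Tok /*la2*/) { ++pCalls; return false; }

// Same shape as the generated test:
//     (! ( LA(1)==A && LA(2)==B ) || p(LATEXT(2)) )
// The predicate is evaluated only when the guard context ( A B ) matches.
bool guardedTest(Tok la1, Tok la2)
{
    return !(la1 == A && la2 == B) || p(la2);
}
```

           For the lookahead ( D E ) the test succeeds without calling p at
           all; for ( A B ) the result is whatever p returns.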
2265
2266 #169. (Changed in MR13) Predicate test optimization for depth=1 predicates
2267
2268       When MR12 generated a test of a predicate which had depth 1
2269       it would use the depth >1 routines, resulting in correct but
2270       inefficient behavior.  In MR13, a bit test is used.
2271
2272 #168. (Changed in MR13) Token expressions in context guards
2273
2274       The token expressions appearing in context guards such as:
2275
2276             (A B)? => <<test(LT(1))>>?  someRule
2277
2278       are computed during an early phase of antlr processing.  As
2279       a result, prior to MR13, complex expressions such as:
2280
2281             ~B
2282             L..U
2283             ~L..U
2284             TokClassName
2285             ~TokClassName
2286
2287       were not computed properly.  This resulted in incorrect
2288       context being computed for such expressions.
2289
2290       In MR13 these context guards are verified for proper semantics
2291       in the initial phase and then re-evaluated after complex token
2292       expressions have been computed in order to produce the correct
2293       behavior.
2294
2295       Reported by Arpad Beszedes (beszedes inf.u-szeged.hu).
2296
2297 #167. (Changed in MR13) ~L..U
2298
2299       Prior to MR13, the complement of a token range was
2300       not properly computed.
2301
2302 #166. (Changed in MR13) token expression L..U
2303
2304       The token U was represented as an unsigned char, restricting
2305       the use of L..U to cases where U was assigned a token number
2306       less than 256.  This is corrected in MR13.
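
           The restriction follows from how an unsigned char truncates
           larger values.  A sketch, assuming an 8-bit char; the function
           upperBoundAsStored() is hypothetical and not part of antlr:

```cpp
#include <cassert>

// Storing a token number in an unsigned char keeps only its low 8 bits.
int upperBoundAsStored(int tokenNumber)
{
    unsigned char u = static_cast<unsigned char>(tokenNumber);
    return u;
}
```

           A token number below 256 is stored unchanged, while a larger one
           such as 300 comes back as 44, so the range would have ended at
           the wrong token.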
2307
2308 #165. (Changed in MR13) option -newAST
2309
2310       To create ASTs from an ANTLRTokenPtr antlr usually calls
2311       "new AST(ANTLRTokenPtr)".  This option generates a call
2312       to "newAST(ANTLRTokenPtr)" instead.  This allows a user
2313       to define a parser member function to create an AST object.
2314
2315       Similar changes for ASTBase::tmake and ASTBase::link were not
2316       thought necessary since they do not create AST objects, only
2317       use existing ones.
2318
2319 #164. (Changed in MR13) Unused variable _astp
2320
2321       For many compilations, we have lived with warnings about
2322       the unused variable _astp.  It turns out that this variable
2323       can *never* be used because the code which references it was
2324       commented out.
2325
2326       This investigation was sparked by a note from Erwin Achermann
2327       (erwin.achermann switzerland.org).
2328
2329 #163. (Changed in MR13) Incorrect makefiles for testcpp examples
2330
2331       All the examples in pccts/testcpp/* had incorrect definitions
2332       in the makefiles for the symbol "CCC".  Instead of CCC=CC they
2333       had CC=$(CCC).
2334
2335       There was an additional problem in testcpp/1/test.g due to the
2336       change in ANTLRToken::getText() to a const member function
2337       (Item #137).
2338
2339       Reported by Maurice Mass (maas cuci.nl).
2340
2341 #162. (Changed in MR13) Combining #token with #tokdefs
2342
2343       When it became possible to change the print-name of a
2344       #token (Item #148) it became useful to give a #token
2345       statement whose only purpose was to give a print name
2346       to the #token.  Prior to this change this could not be
2347       combined with the #tokdefs feature.
2348
2349 #161. (Changed in MR13) Switch -gxt inhibits generation of tokens.h
2350
2351 #160. (Changed in MR13) Omissions in list of names for remap.h
2352
2353       When a user selects the -gp option antlr creates a list
2354       of macros in remap.h to rename some of the standard
2355       antlr routines from zzXXX to userprefixXXX.
2356
2357       There were a number of omissions from the remap.h name
2358       list related to the new trace facility.  This was reported,
2359       along with a fix, by Bernie Solomon (bernard ug.eds.com).
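
           The renaming mechanism can be sketched as below.  The names
           zzexample and my_example are hypothetical and are not taken
           from the actual remap.h list; the sketch only shows how a
           macro of this kind maps a zzXXX name onto a prefixed name.

```cpp
// Hypothetical line in the style of remap.h for a user prefix "my_".
#define zzexample my_example

// Code written against the standard name...
int zzexample(int x) { return x + 1; }

// ...is compiled under the prefixed name, so this call reaches the
// function defined above.
int call_prefixed() { return my_example(41); }
```

           A name missing from the list keeps its zz prefix and is left
           out of the user's prefix scheme, which is the kind of omission
           this item corrects.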
2360
2361 #159. (Changed in MR13) Violations of classic C rules
2362
2363       There were a number of violations of classic C style in
2364       the distribution kit.  This was reported, along with fixes,
2365       by Bernie Solomon (bernard ug.eds.com).
2366
2367 #158. (Changed in MR13) #header causes problem for pre-processors
2368
2369       A user who runs the C pre-processor on antlr source suggested
2370       that another syntax be allowed.  With MR13, directives
2371       such as #header, #pragma, etc. may be written as "\#header",
2372       "\#pragma", etc.  For escaping pre-processor directives inside
2373       a #header use something like the following:
2374
2375             \#header
2376             <<
2377                 \#include <stdio.h>
2378             >>
2379
2380 #157. (Fixed in MR13) empty error sets for rules with infinite recursion
2381
2382       When the first set for a rule cannot be computed due to infinite
2383       left recursion and it is the only alternative for a block then
2384       the error set for the block would be empty.  This would result
2385       in a fatal error.
2386
2387       Reported by Darin Creason (creason genedax.com)
2388
2389 #156. (Changed in MR13) DLGLexerBase::getToken() now public
2390
2391 #155. (Changed in MR13) Context behind predicates can suppress
2392
2393       With -mrhoist enabled the context behind a guarded predicate can
2394       be used to suppress other predicates.  Consider the following grammar:
2395
2396         r0 : (r1)+;
2397
2398         r1  : rp
2399             | rq
2400             ;
2401         rp  : <<p LATEXT(1)>>? B ;
2402         rq : (A)? => <<q LATEXT(1)>>? (A|B);
2403
2404       In earlier versions both predicates "p" and "q" would be hoisted into
2405       rule r0. With MR12c predicate p is suppressed because the context which
2406       follows predicate q includes "B" which can "cover" predicate "p".  In
2407       other words, in trying to decide in r0 whether to call r1, it doesn't
2408       really matter whether p is false or true because, either way, there is
2409       a valid choice within r1.
2410
2411 #154. (Changed in MR13) Making hoist suppression explicit using <<nohoist>>
2412
2413       A common error, even among experienced pccts users, is to code
2414       an init-action to inhibit hoisting rather than a leading action.
2415       An init-action does not inhibit hoisting.
2416
2417       This was coded:
2418
2419         rule1 : <<;>> rule2
2420
2421       This is what was meant:
2422
2423         rule1 : <<;>> <<;>> rule2
2424
2425       With MR13, the user can code:
2426
2427         rule1 : <<;>> <<nohoist>> rule2
2428
2429       The following will give an error message:
2430
2431         rule1 : <<nohoist>> rule2
2432
2433       If the <<nohoist>> appears as an init-action rather than a leading
2434       action an error message is issued.  The meaning of an init-action
2435       containing "nohoist" is unclear: does it apply to just one
2436       alternative or to all alternatives ?
2437
2438
2439
2440
2441
2442
2443
2444
2445         -------------------------------------------------------
2446         Note:  Items #153 to #1 are now in a separate file named
2447                 CHANGES_FROM_133_BEFORE_MR13.txt
2448         -------------------------------------------------------