EdkCompatibilityPkg/Other/Maintained/Tools/Pccts/CHANGES_FROM_133.txt

   1 =======================================================================
   2 List of Implemented Fixes and Changes for Maintenance Releases of PCCTS
   3
   4
   5  For a summary of the most significant changes see CHANGES_SUMMARY.TXT
   6
   7 =======================================================================
   8
   9                                DISCLAIMER
  10
  11  The software and these notes are provided "as is".  They may include
  12  typographical or technical errors and their authors disclaim all
  13  liability of any kind or nature for damages due to error, fault,
  14  defect, or deficiency regardless of cause.  All warranties of any
  15  kind, either express or implied, including, but not limited to, the
  16  implied warranties of merchantability and fitness for a particular
  17  purpose are disclaimed.
  18
  19
  20         -------------------------------------------------------
  21         Note:  Items #153 to #1 are now in a separate file named
  22                 CHANGES_FROM_133_BEFORE_MR13.txt
  23         -------------------------------------------------------
  24
  25 #261. (Changed in MR19) Defer token fetch for C++ mode
  26
  27     Item #216 has been revised to indicate that use of the defer fetch
  28     option (ZZDEFER_FETCH) requires dlg option -i.
  29
  30 #260. (MR22) Raise default lex buffer size from 8,000 to 32,000 bytes.
  31
  32     ZZLEXBUFSIZE is the size (in bytes) of the buffer used by dlg
  33     generated lexers.  The default value has been raised to 32,000 and
  34     the value used by antlr, dlg, and sorcerer has also been raised to
  35     32,000.
  36
  37 #259. (MR22) Default function arguments in C++ mode.
  38
  39     If a rule is declared:
  40
  41             rr [int i = 0] : ....
  42
  43     then the declaration generated by pccts resembles:
  44
  45             void rr(int i = 0);
  46
  47     however, the definition must omit the default argument:
  48
  49             void rr(int i) {...}
  50
  51     In the past the default value was not omitted.  In MR22
  52     the generated code resembles:
  53
  54             void rr(int i /* = 0 */ ) {...}
  55
  56     Implemented by Volker H. Simonis (simonis@informatik.uni-tuebingen.de)
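The rule behind this change is a general C++ one: a default argument may appear on a function's declaration but must not be repeated on a separate out-of-line definition. The sketch below uses a hypothetical MyParser class; the int return type and the body of rr are invented only so the call can be checked.

```cpp
#include <cassert>

// Hypothetical parser class; only the declaration/definition split matters.
class MyParser {
public:
    int rr(int i = 0);              // declaration: the default lives here
};

// Definition: repeating "= 0" here would be a compile error, so the
// default is kept only as a comment, as in the MR22 generated code.
int MyParser::rr(int i /* = 0 */)
{
    return i * 2;                   // hypothetical body for illustration
}
```

Calling p.rr() uses the default from the declaration, so it behaves exactly like p.rr(0).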
  57
  58 #258. (MR22)  Using a base class for your parser
  59
  60     In item #102 (MR10) the class statement was extended to allow one
  61     to specify a base class other than ANTLRParser for the generated
  62     parser.  It turned out that this was less than useful because
  63     the constructor still specified ANTLRParser as the base class.
  64
  65     The class statement now uses the first identifier appearing after
  66     the ":" as the name of the base class.  For example:
  67
  68         class MyParser : public FooParser {
  69
  70     Generates in MyParser.h:
  71
  72             class MyParser : public FooParser {
  73
  74     Generates in MyParser.cpp something that resembles:
  75
  76             MyParser::MyParser(ANTLRTokenBuffer *input) :
  77                                          FooParser(input,1,0,0,4)
  78             {
  79                 token_tbl = _token_tbl;
  80                 traceOptionValueDefault=1;              // MR10 turn trace ON
  81             }
  82
  83     The base class constructor must have a signature similar to
  84     that of ANTLRParser.
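A minimal sketch of what "a signature similar to that of ANTLRParser" means in practice. ANTLRTokenBuffer and ANTLRParser below are simplified stand-ins, not the pccts classes; the five-parameter constructor is modeled on the generated call FooParser(input,1,0,0,4), and the parameter names and defaults are assumptions.

```cpp
#include <cassert>

// Hypothetical stand-ins for the pccts classes (the real ones have many
// more members).
class ANTLRTokenBuffer {};

class ANTLRParser {
public:
    ANTLRParser(ANTLRTokenBuffer *input, int k = 1, int use_inf_look = 0,
                int demand_look = 0, int bsetsize = 1)
        : input_(input), k_(k), bsetsize_(bsetsize)
    { (void)use_inf_look; (void)demand_look; }
    virtual ~ANTLRParser() {}
    ANTLRTokenBuffer *input_;
    int k_;
    int bsetsize_;
};

// User-supplied base class: it accepts and forwards the same argument
// list, so the generated initializer FooParser(input,1,0,0,4) compiles.
class FooParser : public ANTLRParser {
public:
    FooParser(ANTLRTokenBuffer *input, int k = 1, int use_inf_look = 0,
              int demand_look = 0, int bsetsize = 1)
        : ANTLRParser(input, k, use_inf_look, demand_look, bsetsize) {}
};

// Roughly what "class MyParser : public FooParser {" leads to.
class MyParser : public FooParser {
public:
    explicit MyParser(ANTLRTokenBuffer *input)
        : FooParser(input, 1, 0, 0, 4) {}
};
```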
  85
  86 #257. (MR21a) Removed dlg statement that -i has no effect in C++ mode.
  87
  88     This was incorrect.
  89
  90 #256. (MR21a) Malformed syntax graph causes crash after error message.
  91
  92     In the past, certain kinds of errors in the very first grammar
  93     element could cause the construction of a malformed graph
  94     representing the grammar.  This would eventually result in a
  95     fatal internal error.  The code has been changed to be more
  96     resistant to this particular error.
  97
  98 #255. (MR21a) ParserBlackBox(FILE* f)
  99
 100     This constructor set openByBlackBox to the wrong value.
 101
 102     Reported by Kees Bakker (kees_bakker@tasking.nl).
 103
 104 #254. (MR21a) Reporting syntax error at end-of-file
 105
 106     When there was a syntax error at the end-of-file the syntax
 107     error routine would substitute "<eof>" for the programmer's
 108     end-of-file symbol.  This substitution is now done only when
 109     the programmer does not define his own end-of-file symbol
 110     or the symbol begins with the character "@".
 111
 112     Reported by Kees Bakker (kees_bakker@tasking.nl).
 113
 114 #253. (MR21) Generation of block preamble (-preamble and -preamble_first)
 115
 116     The antlr option -preamble causes antlr to insert the code
 117     BLOCK_PREAMBLE at the start of each rule and block.  It does
 118     not insert code before rule references, token references, or
 119     actions.  By properly defining the macro BLOCK_PREAMBLE the
 120     user can generate code which is specific to the start of blocks.
 121
 122     The antlr option -preamble_first is similar, but inserts the
 123     code BLOCK_PREAMBLE_FIRST(PreambleFirst_123) where the symbol
 124     PreambleFirst_123 is equivalent to the first set defined by
 125     the #FirstSetSymbol described in Item #248.
 126
 127     I have not investigated how these options interact with guess
 128     mode (syntactic predicates).
 129
 130 #252. (MR21) Check for null pointer in trace routine
 131
 132     When some trace options are used but the parser was generated
 133     without trace code, the current rule name may be a
 134     NULL pointer.  A guard was added to check for this in
 135     restoreState.
 136
 137     Reported by Douglas E. Forester (dougf@projtech.com).
 138
 139 #251. (MR21) Changes to #define zzTRACE_RULES
 140
 141     The macro zzTRACE_RULES was being used to pass information to
 142     AParser.h.  If this preprocessor symbol was not properly
 143     set the first time AParser.h was #included, the declaration
 144     of zzTRACEdata would be omitted (it is used by the -gd option).
 145     Subsequent #includes of AParser.h would be skipped because of
 146     the #ifdef guard, so the declaration of zzTracePrevRuleName would
 147     never be made.  The result was that proper compilation was very
 148     order dependent.
 149
 150     The declaration of zzTRACEdata was made unconditional and the
 151     problem of removing unused declarations will be left to optimizers.
 152
 153     Diagnosed by Douglas E. Forester (dougf@projtech.com).
 154
 155 #250. (MR21) Option for EXPERIMENTAL change to error sets for blocks
 156
 157     The antlr option -mrblkerr turns on an experimental feature
 158     which is supposed to provide more accurate syntax error messages
 159     for k=1, ck=1 grammars.  When used with k>1 or ck>1 grammars the
 160     behavior should be no worse than the current behavior.
 161
 162     There is no problem with the matching of elements or the computation
 163     of prediction expressions in pccts.  The task is only one of listing
 164     the most appropriate tokens in the error message.  The error sets used
 165     in pccts error messages are approximations of the exact error set when
 166     optional elements in (...)* or (...)+ are involved.  While the parse is
 167     entirely correct, the error messages are sometimes not 100% accurate.
 168
 169     There is also a minor philosophical issue.  For example, suppose the
 170     grammar expects the token to be an optional A followed by Z, and it
 171     is X.  X, of course, is neither A nor Z, so an error message is appropriate.
 172     Is it appropriate to say "Expected Z" ?  It is correct, it is accurate,
 173     but it is not complete.
 174
 175     When k>1 or ck>1 the problem of providing the exactly correct
 176     list of tokens for the syntax error messages ends up becoming
 177     equivalent to evaluating the prediction expression for the
 178     alternatives twice. However, for k=1 ck=1 grammars the prediction
 179     expression can be computed easily and evaluated cheaply, so I
 180     decided to try implementing it to satisfy a particular application.
 181     This application uses the error set in an interactive command language
 182     to provide prompts which list the alternatives available at that
 183     point in the parser.  The user can then enter additional tokens to
 184     complete the command line.  To do this required more accurate error
 185     sets than previously provided by pccts.
 186
 187     In some cases the default pccts behavior may lead to more robust error
 188     recovery or clearer error messages than having the exact set of tokens.
 189     This is because (a) features like -ge allow the use of symbolic names for
 190     certain sets of tokens, so having extra tokens may simply obscure things
 191     and (b) the error set is used to resynchronize the parser, so a good
 192     choice is sometimes more important than having the exact set.
 193
 194     Consider the following example:
 195
 196             Note:  All example code has been abbreviated
 197             to the absolute minimum in order to make the
 198             examples concise.
 199
 200         star1 : (A)* Z;
 201
 202     The generated code resembles:
 203
 204            old                new (with -mrblkerr)
 205         -------------         --------------------
 206         for (;;) {            for (;;) {
 207             match(A);           match(A);
 208         }                     }
 209         match(Z);             if (! A and ! Z) then
 210                                 FAIL(...{A,Z}...);
 211                               }
 212                               match(Z);
 213
 214
 215         With input X
 216             old message: Found X, expected Z
 217             new message: Found X, expected A, Z
 218
 219     For the example:
 220
 221         star2 : (A|B)* Z;
 222
 223            old                      new (with -mrblkerr)
 224         -------------               --------------------
 225         for (;;) {                  for (;;) {
 226           if (!A and !B) break;       if (!A and !B) break;
 227           if (...) {                  if (...) {
 228             <same ...>                  <same ...>
 229           }                           }
 230           else {                      else {
 231             FAIL(...{A,B,Z}...)         FAIL(...{A,B}...);
 232           }                           }
 233         }                           }
 234         match(Z);                   if (! A and ! B and !Z) then
 235                                         FAIL(...{A,B,Z}...);
 236                                     }
 237                                     match(Z);
 238
 239         With input X
 240             old message: Found X, expected Z
 241             new message: Found X, expected A, B, Z
 242         With input A X
 243             old message: Found X, expected Z
 244             new message: Found X, expected A, B, Z
 245
 246             This includes the choice of looping back to the
 247             star block.
 248
 249     The code for plus blocks:
 250
 251         plus1 : (A)+ Z;
 252
 253     The generated code resembles:
 254
 255            old                  new (with -mrblkerr)
 256         -------------           --------------------
 257         do {                    do {
 258           match(A);               match(A);
 259         } while (A)             } while (A)
 260         match(Z);               if (! A and ! Z) then
 261                                   FAIL(...{A,Z}...);
 262                                 }
 263                                 match(Z);
 264
 265         With input A X
 266             old message: Found X, expected Z
 267             new message: Found X, expected A, Z
 268
 269             This includes the choice of looping back to the
 270             plus block.
 271
 272     For the example:
 273
 274         plus2 : (A|B)+ Z;
 275
 276            old                    new (with -mrblkerr)
 277         -------------             --------------------
 278         do {                        do {
 279           if (A) {                    <same>
 280             match(A);                 <same>
 281           } else if (B) {             <same>
 282             match(B);                 <same>
 283           } else {                    <same>
 284             if (cnt > 1) break;       <same>
 285             FAIL(...{A,B,Z}...)         FAIL(...{A,B}...);
 286           }                           }
 287           cnt++;                      <same>
 288         }                           }
 289
 290         match(Z);                   if (! A and ! B and !Z) then
 291                                         FAIL(...{A,B,Z}...);
 292                                     }
 293                                     match(Z);
 294
 295         With input X
 296             old message: Found X, expected A, B, Z
 297             new message: Found X, expected A, B
 298         With input A X
 299             old message: Found X, expected Z
 300             new message: Found X, expected A, B, Z
 301
 302             This includes the choice of looping back to the
 303             plus block.
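The extra test that -mrblkerr adds for star1 can be sketched as a hand-written recursive-descent routine. This is not code generated by pccts: the token codes, the tokName table, and the message text are hypothetical, chosen to reproduce the "new message" listed for star1 with input X.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hand-written sketch (not pccts output) of the -mrblkerr check for
//     star1 : (A)* Z ;
// Token codes and message text are hypothetical.
enum Tok { TOK_A, TOK_Z, TOK_X, TOK_EOF };
static const char *const tokName[] = { "A", "Z", "X", "<eof>" };

// Returns true if toks starts with a valid star1; otherwise stores an
// error message in msg and returns false.
static bool star1(const Tok *toks, std::string &msg)
{
    std::size_t pos = 0;
    while (toks[pos] == TOK_A)   // (A)* : iterate while lookahead is A
        ++pos;
    // Extra test: the lookahead must either restart the loop (A) or
    // follow it (Z).  The != TOK_A part is always true right after the
    // loop; it is kept to mirror the generated "if (! A and ! Z)".
    if (toks[pos] != TOK_A && toks[pos] != TOK_Z) {
        msg = std::string("Found ") + tokName[toks[pos]] + ", expected A, Z";
        return false;
    }
    return true;                 // the caller would now match(Z)
}
```

Because A is in the reported set, the message reflects both legal choices at that point: another loop iteration or the token that follows the block.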
 304
 305 #249. (MR21) Changes for DEC/VMS systems
 306
 307     Jean-François Piéronne (jfp@altavista.net) has updated some
 308     VMS related command files and fixed some minor problems related
 309     to building pccts under the DEC/VMS operating system.  For DEC/VMS
 310     users the most important differences are:
 311
 312         a.  Revised makefile.vms
 313         b.  Revised genMMS for generating VMS-style makefiles.
 314
 315 #248. (MR21) Generate symbol for first set of an alternative
 316
 317     pccts can generate a symbol which represents the tokens which may
 318     appear at the start of a block:
 319
 320         rr : #FirstSetSymbol(rr_FirstSet)  ( Foo | Bar ) ;
 321
 322     This will generate the symbol rr_FirstSet of type SetWordType with
 323     elements Foo and Bar set. The bits can be tested using code similar
 324     to the following:
 325
 326         if (set_el(Foo, &rr_FirstSet)) { ...
 327
 328     This can be combined with the C array zztokens[] or the C++ routine
 329     tokenName() to get the print name of the token in the first set.
 330
 331     The size of the set is given by the newly added enum SET_SIZE, a
 332     protected member of the generated parser's class.  The number of
 333     elements in the generated set will not be exactly equal to the
 334     value of SET_SIZE because of synthetic tokens created by #tokclass,
 335     #errclass, the -ge option, and meta-tokens such as epsilon, and
 336     end-of-file.
 337
 338     The #FirstSetSymbol must appear immediately before a block
 339     such as (...)+, (...)*, {...}, or (...).  It may not appear
 340     immediately before a token, a rule reference, or an action.  However,
 341     a token or rule reference can be enclosed in (...) in order to
 342     make the use of #FirstSetSymbol legal.
 343
 344             rr_bad : #FirstSetSymbol(rr_bad_FirstSet) Foo;   //  Illegal
 345
 346             rr_ok :  #FirstSetSymbol(rr_ok_FirstSet) (Foo);  //  Legal
 347
 348     Do not confuse FirstSetSymbol sets with the sets used for testing
 349     lookahead. The sets used for FirstSetSymbol have one element per bit,
 350     so the number of bytes is approximately the largest token number
 351     divided by 8.  The sets used for testing lookahead store 8 lookahead
 352     sets per byte, so the length of the array is approximately the largest
 353     token number.
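The difference between the two layouts can be sketched as follows. kMaxToken, the array names, and the helper functions are hypothetical; the sketch only shows why one layout needs about kMaxToken/8 bytes for a single set while the other packs 8 sets into one byte per token.

```cpp
#include <cassert>
#include <cstddef>

// Sketch contrasting the two set layouts.  kMaxToken, the array names,
// and the helpers are hypothetical, not pccts-generated names.
const int kMaxToken = 40;

// FirstSetSymbol-style set: one token per bit, about kMaxToken/8 bytes.
static unsigned char firstSet[kMaxToken / 8 + 1];

static void firstAdd(int tok) { firstSet[tok / 8] |= static_cast<unsigned char>(1u << (tok % 8)); }
static bool firstHas(int tok) { return (firstSet[tok / 8] >> (tok % 8)) & 1; }

// Lookahead-style table: one byte per token; bit s of laTable[tok]
// records whether tok is in set s, so 8 sets share about kMaxToken bytes.
static unsigned char laTable[kMaxToken + 1];

static void laAdd(int set, int tok) { laTable[tok] |= static_cast<unsigned char>(1u << set); }
static bool laHas(int set, int tok) { return (laTable[tok] >> set) & 1; }
```

With kMaxToken = 40, firstSet occupies 6 bytes for one set, while laTable occupies 41 bytes but can hold up to 8 sets.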
 354
 355     If there is demand, a similar routine for follow sets can be added.
 356
 357 #247. (MR21) Misleading error message on syntax error for optional elements.
 358
 359     Prior to MR21, tokens which were optional did not appear in syntax
 360     error messages if the block which immediately followed detected a
 361     syntax error.
 362
 363     Consider the following grammar which accepts Number, Word, and Other:
 364
 365             rr : {Number} Word;
 366
 367     For this rule the code resembles:
 368
 369             if (LA(1) == Number) {
 370                 match(Number);
 371                 consume();
 372             }
 373             match(Word);
 374
 375     Prior to MR21, the error message for input "$ a" would be:
 376
 377             line 1: syntax error at "$" missing Word
 378
 379     With MR21 the message will be:
 380
 381             line 1: syntax error at "$" expecting Word, Number.
 382
 383     The generated code resembles:
 384
 385             if ( (LA(1)==Number) ) {
 386                 zzmatch(Number);
 387                 consume();
 388             }
 389             else {
 390                 if ( (LA(1)==Word) ) {
 391                     /* nothing */
 392                 }
 393                 else {
 394                     FAIL(... message for both Number and Word ...);
 395                 }
 396             }
 397             match(Word);
 398
 399     The code generated for optional blocks in MR21 is slightly longer
 400     than the previous versions, but it should give better error messages.
 401
 402     The code generated for:
 403
 404             { a | b | c }
 405
 406     should now be *identical* to:
 407
 408             ( a | b | c | )
 409
 410     which was not the case prior to MR21.
 411
 412     Reported by Sue Marvin (sue@siara.com).
 413
 414 #246. (Changed in MR21) Use of $(MAKE) for calls to make
 415
 416     Calls to make from the makefiles were replaced with $(MAKE)
 417     because of problems when using gmake.
 418
 419     Reported with fix by Sunil K.Vallamkonda (sunil@siara.com).
 420
 421 #245. (Changed in MR21) Changes to genmk
 422
 423     The following command line options have been added to genmk:
 424
 425         -cfiles ...
 426
 427             Adds the user's C or C++ files to the makefile automatically.
 428             The list of files must be enclosed in apostrophes.  This
 429             option may be specified multiple times.
 430
 431         -compiler ...
 432
 433             The name of the compiler to use for $(CCC) or $(CC).  The
 434             default in C++ mode is "CC".  The default in C mode is "cc".
 435
 436         -pccts_path ...
 437
 438             The value for $(PCCTS), the pccts directory.  The default
 439             is /usr/local/pccts.
 440
 441     Contributed by Tomasz Babczynski (t.babczynski@ict.pwr.wroc.pl).
 442
 443 #244. (Changed in MR21) Rename variable "not" in antlr.g
 444
 445     When antlr.g is compiled with a C++ compiler, a variable named
 446     "not" causes problems.  Reported by Sinan Karasu
 447     (sinan.karasu@boeing.com).
 448
 449 #243  (Changed in MR21) Replace recursion with iteration in zzfree_ast
 450
 451     Another refinement to zzfree_ast in ast.c to limit recursion.
 452
 453     NAKAJIMA Mutsuki (muc@isr.co.jp).
 454
 455
 456 #242.  (Changed in MR21) LineInfoFormatStr
 457
 458     Added an #ifndef/#endif around LineInfoFormatStr in pcctscfg.h.
 459
 460 #241. (Changed in MR21) Changed macro PURIFY to a no-op
 461
 462                 ***********************
 463                 *** NOT IMPLEMENTED ***
 464                 ***********************
 465
 466         The PURIFY macro was changed to a no-op because it was causing
 467         problems when passing C++ objects.
 468
 469         The old definition:
 470
 471             #define PURIFY(r,s)     memset((char *) &(r),'\\0',(s));
 472
 473         The new definition:
 474
 475             #define PURIFY(r,s)     /* nothing */
 477
 478 #240. (Changed in MR21) sorcerer/h/sorcerer.h _MATCH and _MATCHRANGE
 479
 480     Added test for NULL token pointer.
 481
 482     Suggested by Peter Keller (keller@ebi.ac.uk)
 483
 484 #239. (Changed in MR21) C++ mode AParser::traceGuessFail
 485
 486     In the past, if tracing was turned on when the code had been
 487     generated without trace code, a failed guess generated a trace
 488     report even though there were no other trace reports.  This
 489     change makes the behavior consistent with other parts of the
 490     trace system.
 491
 492     Reported by David Wigg (wiggjd@sbu.ac.uk).
 493
 494 #238. (Changed in MR21) Namespace version #include files
 495
 496     Changed reference from CStdio to cstdio (and other
 497     #include file names) in the namespace version of pccts.
 498     Should have known better.
 499
 500 #237. (Changed in MR21) ParserBlackBox(FILE*)
 501
 502     In the past, ParserBlackBox would close the FILE in the dtor
 503     even though it was not opened by ParserBlackBox.  The problem
 504     is that there were two constructors, one which accepted a file
 505     name and did an fopen, the other which accepted a FILE and did
 506     not do an fopen.  There is now an extra member variable which
 507     remembers whether ParserBlackBox did the open or not.
 508
 509     Suggested by Mike Percy (mpercy@scires.com).
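The ownership-flag pattern can be sketched with a hypothetical InputHolder class. It is not ParserBlackBox, which also constructs a lexer and parser, but it shows why the destructor must consult a flag before calling fclose.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical class illustrating the "did we open it?" flag.
class InputHolder {
public:
    explicit InputHolder(const char *fileName)          // we open the file
        : file_(std::fopen(fileName, "r")), openedByUs_(true) {}
    explicit InputHolder(std::FILE *f)                  // caller owns f
        : file_(f), openedByUs_(false) {}
    ~InputHolder()
    {
        // Close only a FILE that this object opened itself.
        if (openedByUs_ && file_ != nullptr)
            std::fclose(file_);
    }
    InputHolder(const InputHolder &) = delete;          // owns a handle
    InputHolder &operator=(const InputHolder &) = delete;
    std::FILE *file() const { return file_; }
private:
    std::FILE *file_;
    bool openedByUs_;
};
```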
 510
 511 #236. (Changed in MR21) tmake now reports down pointer problem
 512
 513     When ASTBase::tmake attempts to update the down pointer of
 514     an AST it checks to see if the down pointer is NULL.  If it
 515     is not NULL it does not do the update and returns NULL.
 516     An attempt to update a non-NULL down pointer is almost always the
 517     result of a user error, and silently returning NULL can lead to
 518     difficult-to-find problems during tree construction.
 519
 520     With this change, the routine calls a virtual function
 521     reportOverwriteOfDownPointer() which calls panic to
 522     report the problem.  Users who want the old behavior can
 523     redefine the virtual function in their AST class.
 524
 525     Suggested by Sinan Karasu (sinan.karasu@boeing.com)
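The hook can be sketched as a virtual function called from the update routine. MiniAST, setDown(), and QuietAST are hypothetical; the real ASTBase::tmake and panic() differ, and only the name reportOverwriteOfDownPointer() comes from this item. QuietAST shows a user class replacing the fatal default with a non-fatal one.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical miniature of the hook; not the pccts ASTBase.
class MiniAST {
public:
    MiniAST() : down(nullptr), overwriteReports(0) {}
    virtual ~MiniAST() {}
    MiniAST *down;
    int overwriteReports;

    // Refuses to overwrite an existing down pointer and reports it.
    bool setDown(MiniAST *child)
    {
        if (down != nullptr) {
            reportOverwriteOfDownPointer();
            return false;
        }
        down = child;
        return true;
    }
protected:
    // Default: treat the overwrite as a fatal user error.
    virtual void reportOverwriteOfDownPointer()
    {
        std::fprintf(stderr, "attempt to overwrite down pointer\n");
        std::abort();
    }
};

// User AST that keeps going (as before the change) but counts occurrences.
class QuietAST : public MiniAST {
protected:
    void reportOverwriteOfDownPointer() override { ++overwriteReports; }
};
```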
 526
 527 #235. (Changed in MR21) Made ANTLRParser::resynch() virtual
 528
 529     Suggested by Jerry Evans (jerry@swsl.co.uk).
 530
 531 #234. (Changed in MR21) Implicit int for function return value
 532
 533     ANTLRTokenBuffer::bufferSize() did not specify a type for the
 534     return value.
 535
 536     Reported by Hai Vo-Ba (hai@fc.hp.com).
 537
 538 #233. (Changed in MR20) Converted to MSVC 6.0
 539
 540     Due to external circumstances I have had to convert to MSVC 6.0.
 541     The MSVC 5.0 project files (.dsw and .dsp) have been retained as
 542     xxx50.dsp and xxx50.dsw.  The MSVC 6.0 files are named xxx60.dsp
 543     and xxx60.dsw (where xxx is related to the directory/project).
 544
 545 #232. (Changed in MR20) Make setwd bit vectors protected in parser.h
 546
 547     The access for the setwd array in the parser header was not
 548     specified.  As a result, it would depend on the code which
 549     preceded it.  In MR20 it will always have access "protected".
 550
 551     Reported by Piotr Eljasiak (eljasiak@zt.gdansk.tpsa.pl).
 552
 553 #231. (Changed in MR20) Error in token buffer debug code.
 554
 555     When token buffer debugging is selected via the pre-processor
 556     symbol DEBUG_TOKENBUFFER there is an erroneous check in
 557     AParser.cpp:
 558
 559         #ifdef DEBUG_TOKENBUFFER
 560             if (i >= inputTokens->bufferSize() ||
 561                 inputTokens->minTokens() < LLk )     /* MR20 Was "<=" */
 562         ...
 563         #endif
 564
 565     Reported by David Wigg (wiggjd@sbu.ac.uk).
 566
 567 #230. (Changed in MR20) Fixed problem with #define for -gd option
 568
 569     There was an error in setting zzTRACE_RULES for the -gd (trace) option.
 570
 571     Reported by Gary Funck (gary@intrepid.com).
 572
 573 #229. (Changed in MR20) Additional "const" for literals
 574
 575     "const" was added to the token name literal table.
 576     "const" was added to some panic() and similar routine
 577
 578 #228. (Changed in MR20) dlg crashes on "()"
 579
 580     The following token definition caused DLG to crash.
 581
 582         #token "()"
 583
 584     When there is a syntax error in a regular expression,
 585     many of the dlg routines return a structure which has
 586     null pointers.  When these are accessed by callers,
 587     dlg crashes.
 588
 589     I have attempted to fix the more common cases.
 590
 591     Reported by Mengue Olivier (dolmen@bigfoot.com).
 592
 593 #227. (Changed in MR20) Array overwrite
 594
 595     Steveh Hand (sassth@unx.sas.com) reported a problem which
 596     was traced to a temporary array which was not properly
 597     resized for deeply nested blocks.  This has been fixed.
 598
 599 #226. (Changed in MR20) -pedantic conformance
 600
 601     G. Hobbelt (i_a@mbh.org) and THM made many, many minor
 602     changes to create prototypes for all the functions and
 603     bring antlr, dlg, and sorcerer into conformance with
 604     the gcc -pedantic option.
 605
 606     This may require users to add pccts/h/pcctscfg.h to some
 607     files or makefiles in order to have __USE_PROTOS defined.
 608
 609 #225  (Changed in MR20) AST stack adjustment in C mode
 610
 611     The fix in #214 for AST stack adjustment in C mode missed
 612     some cases.
 613
 614     Reported with fix by Ger Hobbelt (i_a@mbh.org).
 615
 616 #224  (Changed in MR20) LL(1) and LL(2) with #pragma approx
 617
 618     This may set a record for the oldest, most trivial lexical
 619     error in pccts.  The regular expressions for LL(1) and LL(2)
 620     lacked an escape for the left and right parentheses.
 621
 622     Reported by Ger Hobbelt (i_a@mbh.org).
 623
 624 #223  (Changed in MR20) Addition of IBM_VISUAL_AGE directory
 625
 626     Build files for antlr, dlg, and sorcerer under IBM Visual Age
 627     have been contributed by Anton Sergeev (ags@mlc.ru).  They have
 628     been placed in the pccts/IBM_VISUAL_AGE directory.
 629
 630 #222  (Changed in MR20) Replace __STDC__ with __USE_PROTOS
 631
 632     Most occurrences of __STDC__ replaced with __USE_PROTOS due to
 633     complaints from several users.
 634
 635 #221  (Changed in MR20) Added #include for DLexerBase.h to PBlackBox.
 636
 637     Added #include for DLexerBase.h to PBlackBox.
 638
 639 #220  (Changed in MR19) strcat arguments reversed in #pred parse
 640
 641     The arguments to strcat are reversed when creating a print
 642     name for a hash table entry for use with #pred feature.
 643
 644     Problem diagnosed and fix reported by Scott Harrington
 645     (seh4@ix.netcom.com).
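For reference, strcat(dest, src) appends src to the end of dest, so the buffer being built must be the first argument. The helper below and the "pred_" prefix are hypothetical; the item does not say what print-name format antlr uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: builds a print name by joining prefix and name.
// std::strcat(dest, src) appends src to dest, so dest comes first.
static void makePrintName(char *out, std::size_t outLen,
                          const char *prefix, const char *name)
{
    assert(std::strlen(prefix) + std::strlen(name) < outLen);  // caller sizes out
    std::strcpy(out, prefix);
    std::strcat(out, name);      // destination first, then source
}
```

With the arguments reversed, the prefix would be appended to the name's own storage instead, which can overflow that buffer and leaves out unchanged.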
 646
 647 #219. (Changed in MR19) C Mode routine zzfree_ast
 648
 649     Changes to reduce use of recursion for AST trees with only right
 650     links or only left links in the C mode routine zzfree_ast.
 651
 652     Implemented by SAKAI Kiyotaka (ksakai@isr.co.jp).
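One way to reduce recursion when freeing a tree is sketched below: follow the only link a node has inside the loop, and recurse only when a node has both a down and a right link. The Node type and freeTree() are hypothetical; this is not the zzfree_ast source.

```cpp
#include <cassert>

// Minimal AST node with child (down) and sibling (right) links.
// Hypothetical type, not the pccts AST structure.
struct Node {
    Node *down;
    Node *right;
};

static int freedCount = 0;   // demonstration counter only

// Frees every node.  A node with at most one link is handled by the
// loop; recursion happens only for nodes with both down and right links.
static void freeTree(Node *t)
{
    while (t != nullptr) {
        Node *down = t->down;
        Node *right = t->right;
        delete t;
        ++freedCount;
        if (right == nullptr) {
            t = down;            // at most a down link: iterate
        } else {
            freeTree(down);      // both links: recurse on the child list...
            t = right;           // ...then iterate along the siblings
        }
    }
}
```

A tree made only of right links or only of down links is freed with a recursion depth of at most one, regardless of its length.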
 653
 654 #218. (Changed in MR19) Changes to support unsigned char in C mode
 655
 656     Changes to antlr.h and err.h to fix omissions in use of zzchar_t
 657
 658     Implemented by SAKAI Kiyotaka (ksakai@isr.co.jp).
 659
 660 #217. (Changed in MR19) Error message when dlg -i and -CC options selected
 661
 662     *** This change was rescinded by item #257 ***
 663
 664     The parsers generated by pccts in C++ mode are not able to support the
 665     interactive lexer option (except, perhaps, when using the deferred fetch
 666     parser option; see Item #216).
 667
 668     DLG now warns when both -i and -CC are selected.
 669
 670     This warning was suggested by David Venditti (07751870267-0001@t-online.de).
 671
 672 #216. (Changed in MR19) Defer token fetch for C++ mode
 673
 674     Implemented by Volker H. Simonis (simonis@informatik.uni-tuebingen.de)
 675
 676     Normally, pccts keeps the lookahead token buffer completely filled.
 677     This requires max(k,ck) tokens of lookahead.  For some applications
 678     this can cause deadlock problems.  For example, there may be cases
 679     when the parser can't tell when the input has been completely consumed
 680     until the parse is complete, but the parse can't be completed because
 681     the input routines are waiting for additional tokens to fill the
 682     lookahead buffer.
 683
 684     When the ANTLRParser class is built with the pre-processor option
 685     ZZDEFER_FETCH defined, the fetch of new tokens by consume() is deferred
 686     until LA(i) or LT(i) is called.
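The deferred-fetch idea can be sketched as a token buffer whose consume() only counts pending consumes, while LA(i) pulls tokens from the lexer on demand. DeferredBuffer and its lexer callback are hypothetical, not the ANTLRParser implementation; an LT(i) would follow the same pattern.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical buffer illustrating deferred fetch; not ANTLRParser.
class DeferredBuffer {
public:
    explicit DeferredBuffer(std::function<int()> nextToken)
        : next_(std::move(nextToken)), pendingConsumes_(0), fetches_(0) {}

    void consume() { ++pendingConsumes_; }     // no lexer call here

    int LA(int i)                              // requires i >= 1
    {
        // Apply the deferred consumes first, fetching only as needed.
        while (pendingConsumes_ > 0) {
            fill(1);
            buf_.pop_front();
            --pendingConsumes_;
        }
        fill(i);                               // fetch up to i tokens
        return buf_[i - 1];
    }

    int fetches() const { return fetches_; }   // lexer calls so far

private:
    void fill(int n)
    {
        while (static_cast<int>(buf_.size()) < n) {
            buf_.push_back(next_());
            ++fetches_;
        }
    }
    std::function<int()> next_;
    std::deque<int> buf_;
    int pendingConsumes_;
    int fetches_;
};
```

Because consume() never calls the lexer, a parser that consumes its last token does not block waiting for input it will never read.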
 687
 688     To test whether this option has been built into the ANTLRParser class
 689     use "isDeferFetchEnabled()".
 690
 691     Using the -gd trace option with the default tracein() and traceout()
 692     routines will defeat the effort to defer the fetch because the
 693     trace routines print out information about the lookahead token at
 694     the start of the rule.
 695
 696     Because the tracein and traceout routines are virtual it is
 697     easy to redefine them in your parser:
 698
 699         class MyParser {
 700         <<
 701             virtual void tracein(ANTLRChar * ruleName)
 702                 { fprintf(stderr,"Entering: %s\n", ruleName); }
 703             virtual void traceout(ANTLRChar * ruleName)
 704                 { fprintf(stderr,"Leaving: %s\n", ruleName); }
 705         >>
 706
 707     The originals for those routines are in pccts/h/AParser.cpp.
 708
 709     This requires use of the dlg option -i (interactive lexer).
 710
 711     This is experimental.  The interaction with guess mode (syntactic
 712     predicates) is not known.
 713
 714 #215. (Changed in MR19) Addition of reset() to DLGLexerBase
 715
 716     There was no obvious way to reset the lexer for reuse.  The
 717     reset() method now does this.
 718
 719     Suggested by David Venditti (07751870267-0001@t-online.de).
 720
 721 #214. (Changed in MR19)  C mode: Adjust AST stack pointer at exit
 722
 723     In C mode the AST stack pointer needs to be reset if there will
 724     be multiple calls to the ANTLRx macros.
 725
 726     Reported with fix by Paul D. Smith (psmith@baynetworks.com).
 727
 728 #213. (Changed in MR18)  Fatal error with -mrhoistk (k>1 hoisting)
 729
 730     When rearranging code I forgot to un-comment a critical line of
 731     code that handles hoisting of predicates with k>1 lookahead.  This
 732     is now fixed.
 733
 734     Reported by Reinier van den Born (reinier@vnet.ibm.com).
 735
 736 #212. (Changed in MR17)  Mac related changes by Kenji Tanaka
 737
 738     Kenji Tanaka (kentar@osa.att.ne.jp) has made a number of changes for
 739     Macintosh users.
 740
 741     a.  The following Macintosh MPW files aid in installing pccts on Mac:
 742
 743             pccts/MPW_Read_Me
 744
 745             pccts/install68K.mpw
 746             pccts/installPPC.mpw
 747
 748             pccts/antlr/antlr.r
 749             pccts/antlr/antlr68K.make
 750             pccts/antlr/antlrPPC.make
 751
 752             pccts/dlg/dlg.r
 753             pccts/dlg/dlg68K.make
 754             pccts/dlg/dlgPPC.make
 755
 756             pccts/sorcerer/sor.r
 757             pccts/sorcerer/sor68K.make
 758             pccts/sorcerer/sorPPC.make
 759
 760        They completely replace the previous Mac installation files.
 761
 762     b.  The most significant change is to the MAC_FILE_CREATOR symbol
 763        in pcctscfg.h:
 764
 765         old: #define MAC_FILE_CREATOR 'MMCC'   /* Metrowerks C/C++ Text files */
 766         new: #define MAC_FILE_CREATOR 'CWIE'   /* Metrowerks C/C++ Text files */
 767
 768     c.  Added calls to special_fopen_actions() where necessary.
 769
 770 #211. (Changed in MR16a)  C++ style comment in dlg
 771
 772     This has been fixed.
 773
 774 #210. (Changed in MR16a)  Sor accepts \r\n, \r, or \n for end-of-line
 775
 776     A user requested that Sorcerer be changed to accept other forms
 777     of end-of-line.
 778
 779 #209. (Changed in MR16) Name of files changed.
 780
 781         Old:  CHANGES_FROM_1.33
 782         New:  CHANGES_FROM_133.txt
 783
 784         Old:  KNOWN_PROBLEMS
 785         New:  KNOWN_PROBLEMS.txt
 786
 787 #208. (Changed in MR16) Change in use of pccts #include files
 788
 789     There were problems with MS DevStudio when mixing Sorcerer and
 790     PCCTS in the same source file.  The problem is caused by the
 791     redefinition of setjmp in the MS header file setjmp.h.  In
 792     setjmp.h the pre-processor symbol setjmp was redefined to be
 793     _setjmp.  A later #include PCCTS_SETJMP_H, meant to expand to
 794     <setjmp.h>, became #include <_setjmp.h>.  I'm not sure whether this
 795     is a bug or a feature.  In any case, I decided to fix it by
 796     avoiding the use of pre-processor symbols in #include statements
 797     altogether.  This has the added benefit of making pre-compiled
 798     headers work again.
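
The expansion problem can be reproduced without the MS headers. In this sketch, xsetjmp and HDR_NAME are hypothetical stand-ins for setjmp and PCCTS_SETJMP_H (a standard-library name such as setjmp should not be redefined in user code). Because the tokens of a header-name macro are macro-expanded before the #include takes effect, a name inside it that is itself a macro gets rewritten; stringizing the expansion shows the result.

```cpp
#include <cassert>  // assert(), for quick self-checks
#include <cstring>

// Stand-in for the MS setjmp.h redefinition of "setjmp".
#define xsetjmp _xsetjmp
// Stand-in for PCCTS_SETJMP_H.
#define HDR_NAME <xsetjmp.h>

// Two-level stringizing forces HDR_NAME to be macro-expanded first.
#define STR2(x) #x
#define STR(x) STR2(x)

// The header name as "#include HDR_NAME" would see it after expansion.
const char *expandedHeader() { return STR(HDR_NAME); }
```

expandedHeader() yields a string containing _xsetjmp, just as #include PCCTS_SETJMP_H ended up naming _setjmp.h. Writing the header name literally, as in #include "pccts_setjmp.h", avoids the problem because a header name spelled directly in an #include line is not macro-expanded.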
 799
 800     I've replaced statements:
 801
 802         old: #include PCCTS_SETJMP_H
 803         new: #include "pccts_setjmp.h"
 804
 805     Where pccts_setjmp.h contains:
 806
 807             #ifndef __PCCTS_SETJMP_H__
 808             #define __PCCTS_SETJMP_H__
 809
 810             #ifdef PCCTS_USE_NAMESPACE_STD
 811             #include <csetjmp>
 812             #else
 813             #include <setjmp.h>
 814             #endif
 815
 816             #endif
 817
 818     A similar change has been made for other standard header files
 819     required by pccts and sorcerer: stdlib.h, stdarg.h, stdio.h, etc.
 820
 821     Reported by Jeff Vincent (JVincent@novell.com) and Dale Davis
 822     (DalDavis@spectrace.com).
 823
 824 #207. (Changed in MR16) dlg reports an invalid range for: [\0x00-\0xff]
 825
 826     Prior to MR16, dlg reported that this is an invalid range.
 827
 828     Diagnosed by Piotr Eljasiak (eljasiak@no-spam.zt.gdansk.tpsa.pl):
 829
 830         I think this problem is not specific to unsigned chars
 831         because dlg reports no error for the range [\0x00-\0xfe].
 832
 833         I've found that information on range is kept in field
 834         letter (unsigned char) of Attrib struct. Unfortunately
 835         the letter value internally is for some reason increased
 836         by 1, so \0xff is represented here as 0.
 837
 838         That's why dlg complains about the range [\0x00-\0xff] in
 839         dlg_p.g:
 840
 841         if ($$.letter > $2.letter) {
 842           error("invalid range  ", zzline);
 843         }
 844
 845     The fix is:
 846
 847         if ($$.letter > $2.letter && 255 != $2.letter) {
 848           error("invalid range  ", zzline);
 849         }
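
The diagnosis quoted above can be modelled in a few lines. This is a hypothetical sketch, not dlg's code: storedLetter assumes the "increased by 1" unsigned char encoding described by Piotr Eljasiak, and reportedInvalid applies the pre-MR16 comparison to the stored values.

```cpp
#include <cassert>  // assert(), for quick self-checks

// Hypothetical model of the diagnosis: a range endpoint is kept in an
// unsigned char after being increased by 1, so code 0xff wraps to 0.
unsigned char storedLetter(unsigned int code) {
    return static_cast<unsigned char>(code + 1);
}

// The pre-MR16 test "$$.letter > $2.letter", applied to stored values.
bool reportedInvalid(unsigned int lo, unsigned int hi) {
    return storedLetter(lo) > storedLetter(hi);
}
```

With this model, [\0x00-\0xfe] passes because 1 > 255 is false, while [\0x00-\0xff] is rejected because 1 > 0 is true, which matches the report.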
 850
 851 #206. (Changed in MR16) Free zzFAILtext in ANTLRParser destructor
 852
 853     The ANTLRParser destructor now frees zzFAILtext.
 854
 855     Problem and fix reported by Manfred Kogler (km@cast.uni-linz.ac.at).
 856
 857 #205. (Changed in MR16) DLGStringReset argument now const
 858
 859     Changed: void DLGStringReset(DLGChar *s) {...}
 860     To:      void DLGStringReset(const DLGChar *s) {...}
 861
 862     Suggested by Dale Davis (daldavis@spectrace.com).
 863
 864 #204. (Changed in MR15a) Change __WATCOM__ to __WATCOMC__ in pcctscfg.h
 865
 866     Reported by Oleg Dashevskii (olegdash@my-dejanews.com).
 867
 868 #203. (Changed in MR15) Addition of sorcerer to distribution kit
 869
 870     I have finally caved in to popular demand.  The pccts 1.33mr15
 871     kit will include sorcerer.  The separate sorcerer kit will be
 872     discontinued.
 873
 874 #202. (Changed in MR15) Organization of MS Dev Studio Projects in Kit
 875
 876     Previously there was one workspace that contained projects for
 877     all three parts of pccts: antlr, dlg, and sorcerer.  Now each
 878     part (and directory) has its own workspace/project and there
 879     is an additional workspace/project to build a library from the
 880     .cpp files in the pccts/h directory.
 881
 882     The library build will create pccts_debug.lib or pccts_release.lib
 883     according to the configuration selected.
 884
 885     If you don't want to build pccts 1.33MR15, you can download a
 886     ready-to-run kit for win32 from http://www.polhode.com/win32.zip.
 887     The ready-to-run kit for win32 includes executables, a pre-built
 888     static library for the .cpp files in the pccts/h directory, and a
 889     sample application.
 890
 891     You will need to define the environment variable PCCTS to point to
 892     the root of the pccts directory hierarchy.
 893
 894 #201. (Changed in MR15) Several fixes by K.J. Cummings (cummings@peritus.com)
 895
 896       Generation of SETJMP rather than SETJMP_H in gen.c.
 897
 898       (Sor B19) Declaration of ref_vars_inits for ref_var_inits in
 899       pccts/sorcerer/sorcerer.h.
 900
 901 #200. (Changed in MR15) Remove operator=() in AToken.h
 902
 903       A user reported that WatCom couldn't handle use of an
 904       explicit operator =().  It has been replaced with an equivalent
 905       using a cast operator.
 906
 907 #199. (Changed in MR15) Don't allow use of empty #tokclass
 908
 909       Change antlr.g to disallow empty #tokclass sets.
 910
 911       Reported by Manfred Kogler (km@cast.uni-linz.ac.at).
 912
 913 #198. Revised ANSI C grammar due to efforts by Manuel Kessler
 914
 915       Manuel Kessler (mlkessler@cip.physik.uni-wuerzburg.de)
 916
 917           Allow trailing ... in function parameter lists.
 918           Add bit fields.
 919           Allow old-style function declarations.
 920           Support cv-qualified pointers.
 921           Better checking of combinations of type specifiers.
 922           Release of memory for local symbols on scope exit.
 923           Allow input file name on command line as well as by redirection.
 924
 925               and other miscellaneous tweaks.
 926
 927       This is not part of the pccts distribution kit. It must be
 928       downloaded separately from:
 929
 930             http://www.polhode.com/ansi_mr15.zip
 931
 932 #197. (Changed in MR14) Resetting the lookahead buffer of the parser
 933
 934       Explanation and fix by Sinan Karasu (sinan.karasu@boeing.com)
 935
 936       Consider the code used to prime the lookahead buffer LA(i)
 937       of the parser when init() is called:
 938
 939         void
 940         ANTLRParser::
 941         prime_lookahead()
 942         {
 943             int i;
 944             for(i=1;i<=LLk; i++) consume();
 945             dirty=0;
 946             //lap = 0;      // MR14 - Sinan Karasu (sinan.karusu@boeing.com)
 947             //labase = 0;   // MR14
 948             labase=lap;     // MR14
 949         }
 950
 951       When the parser is instantiated, lap and labase are both set to 0.
 952
 953       The "for" loop runs LLk times.  In consume(), lap = (lap+1) mod LLk is
 954       computed.  Therefore, lap (before the loop) == lap (after the loop).
 955
 956       The problem only arises when one does an init() of the parser
 957       after an Eof has been seen.  At that time, lap could be nonzero.
 958       Assume lap==1.  Now we do a prime_lookahead().  If LLk is 2,
 959       then
 960
 961         consume()
 962         {
 963             NLA = inputTokens->getToken()->getType();
 964             dirty--;
 965             lap = (lap+1)&(LLk-1);
 966         }
 967
 968       or expanding NLA,
 969
 970         token_type[lap&(LLk-1)] = inputTokens->getToken()->getType();
 971         dirty--;
 972         lap = (lap+1)&(LLk-1);
 973
 974       so now we prime locations 1 and 0 (that is, 1 and 2 mod LLk).  The old
 975       prime_lookahead() then set lap=0 and labase=0, so LA(1) was read from
 976       location 0, NOT 1 as it should have been.
 977
 978       This was never caught before because, when a parser has just been
 979       instantiated, lap and labase are already 0 and the offending
 980       assignments are effectively no-ops: the for loop wraps lap back to 0.
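
The ring-buffer arithmetic above can be checked with a small model. RingModel is hypothetical: it keeps only lap, labase, and a two-slot token_type array, and feeds consume() the token numbers 1, 2, 3, ... in place of inputTokens->getToken()->getType(). As in the original code, the & (LLk-1) wrap assumes LLk is a power of 2.

```cpp
#include <cassert>  // assert(), for quick self-checks

// Hypothetical model of the parser's lookahead ring buffer.
// The "& (LLk - 1)" wrap equals "mod LLk" only when LLk is a power of 2.
struct RingModel {
    static const int LLk = 2;
    int token_type[LLk] = {0, 0};
    int lap = 0;        // slot that the next consume() fills
    int labase = 0;     // slot that holds LA(1)
    int nextToken = 0;  // stand-in token source: yields 1, 2, 3, ...

    void consume() {
        token_type[lap & (LLk - 1)] = ++nextToken;
        lap = (lap + 1) & (LLk - 1);
    }
    // Pre-MR14 prime_lookahead(): resets both indices after priming.
    void primeOld() {
        for (int i = 1; i <= LLk; i++) consume();
        lap = 0;
        labase = 0;
    }
    // MR14 prime_lookahead(): LA(1) is the slot where the loop started.
    void primeNew() {
        for (int i = 1; i <= LLk; i++) consume();
        labase = lap;
    }
    int LA1() const { return token_type[labase & (LLk - 1)]; }
};
```

Starting from lap==1, primeOld() leaves LA(1) at the second token (2), while primeNew() leaves it at the first (1). From a freshly constructed model (lap==0) both versions agree, which is why the bug went unnoticed.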
 981
 982 #196. (Changed in MR14) Problems with "(alpha)? beta" guess
 983
 984     Consider the following syntactic predicate in a grammar
 985     with 2 tokens of lookahead (k=2 or ck=2):
 986
 987         rule  : ( alpha )? beta ;
 988         alpha : S t ;
 989         t     : T U
 990               | T
 991               ;
 992         beta  : S t Z ;
 993
 994     When antlr computes the prediction expression with one token
 995     of lookahead for alts 1 and 2 of rule t, it finds an ambiguity.
 996
 997     Because the grammar has a lookahead of 2 it tries to compute
 998     two tokens of lookahead for alts 1 and 2 of t.  Alt 1 clearly
 999     has a lookahead of (T U).  Alt 2 is one token long so antlr
1000     tries to compute the follow set of alt 2, which means finding
1001     the things which can follow rule t in the context of (alpha)?.
1002     This cannot be computed, because alpha is only part of a rule,
1003     and antlr can't tell what part of beta is matched by alpha and
1004     what part remains to be matched.  Thus it is impossible for antlr
1005     to properly determine the follow set of rule t.
1006
1007     Prior to 1.33MR14, the follow of (alpha)? was computed as
1008     FIRST(beta) as a result of the internal representation of
1009     guess blocks.
1010
1011     With MR14 the follow set will be the empty set for that context.
1012
1013     Normally, one expects a rule appearing in a guess block to also
1014     appear elsewhere.  When the follow context for this other use
1015     is "ored" with the empty set, the result is simply the context
1016     from the other use, which is a reasonable follow context.  However,
1017     if there is *no* other use of the rule, or it is used in a different
1018     manner, then the follow context will be inaccurate.  It was
1019     inaccurate even before MR14, but it will be inaccurate in a
1020     different way.
1021
1022     For the example given earlier, a reasonable way to rewrite the
1023     grammar is:
1024
1025         rule  : ( alpha )? beta ;
1026         alpha : S t ;
1027         t     : T U
1028               | T
1029               ;
1030         beta  : alpha Z ;
1031
1032     If there are no other uses of the rule appearing in the guess
1033     block, antlr will generate a test for EOF as a workaround for
1034     representing a null set in the lookahead tests.
1035
1036     If you encounter such a problem, you can use the -alpha option
1037     to get additional information:
1038
1039     line 2: error: not possible to compute follow set for alpha
1040               in an "(alpha)? beta" block.
1041
1042     With the antlr -alpha command line option the following information
1043     is inserted into the generated file:
1044
1045     #if 0
1046
1047       Trace of references leading to attempt to compute the follow set of
1048       alpha in an "(alpha)? beta" block. It is not possible for antlr to
1049       compute this follow set because it is not known what part of beta has
1050       already been matched by alpha and what part remains to be matched.
1051
1052       Rules which make use of the incorrect follow set will also be incorrect
1053
1054          1 #token T              alpha/2   line 7     brief.g
1055          2 end alpha             alpha/3   line 8     brief.g
1056          2 end (...)? block at   start/1   line 2     brief.g
1057
1058     #endif
1059
1060     At the moment, with the -alpha option selected, antlr marks any
1061     rules which appear in the traceback chain (above) as rules with
1062     possible problems computing the follow set.
1063
1064     Reported by Greg Knapen (gregory.knapen@bell.ca).
1065
1066 #195. (Changed in MR14) #line directive not at column 1
1067
1068       Under certain circumstances a predicate test could generate
1069       a #line directive which was not at column 1.
1070
1071       Reported with fix by David Kågedal  (davidk@lysator.liu.se)
1072       (http://www.lysator.liu.se/~davidk/).
1073
1074 #194. (Changed in MR14) (C Mode only) Demand lookahead with #tokclass
1075
1076       In C mode with the demand lookahead option there is a bug in the
1077       code which handles matches for #tokclass (zzsetmatch and
1078       zzsetmatch_wsig).
1079
1080       The bug causes the lookahead pointer to get out of synchronization
1081       with the current token pointer.
1082
1083       The problem was reported with a fix by Ger Hobbelt (hobbelt@axa.nl).
1084
1085 #193. (Changed in MR14) Use of PCCTS_USE_NAMESPACE_STD
1086
1087       The pcctscfg.h now contains the following definitions:
1088
1089         #ifdef PCCTS_USE_NAMESPACE_STD
1090         #define PCCTS_STDIO_H     <cstdio>
1091         #define PCCTS_STDLIB_H    <cstdlib>
1092         #define PCCTS_STDARG_H    <cstdarg>
1093         #define PCCTS_SETJMP_H    <csetjmp>
1094         #define PCCTS_STRING_H    <cstring>
1095         #define PCCTS_ASSERT_H    <cassert>
1096         #define PCCTS_ISTREAM_H   <istream>
1097         #define PCCTS_IOSTREAM_H  <iostream>
1098         #define PCCTS_NAMESPACE_STD     namespace std {}; using namespace std;
1099         #else
1100         #define PCCTS_STDIO_H     <stdio.h>
1101         #define PCCTS_STDLIB_H    <stdlib.h>
1102         #define PCCTS_STDARG_H    <stdarg.h>
1103         #define PCCTS_SETJMP_H    <setjmp.h>
1104         #define PCCTS_STRING_H    <string.h>
1105         #define PCCTS_ASSERT_H    <assert.h>
1106         #define PCCTS_ISTREAM_H   <istream.h>
1107         #define PCCTS_IOSTREAM_H  <iostream.h>
1108         #define PCCTS_NAMESPACE_STD
1109         #endif
1110
1111       The runtime support in pccts/h uses these pre-processor symbols
1112       consistently.
1113
1114       Also, antlr and dlg have been changed to generate code which uses
1115       these pre-processor symbols rather than having the names of the
1116       #include files hard-coded in the generated code.
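
A sketch of what code that names its headers only through these symbols looks like. The fallback definitions are an assumption so that the fragment compiles without pcctscfg.h; they copy the non-namespace branch listed above. formatTokenNumber is a hypothetical helper, not part of the pccts runtime.

```cpp
// Fallbacks so the sketch compiles without pcctscfg.h (assumption: these
// mirror the non-namespace branch of the definitions above).
#ifndef PCCTS_STDIO_H
#define PCCTS_STDIO_H   <stdio.h>
#define PCCTS_STRING_H  <string.h>
#define PCCTS_NAMESPACE_STD
#endif

// Computed includes: the header names come only from the symbols.
#include PCCTS_STDIO_H
#include PCCTS_STRING_H
#include <cassert>  // assert(), for quick self-checks
PCCTS_NAMESPACE_STD

// Hypothetical helper standing in for generated code that uses stdio.
int formatTokenNumber(char *buf, size_t size, int token) {
    return snprintf(buf, size, "token %d", token);
}
```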
1117
1118       This required the addition of #include "pcctscfg.h" to a number of
1119       files in pccts/h.
1120
1121       It appears that this sometimes causes problems for MSVC 5 in
1122       combination with the "automatic" option for pre-compiled headers.
1123       In such cases disable the "automatic" pre-compiled headers option.
1124
1125       Suggested by Hubert Holin (Hubert.Holin@Bigfoot.com).
1126
1127 #192. (Changed in MR14) Change setText() to accept "const ANTLRChar *"
1128
1129       Changed ANTLRToken::setText(ANTLRChar *) to setText(const ANTLRChar *).
1130       This allows literal strings to be used to initialize tokens.  Since
1131       the usual token implementation (ANTLRCommonToken) makes a copy of the
1132       input string, this was an unnecessary limitation.
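
A sketch of why a copying implementation can take a const pointer. DemoToken is a hypothetical stand-in, not ANTLRCommonToken, and the typedef of ANTLRChar as char is an assumption made so the fragment compiles on its own. Because setText() only reads its argument, a string literal can be passed directly.

```cpp
#include <cassert>  // assert(), for quick self-checks
#include <cstring>

typedef char ANTLRChar;  // assumption: plain char for this sketch

// Hypothetical token that copies its text, as the item says
// ANTLRCommonToken does.  Text longer than 31 characters is truncated.
class DemoToken {
public:
    DemoToken() { text_[0] = '\0'; }
    void setText(const ANTLRChar *s) {
        std::strncpy(text_, s, sizeof text_ - 1);
        text_[sizeof text_ - 1] = '\0';  // strncpy may not terminate
    }
    const ANTLRChar *getText() const { return text_; }
private:
    ANTLRChar text_[32];
};
```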
1133
1134       Suggested by Bob McWhirter (bob@netwrench.com).
1135
1136 #191. (Changed in MR14) HP/UX aCC compiler compatibility problem
1137
1138       Needed to explicitly declare zzINF_DEF_TOKEN_BUFFER_SIZE and
1139       zzINF_BUFFER_TOKEN_CHUNK_SIZE as ints in pccts/h/AParser.cpp.
1140
1141       Reported by David Cook (dcook@bmc.com).
1142
1143 #190. (Changed in MR14) IBM OS/2 CSet compiler compatibility problem
1144
1145       Name conflict with "_cs" in pccts/h/ATokenBuffer.cpp
1146
1147       Reported by David Cook (dcook@bmc.com).
1148
1149 #189. (Changed in MR14) -gxt switch in C mode
1150
1151       The -gxt switch in C mode didn't work because of incorrect
1152       initialization.
1153
1154       Reported by Sinan Karasu (sinan@boeing.com).
1155
1156 #188. (Changed in MR14) Added pccts/h/DLG_stream_input.h
1157
1158       This is a DLG stream class based on C++ istreams.
1159
1160       Contributed by Hubert Holin (Hubert.Holin@Bigfoot.com).
1161
1162 #187. (Changed in MR14) Rename config.h to pcctscfg.h
1163
1164       The PCCTS configuration file has been renamed from config.h to
1165       pcctscfg.h.  The problem with the original name is that it led
1166       to name collisions when pccts parsers were combined with other
1167       software.
1168
1169       All of the runtime support routines in pccts/h/* have been
1170       changed to use the new name.  Existing software can continue
1171       to use pccts/h/config.h.  The contents of pccts/h/config.h are
1172       now just: #include "pcctscfg.h"
1173
1174       I don't have a record of the user who suggested this.
1175
1176 #186. (Changed in MR14) Pre-processor symbol DllExportPCCTS class modifier
1177
1178       Classes in the C++ runtime support routines are now declared:
1179
1180         class DllExportPCCTS className ....
1181
1182       By default, the pre-processor symbol is defined as the empty
1183       string.  This is for use by MSVC++ users to create DLL classes.
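
A minimal sketch of the pattern, with a hypothetical class name. By default the symbol expands to nothing, so the declaration is an ordinary class; an MSVC++ DLL build could define DllExportPCCTS beforehand, for example as __declspec(dllexport) (the exact definition a given project uses is an assumption here).

```cpp
#include <cassert>  // assert(), for quick self-checks

// Default described above: the symbol expands to the empty string.
#ifndef DllExportPCCTS
#define DllExportPCCTS
#endif

// Hypothetical class declared the way the runtime classes now are.
class DllExportPCCTS DemoRuntimeClass {
public:
    int answer() const { return 42; }
};
```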
1184
1185       Suggested by Manfred Kogler (km@cast.uni-linz.ac.at).
1186
1187 #185. (Changed in MR14) Option to not use PCCTS_AST base class for ASTBase
1188
1189       Normally, the ASTBase class is derived from PCCTS_AST which contains
1190       functions useful to Sorcerer.  If these are not necessary, then the
1191       user can define the pre-processor symbol "PCCTS_NOT_USING_SOR" which
1192       will cause the ASTBase class to replace references to PCCTS_AST with
1193       references to ASTBase where necessary.
1194
1195       The class ASTDoublyLinkedBase will contain a pure virtual function
1196       shallowCopy() that was formerly defined in class PCCTS_AST.
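
A sketch reduced to the choice of base class. The two classes are simplified stand-ins named after the real ones, not the declarations in pccts/h, and they omit everything Sorcerer-related. Defining PCCTS_NOT_USING_SOR before this fragment selects the stand-alone ASTBase.

```cpp
#include <cassert>  // assert(), for quick self-checks
#include <type_traits>

// Simplified stand-in, not the real pccts/h declaration.
class PCCTS_AST {
public:
    virtual ~PCCTS_AST() {}
};

#ifdef PCCTS_NOT_USING_SOR
// Without Sorcerer support the AST base class stands alone.
class ASTBase {
public:
    virtual ~ASTBase() {}
};
#else
// Default: inherit the Sorcerer-oriented base class.
class ASTBase : public PCCTS_AST {
public:
};
#endif
```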
1197
1198       Suggested by Bob McWhirter (bob@netwrench.com).
1199
1200 #184. (Changed in MR14) Grammars with no tokens generate invalid tokens.h
1201
1202       Reported by Hubert Holin (Hubert.Holin@bigfoot.com).
1203
1204 #183. (Changed in MR14) -f to specify file with names of grammar files
1205
1206       In DEC/VMS it is difficult to specify very long command lines.
1207       The -f option allows one to place the names of the grammar files
1208       in a data file in order to bypass limitations of the DEC/VMS
1209       command language interpreter.
1210
1211       Addition supplied by Bernard Giroud (b_giroud@decus.ch).
1212
1213 #182. (Changed in MR14) Output directory option for DEC/VMS
1214
1215       Fix some problems with the -o option under DEC/VMS.
1216
1217       Fix supplied by Bernard Giroud (b_giroud@decus.ch).
1218
1219 #181. (Changed in MR14) Allow chars > 127 in DLGStringInput::nextChar()
1220
1221       Changed DLGStringInput to cast the character using (unsigned char)
1222       so that languages with character codes greater than 127 work
1223       without changes.
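
The effect of the cast can be shown in isolation. nextCharCode is a hypothetical helper, not DLGStringInput::nextChar() itself. On platforms where plain char is signed, a byte such as 0xE9 reads back as a negative number; converting it through unsigned char yields a value in the range 0 to 255.

```cpp
#include <cassert>  // assert(), for quick self-checks

// Hypothetical helper: returns the character at s[index] as a
// non-negative code, even where plain char is signed.
int nextCharCode(const char *s, int index) {
    return static_cast<unsigned char>(s[index]);
}
```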
1224
1225       Suggested by Manfred Kogler (km@cast.uni-linz.ac.at).
1226
1227 #180. (Added in MR14) ANTLRParser::getEofToken()
1228
1229       Added "ANTLRToken ANTLRParser::getEofToken() const" to match the
1230       setEofToken routine.
1231
1232       Requested by Manfred Kogler (km@cast.uni-linz.ac.at).
1233
1234 #179. (Fixed in MR14) Memory leak for BufFileInput subclass of DLGInputStream
1235
1236       The BufFileInput class described in Item #142 neglected to release
1237       the allocated buffer when an instance was destroyed.
1238
1239       Reported by Manfred Kogler (km@cast.uni-linz.ac.at).
1240
1241 #178. (Fixed in MR14) Bug in "(alpha)? beta" guess blocks first sets
1242
1243       In 1.33 vanilla, and all maintenance releases prior to MR14,
1244       there was a bug in the handling of guess blocks which use the
1245       "long" form:
1246
1247                   (alpha)? beta
1248
1249       inside a (...)*, (...)+, or {...} block.
1250
1251       This problem does *not* apply to the case where beta is omitted
1252       or when the syntactic predicate is on the leading edge of an
1253       alternative.
1254
1255       The problem is that both alpha and beta are stored in the
1256       syntax diagram, and that some analysis routines would fail
1257       to skip the alpha portion when it was not on the leading edge.
1258       Consider the following grammar with -ck 2:
1259
1260                 r : ( (A)? B )* C D
1261
1262                   | A B      /* forces -ck 2 computation for old antlr    */
1263                              /*              reports ambig for alts 1 & 2 */
1264
1265                   | B C      /* forces -ck 2 computation for new antlr    */
1266                              /*              reports ambig for alts 1 & 3 */
1267                   ;
1268
1269       The prediction expression for the first alternative should be
1270       LA(1)={B C} LA(2)={B C D}, but previous versions of antlr
1271       would compute the prediction expression as LA(1)={A C} LA(2)={B D}.
1272
1273       Reported by Arpad Beszedes (beszedes@inf.u-szeged.hu) who provided
1274       a very clear example of the problem and identified the probable cause.
1275
1276 #177. (Changed in MR14) #tokdefs and #token with regular expression
1277
1278       In MR13 the change described by Item #162 caused an existing
1279       feature of antlr to fail.  Prior to the change it was possible
1280       to give regular expression definitions and actions to tokens
1281       which were defined via the #tokdefs directive.
1282
1283       This now works again.
1284
1285       Reported by Manfred Kogler (km@cast.uni-linz.ac.at).
1286
1287 #176. (Changed in MR14) Support for #line in antlr source code
1288
1289       Note: this was implemented by Arpad Beszedes (beszedes@inf.u-szeged.hu).
1290
1291       In 1.33MR14 it is possible for a pre-processor to generate #line
1292       directives in the antlr source and have those line numbers and file
1293       names used in antlr error messages and in the #line directives
1294       generated by antlr.
1295
1296       The #line directive may appear in the following form:
1297
1298             #line ll "sss" xx xx ...
1299
1300       where ll represents a line number, "sss" represents the name of a file
1301       enclosed in quotation marks, and xx are arbitrary integers.
1302
1303       The following form (without "line") is not supported at the moment:
1304
1305             # ll "sss" xx xx ...
1306
1307       The result:
1308
1309         zzline
1310
1311             is replaced with ll from the #line directive
1312
1313         FileStr[CurFile]
1314
1315             is updated with the contents of the string (if any)
1316             following the line number
1317
1318       Note
1319       ----
1320       The file-name string following the line number can be a complete
1321       name with a directory-path. Antlr generates the output files from
1322       the input file name (by replacing the extension from the file-name
1323       with .c or .cpp).
1324
1325       If the input file (or the file-name from the line-info) contains
1326       a path:
1327
1328         "../grammar.g"
1329
1330       the generated source code will be placed in "../grammar.cpp" (i.e.
1331       in the parent directory).  This is inconvenient in some cases
1332       (even the -o switch cannot be used), so the path information is
1333       removed from the #line directive.  Thus, if the line-info was
1334
1335         #line 2 "../grammar.g"
1336
1337       then the current file-name will become "grammar.g"
1338
1339       In this way, the source code generated from the grammar file will
1340       always be placed in the current directory, except when the -o switch
1341       is used.
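
A minimal sketch of the two steps described in this item: reading the line number and the quoted file name from the supported #line ll "sss" xx xx ... form, then removing the directory path. parseLineDirective is a hypothetical name, not antlr's internal routine; antlr itself updates zzline and FileStr[CurFile] rather than returning values.

```cpp
#include <cassert>  // assert(), for quick self-checks
#include <cstdlib>
#include <string>

// Hypothetical parser for the supported form:  #line ll "sss" xx xx ...
// Sets `line` from ll.  If a quoted file name follows, sets `file` to that
// name with any directory path removed; otherwise leaves `file` unchanged.
// Trailing integers are ignored.  Returns false (changing nothing) if the
// text does not match; the "# ll" form is rejected, as it is unsupported.
bool parseLineDirective(const std::string &text, int &line,
                        std::string &file) {
    const std::string prefix = "#line ";
    if (text.compare(0, prefix.size(), prefix) != 0) return false;

    const char *start = text.c_str() + prefix.size();
    char *end = nullptr;
    long ll = std::strtol(start, &end, 10);
    if (end == start) return false;  // no line number

    std::string::size_type pos =
        static_cast<std::string::size_type>(end - text.c_str());
    std::string::size_type open = text.find('"', pos);
    if (open != std::string::npos) {
        std::string::size_type close = text.find('"', open + 1);
        if (close == std::string::npos) return false;  // unterminated name
        std::string name = text.substr(open + 1, close - open - 1);
        // Keep only what follows the last '/' or '\' (the directory path).
        std::string::size_type slash = name.find_last_of("/\\");
        file = (slash == std::string::npos) ? name : name.substr(slash + 1);
    }
    line = static_cast<int>(ll);
    return true;
}
```

For example, #line 2 "../grammar.g" yields line 2 and the file name grammar.g, so the generated file lands in the current directory.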
1342
1343 #175. (Changed in MR14) Bug when guess block appears at start of (...)*
1344
1345       In 1.33 vanilla and all maintenance releases prior to 1.33MR14
1346       there was a bug when a guess block appears at the start of a (...)*.
1347       Consider the following k=1 (ck=1) grammar:
1348
1349             rule :
1350                   ( (STAR)? ZIP )* ID ;
1351
1352       Prior to 1.33MR14, the generated code resembled:
1353
1354         ...
1355         zzGUESS_BLOCK
1356         while ( 1 ) {
1357             if ( !(LA(1)==STAR) ) break;
1358             zzGUESS
1359             if ( !zzrv ) {
1360                 zzmatch(STAR);
1361                 zzCONSUME;
1362                 zzGUESS_DONE
1363                 zzmatch(ZIP);
1364                 zzCONSUME;
1365             ...
1366
1367       Note that the routine uses STAR for the prediction expression
1368       rather than ZIP.  With 1.33MR14 the generated code resembles:
1369
1370         ...
1371         while ( 1 ) {
1372             if ( !(LA(1)==ZIP) ) break;
1373         ...
1374
1375       This problem existed only with (...)* blocks and was caused
1376       by the slightly more complicated graph which represents (...)*
1377       blocks.  This caused the analysis routine to compute the first
1378       set for the alpha part of the "(alpha)? beta" rather than the
1379       beta part.
1380
1381       Both (...)+ and {...} blocks handled the guess block correctly.
1382
1383       Reported by Arpad Beszedes (beszedes@inf.u-szeged.hu) who provided
1384       a very clear example of the problem and identified the probable cause.
1385
1386 #174. (Changed in MR14) Bug when action precedes syntactic predicate
1387
1388       In 1.33 vanilla, and all maintenance releases prior to MR14,
1389       there was a bug when a syntactic predicate was immediately
1390       preceded by an action.  Consider the following -ck 2 grammar:
1391
1392             rule :
1393                    <<int i;>>
1394                    (alpha)? beta C
1395                  | A B
1396                  ;
1397
1398             alpha : A ;
1399             beta  : A B;
1400
1401       Prior to MR14, the code generated for the first alternative
1402       resembled:
1403
1404         ...
1405         zzGUESS
1406         if ( !zzrv && LA(1)==A && LA(2)==A) {
1407             alpha();
1408             zzGUESS_DONE
1409             beta();
1410             zzmatch(C);
1411             zzCONSUME;
1412         } else {
1413         ...
1414
1415       The prediction expression (i.e. LA(1)==A && LA(2)==A) is clearly
1416       wrong because LA(2) should be matched to B (first[2] of beta is {B}).
1417
1418       With 1.33MR14 the prediction expression is:
1419
1420         ...
1421         if ( !zzrv && LA(1)==A && LA(2)==B) {
1422             alpha();
1423             zzGUESS_DONE
1424             beta();
1425             zzmatch(C);
1426             zzCONSUME;
1427         } else {
1428         ...
1429
1430       This only affects grammars in which alpha is shorter than
1431       max(k,ck) and there is an action immediately preceding
1432       the syntactic predicate.
1433
1434       This problem was reported by Arpad Beszedes
1435       (beszedes@inf.u-szeged.hu) who provided a very clear example
1436       of the problem and identified the presence of the init-action
1437       as the likely culprit.
1438
1439 #173. (Changed in MR13a) -glms for Microsoft style filenames with -gl
1440
1441       With the -gl option antlr generates #line directives using the
1442       exact name of the input files specified on the command line.
1443       An oddity of the Microsoft C and C++ compilers is that they
1444       don't accept file names in #line directives containing "\"
1445       even though these are names from the native file system.
1446
1447       With -glms option, the "\" in file names appearing in #line
1448       directives is replaced with a "/" in order to conform to
1449       Microsoft compiler requirements.
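
A sketch of the -glms substitution, with a hypothetical helper name. It replaces every backslash in the file name before the name is written into a #line directive.

```cpp
#include <cassert>  // assert(), for quick self-checks
#include <string>

// Hypothetical helper: rewrite a file name for a #line directive the way
// -glms does, replacing every '\' with '/'.
std::string msLineFileName(std::string name) {
    for (std::string::size_type i = 0; i < name.size(); ++i) {
        if (name[i] == '\\') name[i] = '/';
    }
    return name;
}
```

With this helper, C:\work\t1.g is emitted as C:/work/t1.g.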
1450
1451       Reported by Erwin Achermann (erwin.achermann@switzerland.org).
1452
1453 #172. (Changed in MR13) \r\n in antlr source counted as one line
1454
1455       Some MS software uses \r\n to indicate a new line.  Antlr
1456       now recognizes this in counting lines.
1457
1458       Reported by Edward L. Hepler (elh@ece.vill.edu).
1459
1460 #171. (Changed in MR13) #tokclass L..U now allowed
1461
1462       The following is now allowed:
1463
1464             #tokclass ABC { A..B C }
1465
1466       Reported by Dave Watola (dwatola@amtsun.jpl.nasa.gov).
1467
1468 #170. (Changed in MR13) Suppression for predicates with lookahead depth >1
1469
1470       In MR12 the capability for suppression of predicates with lookahead
1471       depth=1 was introduced.  With MR13 this has been extended to
1472       predicates with lookahead depth > 1 and released for use by users
1473       on an experimental basis.
1474
1475       Consider the following grammar with -ck 2 and the predicate in rule
1476       "a" with depth 2:
1477
1478             r1  : (ab)* "@"
1479                 ;
1480
1481             ab  : a
1482                 | b
1483                 ;
1484
1485             a   : (A B)? => <<p(LATEXT(2))>>? A B C
1486                 ;
1487
1488             b   : A B C
1489                 ;
1490
1491       Normally, the predicate would be hoisted into rule r1 in order to
1492       determine whether to call rule "ab".  However, it should *not* be
1493       hoisted because, even if p is false, there is a valid alternative
1494       in rule b.  With "-mrhoistk on" the predicate will be suppressed.
1495
1496       If the "-info p" command line option is present, the following information
1497       will appear in the generated code:
1498
1499                 while ( (LA(1)==A)
1500         #if 0
1501
1502         Part (or all) of predicate with depth > 1 suppressed by alternative
1503             without predicate
1504
1505         pred  <<  p(LATEXT(2))>>?
1506                   depth=k=2  ("=>" guard)  rule a  line 8  t1.g
1507           tree context:
1508             (root = A
1509                B
1510             )
1511
1512         The token sequence which is suppressed: ( A B )
1513         The sequence of references which generate that sequence of tokens:
1514
1515            1 to ab          r1/1       line 1     t1.g
1516            2 ab             ab/1       line 4     t1.g
1517            3 to b           ab/2       line 5     t1.g
1518            4 b              b/1        line 11    t1.g
1519            5 #token A       b/1        line 11    t1.g
1520            6 #token B       b/1        line 11    t1.g
1521
1522         #endif
1523
1524       A slightly more complicated example:
1525
1526             r1  : (ab)* "@"
1527                 ;
1528
1529             ab  : a
1530                 | b
1531                 ;
1532
1533             a   : (A B)? => <<p(LATEXT(2))>>? (A  B | D E)
1534                 ;
1535
1536             b   : <<q(LATEXT(2))>>? D E
1537                 ;
1538
1539
1540       In this case, the sequence (D E) in rule "a" which lies behind
1541       the guard is used to suppress the predicate with context (D E)
1542       in rule b.
1543
1544                 while ( (LA(1)==A || LA(1)==D)
1545             #if 0
1546
1547             Part (or all) of predicate with depth > 1 suppressed by alternative
1548                 without predicate
1549
1550             pred  <<  q(LATEXT(2))>>?
1551                               depth=k=2  rule b  line 11  t2.g
1552               tree context:
1553                 (root = D
1554                    E
1555                 )
1556
1557             The token sequence which is suppressed: ( D E )
1558             The sequence of references which generate that sequence of tokens:
1559
1560                1 to ab          r1/1       line 1     t2.g
1561                2 ab             ab/1       line 4     t2.g
1562                3 to a           ab/1       line 4     t2.g
1563                4 a              a/1        line 8     t2.g
1564                5 #token D       a/1        line 8     t2.g
1565                6 #token E       a/1        line 8     t2.g
1566
1567             #endif
1568             &&
1569             #if 0
1570
1571             pred  <<  p(LATEXT(2))>>?
1572                               depth=k=2  ("=>" guard)  rule a  line 8  t2.g
1573               tree context:
1574                 (root = A
1575                    B
1576                 )
1577
1578             #endif
1579
1580             (! ( LA(1)==A && LA(2)==B ) || p(LATEXT(2)) )  {
1581                 ab();
1582                 ...
1583
1584 #169. (Changed in MR13) Predicate test optimization for depth=1 predicates
1585
1586       When MR12 generated a test of a predicate which had depth 1,
1587       it used the depth > 1 routines, resulting in correct but
1588       inefficient behavior.  In MR13, a bit test is used.
1589
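      For depth 1, the context of a predicate is just a set of tokens,
      so checking whether LA(1) lies in it takes a single bit test,
      whereas the depth > 1 machinery has to compare token sequences.
      The sketch below contrasts the two.  The token numbers, the 32-bit
      mask and both function names are hypothetical; this does not
      reproduce antlr's generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical token numbers, for illustration only.
enum Token : unsigned { A = 2, B = 3, D = 4, E = 5 };

// General case: does the lookahead start with any of the token sequences
// in the predicate's context?  This loops over sequences and positions.
bool matchesAnySequence(const std::vector<std::vector<unsigned>>& context,
                        const std::vector<unsigned>& lookahead) {
    for (const auto& seq : context) {
        bool ok = seq.size() <= lookahead.size();
        for (std::size_t i = 0; ok && i < seq.size(); ++i)
            ok = (seq[i] == lookahead[i]);
        if (ok) return true;
    }
    return false;
}

// Depth 1: bit t of contextMask is set when token t is in the context,
// so membership of LA(1) is a single shift-and-mask.
bool inDepth1Context(std::uint32_t contextMask, unsigned la1) {
    assert(la1 < 32);
    return ((contextMask >> la1) & 1u) != 0;
}
```
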
#168. (Changed in MR13) Token expressions in context guards

      The token expressions appearing in context guards such as:

            (A B)? => <<test(LT(1))>>?  someRule

      are computed during an early phase of antlr processing.  As
      a result, prior to MR13, complex expressions such as:

            ~B
            L..U
            ~L..U
            TokClassName
            ~TokClassName

      were not computed properly, and incorrect context was computed
      for guards that used them.

      In MR13, these context guards are checked for proper semantics
      in the initial phase and then re-evaluated after the complex
      token expressions have been computed, so that the correct
      context is used.

      Reported by Arpad Beszedes (beszedes@inf.u-szeged.hu).

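      Each of these expressions denotes a set of tokens, and a guard's
      context can only be right once that set is known.  As a rough
      illustration (not antlr's internal set representation), the sketch
      below evaluates the forms listed above with std::set, assuming
      hypothetical token numbers 1 through 10.  A token class is simply
      the set of its member tokens.

```cpp
#include <cassert>
#include <set>

// Assumption for this sketch: valid token numbers are 1..kMaxToken.
constexpr int kMaxToken = 10;

using TokenSet = std::set<int>;

// B : a single token.
TokenSet oneToken(int t) { return TokenSet{t}; }

// L..U : every token number from L through U inclusive.
TokenSet tokenRange(int lo, int hi) {
    assert(1 <= lo && lo <= hi && hi <= kMaxToken);
    TokenSet s;
    for (int t = lo; t <= hi; ++t) s.insert(t);
    return s;
}

// ~expr : every valid token that is not in the operand's set.
TokenSet complement(const TokenSet& s) {
    TokenSet out;
    for (int t = 1; t <= kMaxToken; ++t)
        if (s.count(t) == 0) out.insert(t);
    return out;
}
```

      With this numbering, ~4..6 leaves the seven tokens 1 to 3 and
      7 to 10.
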
#167. (Changed in MR13) ~L..U

      Prior to MR13, the complement of a token range was
      not properly computed.

#166. (Changed in MR13) token expression L..U

      The token U was represented as an unsigned char, restricting
      the use of L..U to cases where U was assigned a token number
      less than 256.  This is corrected in MR13.

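      The 256 limit follows from the width of unsigned char: with the
      usual 8-bit char, converting a token number to unsigned char keeps
      only its low 8 bits.  The sketch below (hypothetical helper names;
      it does not reproduce antlr's code) shows how a range whose upper
      bound is token 300 collapses when U is stored that way.

```cpp
#include <cassert>
#include <climits>

// Storing U in an unsigned char, as described above.  The conversion is
// well defined: the value is reduced modulo 2^CHAR_BIT (256 for 8-bit char).
unsigned char storeAsUnsignedChar(int tokenNumber) {
    return static_cast<unsigned char>(tokenNumber);
}

// Size of L..U when U has passed through an unsigned char.
int rangeSizeNarrow(int lo, int hi) {
    assert(lo >= 0 && hi >= 0);          // token numbers are non-negative
    int u = storeAsUnsignedChar(hi);
    return u >= lo ? u - lo + 1 : 0;     // empty if U wrapped below L
}

// Size of L..U when U is kept as an int.
int rangeSizeWide(int lo, int hi) {
    assert(lo >= 0 && hi >= 0);
    return hi >= lo ? hi - lo + 1 : 0;
}
```

      With U kept in a wider type, the two helpers agree for any range.
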
#165. (Changed in MR13) option -newAST

      To create an AST from an ANTLRTokenPtr, antlr normally generates
      a call to "new AST(ANTLRTokenPtr)".  The -newAST option generates
      a call to "newAST(ANTLRTokenPtr)" instead.  This allows the user
      to define a parser member function that creates the AST object.

      Similar changes for ASTBase::tmake and ASTBase::link were not
      considered necessary, since these functions do not create AST
      objects; they only use existing ones.

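      A parser member function of this kind might look like the sketch
      below.  To keep it self-contained, Token, TokenPtr, AST and
      MyParser are hypothetical stand-ins rather than the real PCCTS
      classes, and the signature and return type expected by the
      generated code should be checked against the generated parser.
      Here the parser simply keeps ownership of every node it creates.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are not the PCCTS classes.
struct Token { std::string text; };
using TokenPtr = std::shared_ptr<Token>;

struct AST {
    explicit AST(TokenPtr t) : token(std::move(t)) {}
    TokenPtr token;
};

class MyParser {
public:
    // Where the generated code would otherwise say "new AST(tok)",
    // -newAST makes it call newAST(tok), so the parser controls allocation.
    AST* newAST(TokenPtr tok) {
        assert(tok != nullptr);
        nodes_.push_back(std::make_unique<AST>(std::move(tok)));
        return nodes_.back().get();   // the parser owns every node it creates
    }
    std::size_t nodeCount() const { return nodes_.size(); }

private:
    std::vector<std::unique_ptr<AST>> nodes_;
};
```
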
#164. (Changed in MR13) Unused variable _astp

      For many releases, we have lived with compiler warnings about
      the unused variable _astp.  It turns out that this variable
      can *never* be used, because the code which referenced it was
      commented out.

      This investigation was sparked by a note from Erwin Achermann
      (erwin.achermann@switzerland.org).

#163. (Changed in MR13) Incorrect makefiles for testcpp examples

      All the examples in pccts/testcpp/* had incorrect definitions
      in the makefiles for the symbol "CCC".  Instead of CCC=CC they
      had CC=$(CCC).

      There was an additional problem in testcpp/1/test.g due to the
      change in ANTLRToken::getText() to a const member function
      (Item #137).

      Reported by Maurice Mass (maas@cuci.nl).

#162. (Changed in MR13) Combining #token with #tokdefs

      When it became possible to change the print-name of a
      #token (Item #148), it became useful to write a #token
      statement whose only purpose was to give a print name
      to the #token.  Prior to this change, such a statement
      could not be combined with the #tokdefs feature.

#161. (Changed in MR13) Switch -gxt inhibits generation of tokens.h

#160. (Changed in MR13) Omissions in list of names for remap.h

      When a user selects the -gp option, antlr creates a list
      of macros in remap.h to rename some of the standard
      antlr routines from zzXXX to userprefixXXX.

      There were a number of omissions from the remap.h name
      list related to the new trace facility.  This was reported,
      along with a fix, by Bernie Solomon (bernard@ug.eds.com).

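      The renaming itself is plain pre-processor work.  In the sketch
      below, a hypothetical remap.h-style macro maps zzTraceIn to
      calcTraceIn; both the routine name and the prefix "calc" are
      illustrative only.  A routine missing from the list keeps its zz
      spelling, so two parsers built with different prefixes would
      presumably both define it and clash at link time.

```cpp
#include <cassert>
#include <string>

// remap.h-style entry (hypothetical names): every later use of
// zzTraceIn, including the definition below, becomes calcTraceIn.
#define zzTraceIn calcTraceIn

std::string lastTracedRule;   // records the most recent call, for checking

// Written with the zz name, but after preprocessing this defines calcTraceIn.
void zzTraceIn(const char* rule) {
    assert(rule != nullptr);
    lastTracedRule = rule;
}
```
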
#159. (Changed in MR13) Violations of classic C rules

      There were a number of violations of classic C style in
      the distribution kit.  This was reported, along with fixes,
      by Bernie Solomon (bernard@ug.eds.com).

#158. (Changed in MR13) #header causes problem for pre-processors

      A user who runs the C pre-processor on antlr source suggested
      that another syntax be allowed.  With MR13, directives such as
      #header, #pragma, etc. may be written as "\#header", "\#pragma",
      etc.  To escape pre-processor directives inside a #header, use
      something like the following:

            \#header
            <<
                \#include <stdio.h>
            >>

#157. (Fixed in MR13) empty error sets for rules with infinite recursion

      When the first set for a rule could not be computed because of
      infinite left recursion, and it was the only alternative for a
      block, the error set for the block was empty.  This resulted in
      a fatal error.

      Reported by Darin Creason (creason@genedax.com).

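      To see why the set comes out empty, consider a simple FIRST-set
      computation that stops when it re-enters a rule it is already
      expanding.  For a purely left-recursive rule such as "a : a B ;"
      no token is ever reached, so FIRST(a) is empty, and an error set
      built from it would be empty too.  This is a generic sketch with a
      hypothetical grammar encoding (upper-case names are tokens, no
      empty alternatives), not antlr's algorithm.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical grammar encoding: a rule maps to its alternatives, and an
// alternative is a sequence of symbols.  Upper-case names are tokens.
using Alt = std::vector<std::string>;
using Grammar = std::map<std::string, std::vector<Alt>>;
using NameSet = std::set<std::string>;

bool isToken(const std::string& sym) {
    return !sym.empty() && std::isupper(static_cast<unsigned char>(sym[0])) != 0;
}

// FIRST(sym), assuming no empty alternatives.  Re-entering a rule that is
// still being expanded (left recursion) contributes no tokens.
NameSet first(const Grammar& g, const std::string& sym, NameSet& inProgress) {
    if (isToken(sym)) return NameSet{sym};
    if (!inProgress.insert(sym).second) return NameSet{};
    NameSet result;
    for (const Alt& alt : g.at(sym)) {
        assert(!alt.empty());
        NameSet f = first(g, alt.front(), inProgress);
        result.insert(f.begin(), f.end());
    }
    inProgress.erase(sym);
    return result;
}
```
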
#156. (Changed in MR13) DLGLexerBase::getToken() now public

#155. (Changed in MR13) Context behind predicates can suppress

      With -mrhoist enabled, the context behind a guarded predicate can
      be used to suppress other predicates.  Consider the following grammar:

        r0 : (r1)+;

        r1  : rp
            | rq
            ;
        rp  : <<p(LATEXT(1))>>? B ;
        rq : (A)? => <<q(LATEXT(1))>>? (A|B);

      In earlier versions, both predicates "p" and "q" would be hoisted
      into rule r0.  With MR12c, predicate p is suppressed because the
      context which follows predicate q includes "B", which can "cover"
      predicate "p".  In other words, when deciding in r0 whether to
      call r1, it does not matter whether p is true or false because,
      either way, there is a valid choice within r1.

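      The "cover" argument can be checked by writing down when r1 is
      viable for a given LA(1) and given values of p and q.  In this
      sketch the token numbers and the function name are hypothetical;
      it models only the decision described above, not the code antlr
      generates.

```cpp
#include <cassert>

// Hypothetical token numbers for the grammar above.
enum Tok { A = 1, B = 2 };

// Is some alternative of r1 viable for this LA(1) and predicate outcome?
//   rp : <<p(...)>>? B ;              needs LA(1)==B and p
//   rq : (A)? => <<q(...)>>? (A|B) ;  q is consulted only when LA(1)==A
bool r1Viable(int la1, bool p, bool q) {
    assert(la1 == A || la1 == B);
    bool rpOk = (la1 == B) && p;
    bool rqOk = (la1 == A && q) || (la1 == B);
    return rpOk || rqOk;
}
```

      For LA(1)==B the result is true whatever p is, which is why p need
      not be hoisted into r0; for LA(1)==A the outcome still depends on q.
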
#154. (Changed in MR13) Making hoist suppression explicit using <<nohoist>>

      A common error, even among experienced pccts users, is to code
      an init-action to inhibit hoisting rather than a leading action.
      An init-action does not inhibit hoisting.

      This was coded:

        rule1 : <<;>> rule2

      This is what was meant:

        rule1 : <<;>> <<;>> rule2

      With MR13, the user can code:

        rule1 : <<;>> <<nohoist>> rule2

      The following will give an error message:

        rule1 : <<nohoist>> rule2

      If <<nohoist>> appears as an init-action rather than as a leading
      action, an error message is issued, because the meaning of an
      init-action containing "nohoist" is unclear: does it apply to
      just one alternative or to all alternatives?
