=======================================================================
List of Implemented Fixes and Changes for Maintenance Releases of PCCTS


    For a summary of the most significant changes see CHANGES_SUMMARY.TXT

=======================================================================

                              DISCLAIMER

 The software and these notes are provided "as is". They may include
 typographical or technical errors and their authors disclaim all
 liability of any kind or nature for damages due to error, fault,
 defect, or deficiency regardless of cause. All warranties of any
 kind, either express or implied, including, but not limited to, the
 implied warranties of merchantability and fitness for a particular
 purpose are disclaimed.


        -------------------------------------------------------
        Note: Items #153 to #1 are now in a separate file named
              CHANGES_FROM_133_BEFORE_MR13.txt
        -------------------------------------------------------

#261. (Changed in MR19) Defer token fetch for C++ mode

    Item #216 has been revised to indicate that use of the defer fetch
    option (ZZDEFER_FETCH) requires dlg option -i.

#260. (MR22) Raise default lex buffer size from 8,000 to 32,000 bytes.

    ZZLEXBUFSIZE is the size (in bytes) of the buffer used by dlg
    generated lexers. The default value has been raised to 32,000 and
    the value used by antlr, dlg, and sorcerer has also been raised to
    32,000.

#259. (MR22) Default function arguments in C++ mode.

    If a rule is declared:

        rr [int i = 0] : ....

    then the declaration generated by pccts resembles:

        void rr(int i = 0);

    however, the definition must omit the default argument:

        void rr(int i) {...}

    In the past the default value was not omitted. In MR22
    the generated code resembles:

        void rr(int i /* = 0 */ ) {...}

    Implemented by Volker H. Simonis (simonis@informatik.uni-tuebingen.de)

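    The C++ rule behind this change can be shown without pccts. The
    sketch below is hypothetical and hand-written, not generated code.
    The int return type and the function body are invented so that the
    behavior can be checked. The default argument is written only once,
    on the declaration.

```cpp
#include <cassert>

// Declaration, as in the generated header: the default argument lives here.
int rr(int i = 0);

// Definition, as in the generated .cpp file. Writing "= 0" again here would
// be ill-formed (a default argument may not be redefined), so it is kept
// only as a comment. The body is a hypothetical stand-in for a rule.
int rr(int i /* = 0 */)
{
    return i + 1;
}
```

    A call such as rr() takes its default from the declaration. The
    definition keeps "= 0" only as a comment, which is the form MR22
    generates.
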
#258. (MR22) Using a base class for your parser

    In item #102 (MR10) the class statement was extended to allow one
    to specify a base class other than ANTLRParser for the generated
    parser. It turned out that this was less than useful because
    the constructor still specified ANTLRParser as the base class.

    The class statement now uses the first identifier appearing after
    the ":" as the name of the base class. For example:

        class MyParser : public FooParser {

    Generates in MyParser.h:

        class MyParser : public FooParser {

    Generates in MyParser.cpp something that resembles:

        MyParser::MyParser(ANTLRTokenBuffer *input) :
             FooParser(input,1,0,0,4)
        {
            token_tbl = _token_tbl;
            traceOptionValueDefault=1;    // MR10 turn trace ON
        }

    The base class constructor must have a signature similar to
    that of ANTLRParser.

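    The constructor chaining described above can be sketched in
    standalone C++. Everything here is hypothetical. ANTLRTokenBuffer is
    an empty stand-in for the real pccts class. FooParser's parameter
    names are invented, and its constructor is only assumed to accept the
    same argument list that the generated constructor passes along
    (token buffer followed by four integers).

```cpp
#include <cassert>

// Hypothetical stand-in for the pccts token buffer class.
struct ANTLRTokenBuffer {};

// A user-supplied base class. Its constructor is assumed to take the same
// arguments, in the same order, that the generated constructor passes to
// ANTLRParser: a token buffer, k, and three further integer settings.
class FooParser {
public:
    FooParser(ANTLRTokenBuffer *input, int k, int opt1, int opt2, int opt3)
        : input_(input), k_(k), opt3_(opt3)
    {
        (void)opt1;   // unused in this sketch
        (void)opt2;
    }
    virtual ~FooParser() {}
    ANTLRTokenBuffer *input() const { return input_; }
    int k() const { return k_; }
    int lastOption() const { return opt3_; }
private:
    ANTLRTokenBuffer *input_;
    int k_;
    int opt3_;
};

// Roughly the shape MR22 generates for "class MyParser : public FooParser {":
// the constructor now chains to FooParser instead of ANTLRParser.
class MyParser : public FooParser {
public:
    explicit MyParser(ANTLRTokenBuffer *input)
        : FooParser(input, 1, 0, 0, 4) {}
};
```

    If FooParser's constructor accepted a different argument list, the
    generated initializer "FooParser(input,1,0,0,4)" would not compile.
    That is why the signatures must match.
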
#257. (MR21a) Removed dlg warning that -i has no effect in C++ mode.

    This was incorrect.

#256. (MR21a) Malformed syntax graph causes crash after error message.

    In the past, certain kinds of errors in the very first grammar
    element could cause the construction of a malformed graph
    representing the grammar. This would eventually result in a
    fatal internal error. The code has been changed to be more
    resistant to this particular error.

#255. (MR21a) ParserBlackBox(FILE* f)

    This constructor set openByBlackBox to the wrong value.

    Reported by Kees Bakker (kees_bakker@tasking.nl).

#254. (MR21a) Reporting syntax error at end-of-file

    When there was a syntax error at the end-of-file the syntax
    error routine would substitute "<eof>" for the programmer's
    end-of-file symbol. This substitution is now done only when
    the programmer does not define his own end-of-file symbol
    or the symbol begins with the character "@".

    Reported by Kees Bakker (kees_bakker@tasking.nl).

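    The substitution rule can be restated as a small C++ function. This
    is a hypothetical sketch, not the pccts syntax error routine. In it,
    a null pointer stands for "the programmer defined no end-of-file
    symbol".

```cpp
#include <cassert>
#include <string>

// Hypothetical restatement of the MR21a rule for naming end-of-file in a
// syntax error message. A null pointer means the programmer did not define
// an end-of-file symbol.
std::string eofDisplayName(const char *userEofSymbol)
{
    if (userEofSymbol == nullptr || userEofSymbol[0] == '@')
        return "<eof>";           // substitute the generic name
    return userEofSymbol;         // keep the programmer's own symbol
}
```
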
#253. (MR21) Generation of block preamble (-preamble and -preamble_first)

    The antlr option -preamble causes antlr to insert the code
    BLOCK_PREAMBLE at the start of each rule and block. It does
    not insert code before rule references, token references, or
    actions. By properly defining the macro BLOCK_PREAMBLE the
    user can generate code which is specific to the start of blocks.

    The antlr option -preamble_first is similar, but inserts the
    code BLOCK_PREAMBLE_FIRST(PreambleFirst_123) where the symbol
    PreambleFirst_123 is equivalent to the first set defined by
    the #FirstSetSymbol described in Item #248.

    I have not investigated how these options interact with guess
    mode (syntactic predicates).

#252. (MR21) Check for null pointer in trace routine

    When some trace options are used and the parser was generated
    without tracing enabled, the current rule name may be a NULL
    pointer. A guard was added to check for this in restoreState.

    Reported by Douglas E. Forester (dougf@projtech.com).

#251. (MR21) Changes to #define zzTRACE_RULES

    The macro zzTRACE_RULES was being used to pass information to
    AParser.h. If this preprocessor symbol was not properly
    set the first time AParser.h was #included, the declaration
    of zzTRACEdata would be omitted (it is used by the -gd option).
    Subsequent #includes of AParser.h would be skipped because of
    the #ifdef guard, so the declaration of zzTracePrevRuleName would
    never be made. The result was that proper compilation depended
    heavily on the order of #includes.

    The declaration of zzTRACEdata was made unconditional and the
    problem of removing unused declarations will be left to optimizers.

    Diagnosed by Douglas E. Forester (dougf@projtech.com).

#250. (MR21) Option for EXPERIMENTAL change to error sets for blocks

    The antlr option -mrblkerr turns on an experimental feature
    which is supposed to provide more accurate syntax error messages
    for k=1, ck=1 grammars. When used with k>1 or ck>1 grammars the
    behavior should be no worse than the current behavior.

    There is no problem with the matching of elements or the computation
    of prediction expressions in pccts. The task is only one of listing
    the most appropriate tokens in the error message. The error sets used
    in pccts error messages are approximations of the exact error set when
    optional elements in (...)* or (...)+ are involved. While entirely
    correct, the error messages are sometimes incomplete.

    There is also a minor philosophical issue. For example, suppose the
    grammar expects the token to be an optional A followed by Z, and it
    is X. X, of course, is neither A nor Z, so an error message is appropriate.
    Is it appropriate to say "Expected Z"? It is correct, it is accurate,
    but it is not complete.

    When k>1 or ck>1 the problem of providing the exactly correct
    list of tokens for the syntax error messages ends up becoming
    equivalent to evaluating the prediction expression for the
    alternatives twice. However, for k=1 ck=1 grammars the prediction
    expression can be computed easily and evaluated cheaply, so I
    decided to try implementing it to satisfy a particular application.
    This application uses the error set in an interactive command language
    to provide prompts which list the alternatives available at that
    point in the parser. The user can then enter additional tokens to
    complete the command line. To do this required more accurate error
    sets than previously provided by pccts.

    In some cases the default pccts behavior may lead to more robust error
    recovery or clearer error messages than having the exact set of tokens.
    This is because (a) features like -ge allow the use of symbolic names for
    certain sets of tokens, so having extra tokens may simply obscure things
    and (b) the error set is used to resynchronize the parser, so a good
    choice is sometimes more important than having the exact set.

    Consider the following example:

        Note: All example code has been abbreviated
              to the absolute minimum in order to make the
              examples concise.

        star1 : (A)* Z;

    The generated code resembles:

        old                     new (with -mrblkerr)
        -------------           --------------------
        for (;;) {              for (;;) {
            match(A);               match(A);
        }                       }
        match(Z);               if (! A and ! Z) then
                                    FAIL(...{A,Z}...);
                                }
                                match(Z);


        With input X
            old message: Found X, expected Z
            new message: Found X, expected A, Z

    For the example:

        star2 : (A|B)* Z;

        old                         new (with -mrblkerr)
        -------------               --------------------
        for (;;) {                  for (;;) {
            if (!A and !B) break;       if (!A and !B) break;
            if (...) {                  if (...) {
                <same ...>                  <same ...>
            }                           }
            else {                      else {
                FAIL(...{A,B,Z}...)         FAIL(...{A,B}...);
            }                           }
        }                           }
        match(Z);                   if (! A and ! B and !Z) then
                                        FAIL(...{A,B,Z}...);
                                    }
                                    match(Z);

        With input X
            old message: Found X, expected Z
            new message: Found X, expected A, B, Z
        With input A X
            old message: Found X, expected Z
            new message: Found X, expected A, B, Z

            This includes the choice of looping back to the
            star block.

    The code for plus blocks:

        plus1 : (A)+ Z;

    The generated code resembles:

        old                     new (with -mrblkerr)
        -------------           --------------------
        do {                    do {
            match(A);               match(A);
        } while (A)             } while (A)
        match(Z);               if (! A and ! Z) then
                                    FAIL(...{A,Z}...);
                                }
                                match(Z);

        With input A X
            old message: Found X, expected Z
            new message: Found X, expected A, Z

            This includes the choice of looping back to the
            plus block.

    For the example:

        plus2 : (A|B)+ Z;

        old                         new (with -mrblkerr)
        -------------               --------------------
        do {                        do {
            if (A) {                    <same>
                match(A);               <same>
            } else if (B) {             <same>
                match(B);               <same>
            } else {                    <same>
                if (cnt > 1) break;     <same>
                FAIL(...{A,B,Z}...)         FAIL(...{A,B}...);
            }                           }
            cnt++;                      <same>
        }                           }

        match(Z);                   if (! A and ! B and !Z) then
                                        FAIL(...{A,B,Z}...);
                                    }
                                    match(Z);

        With input X
            old message: Found X, expected A, B, Z
            new message: Found X, expected A, B
        With input A X
            old message: Found X, expected Z
            new message: Found X, expected A, B, Z

            This includes the choice of looping back to the
            plus block.

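    The star1 case can be reproduced with a small hand-written C++
    parser that follows both code shapes in the table. It is a
    hypothetical sketch, not antlr output. It treats each input character
    as one token and uses '$' to mark end of input; both choices are
    assumptions of the sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical hand-written parser for "star1 : (A)* Z;" that mimics the
// two code shapes in the table above. Each character of the input is one
// token; '$' stands for end of input. Returns "" on success, otherwise the
// syntax error message.
std::string parseStar1(const std::string &input, bool mrblkerr)
{
    std::string::size_type pos = 0;
    auto LA1 = [&]() { return pos < input.size() ? input[pos] : '$'; };

    while (LA1() == 'A')              // for (;;) { match(A); }
        ++pos;

    // -mrblkerr: after the loop, test against {A,Z} so that the error set
    // includes the option of looping back into the (...)* block.
    if (mrblkerr && LA1() != 'A' && LA1() != 'Z')
        return std::string("Found ") + LA1() + ", expected A, Z";

    if (LA1() != 'Z')                 // match(Z)
        return std::string("Found ") + LA1() + ", expected Z";
    ++pos;
    return "";
}
```

    With input "X", the old shape produces "Found X, expected Z" and the
    -mrblkerr shape produces "Found X, expected A, Z". These match the
    messages listed for star1.
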
#249. (MR21) Changes for DEC/VMS systems

    Jean-François Piéronne (jfp@altavista.net) has updated some
    VMS related command files and fixed some minor problems related
    to building pccts under the DEC/VMS operating system. For DEC/VMS
    users the most important differences are:

        a.  Revised makefile.vms
        b.  Revised genMMS for generating VMS style makefiles.

#248. (MR21) Generate symbol for first set of an alternative

    pccts can generate a symbol which represents the tokens which may
    appear at the start of a block:

        rr : #FirstSetSymbol(rr_FirstSet) ( Foo | Bar ) ;

    This will generate the symbol rr_FirstSet of type SetWordType with
    elements Foo and Bar set. The bits can be tested using code similar
    to the following:

        if (set_el(Foo, &rr_FirstSet)) { ...

    This can be combined with the C array zztokens[] or the C++ routine
    tokenName() to get the print name of the token in the first set.

    The size of the set is given by the newly added enum SET_SIZE, a
    protected member of the generated parser's class. The number of
    elements in the generated set will not be exactly equal to the
    value of SET_SIZE because of synthetic tokens created by #tokclass,
    #errclass, the -ge option, and meta-tokens such as epsilon and
    end-of-file.

    The #FirstSetSymbol must appear immediately before a block
    such as (...)+, (...)*, {...}, or (...). It may not appear
    immediately before a token, a rule reference, or an action. However,
    a token or rule reference can be enclosed in a (...) in order to
    make the use of #FirstSetSymbol legal.

        rr_bad : #FirstSetSymbol(rr_bad_FirstSet) Foo;     //  Illegal

        rr_ok :  #FirstSetSymbol(rr_ok_FirstSet) (Foo);    //  Legal

    Do not confuse FirstSetSymbol sets with the sets used for testing
    lookahead. The sets used for FirstSetSymbol have one element per bit,
    so the number of bytes is approximately the largest token number
    divided by 8. The sets used for testing lookahead store 8 lookahead
    sets per byte, so the length of the array is approximately the largest
    token number.

    If there is demand, a similar routine for follow sets can be added.

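    The one-bit-per-token layout can be sketched in C++. The helper names
    below are hypothetical and are not the pccts set routines. The
    typedef of SetWordType as unsigned char and the bit order within each
    byte are also assumptions of this sketch.

```cpp
#include <cassert>
#include <climits>

// Assumed: one set word per byte, as described above for FirstSetSymbol sets.
typedef unsigned char SetWordType;

// Hypothetical helpers (not the pccts set routines): token number t is
// stored as bit (t % CHAR_BIT) of byte (t / CHAR_BIT).
inline void set_add_sketch(unsigned t, SetWordType *set)
{
    set[t / CHAR_BIT] |= static_cast<SetWordType>(1u << (t % CHAR_BIT));
}

inline bool set_el_sketch(unsigned t, const SetWordType *set)
{
    return ((set[t / CHAR_BIT] >> (t % CHAR_BIT)) & 1u) != 0;
}

// Number of bytes needed for token numbers 0..maxToken.
inline unsigned set_bytes_sketch(unsigned maxToken)
{
    return maxToken / CHAR_BIT + 1;
}
```

    With tokens numbered up to 15, two bytes are enough. This matches the
    "largest token number divided by 8" estimate above.
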
#247. (MR21) Misleading error message on syntax error for optional elements.

    Prior to MR21, tokens which were optional did not appear in syntax
    error messages if the block which immediately followed detected a
    syntax error.

    Consider the following rule in a grammar whose tokens are Number,
    Word, and Other:

        rr : {Number} Word;

    For this rule the code resembles:

        if (LA(1) == Number) {
            match(Number);
            consume();
        }
        match(Word);

    Prior to MR21, the error message for input "$ a" would be:

        line 1: syntax error at "$" missing Word

    With MR21 the message will be:

        line 1: syntax error at "$" expecting Word, Number.

    The generated code resembles:

        if ( (LA(1)==Number) ) {
            zzmatch(Number);
            consume();
        }
        else {
            if ( (LA(1)==Word) ) {
                /* nothing */
            }
            else {
                FAIL(... message for both Number and Word ...);
            }
        }
        match(Word);

    The code generated for optional blocks in MR21 is slightly longer
    than the previous versions, but it should give better error messages.

    The code generated for:

        { a | b | c }

    should now be *identical* to:

        ( a | b | c | )

    which was not the case prior to MR21.

    Reported by Sue Marvin (sue@siara.com).

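    The MR21 code shape for rr can be mirrored in a hypothetical,
    hand-written C++ function. The Token enum and the message strings are
    assumptions of the sketch, not pccts output. The error raised before
    match(Word) names both tokens, so the optional Number is no longer
    dropped from the message.

```cpp
#include <cassert>
#include <string>

enum Token { Number, Word, Other };

// Hypothetical hand-written equivalent of the MR21 code for
// "rr : {Number} Word;". Returns "" on success or the error text.
std::string parseRR(const Token *toks, int n)
{
    int pos = 0;
    auto LA1 = [&]() { return pos < n ? toks[pos] : Other; };

    if (LA1() == Number) {
        ++pos;                                 // match(Number); consume();
    } else if (LA1() == Word) {
        /* nothing: the optional Number is absent */
    } else {
        return "expecting Word, Number";       // FAIL(...) names both tokens
    }

    if (LA1() != Word)                         // match(Word)
        return "missing Word";
    ++pos;
    return "";
}
```
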
#246. (Changed in MR21) Use of $(MAKE) for calls to make

    Calls to make from the makefiles were replaced with $(MAKE)
    because of problems when using gmake.

    Reported with fix by Sunil K.Vallamkonda (sunil@siara.com).

#245. (Changed in MR21) Changes to genmk

    The following command line options have been added to genmk:

        -cfiles ...

            To add a user's C or C++ files into the makefile
            automatically. The list of files must be enclosed in
            apostrophes. This option may be specified multiple times.

        -compiler ...

            The name of the compiler to use for $(CCC) or $(CC). The
            default in C++ mode is "CC". The default in C mode is "cc".

        -pccts_path ...

            The value for $(PCCTS), the pccts directory. The default
            is /usr/local/pccts.

    Contributed by Tomasz Babczynski (t.babczynski@ict.pwr.wroc.pl).

#244. (Changed in MR21) Rename variable "not" in antlr.g

    When antlr.g is compiled with a C++ compiler, a variable named
    "not" causes problems ("not" is an alternative token for "!" in
    C++). Reported by Sinan Karasu (sinan.karasu@boeing.com).

#243. (Changed in MR21) Replace recursion with iteration in zzfree_ast

    Another refinement to zzfree_ast in ast.c to limit recursion.

    NAKAJIMA Mutsuki (muc@isr.co.jp).

#242. (Changed in MR21) LineInfoFormatStr

    Added an #ifndef/#endif around LineInfoFormatStr in pcctscfg.h.

#241. (Changed in MR21) Changed macro PURIFY to a no-op

                  ***********************
                  *** NOT IMPLEMENTED ***
                  ***********************

    The PURIFY macro was changed to a no-op because it was causing
    problems when passing C++ objects.

    The old definition:

        #define PURIFY(r,s)     memset((char *) &(r),'\0',(s));

    The new definition:

        #define PURIFY(r,s)     /* nothing */

#240. (Changed in MR21) sorcerer/h/sorcerer.h _MATCH and _MATCHRANGE

    Added test for NULL token pointer.

    Suggested by Peter Keller (keller@ebi.ac.uk)

#239. (Changed in MR21) C++ mode AParser::traceGuessFail

    If tracing is turned on when the code has been generated
    without trace code, a failed guess generates a trace report
    even though there are no other trace reports. This change
    makes the behavior consistent with other parts of the
    trace system.

    Reported by David Wigg (wiggjd@sbu.ac.uk).

#238. (Changed in MR21) Namespace version #include files

    Changed reference from CStdio to cstdio (and other
    #include file names) in the namespace version of pccts.
    Should have known better.

#237. (Changed in MR21) ParserBlackBox(FILE*)

    In the past, ParserBlackBox would close the FILE in the dtor
    even though it was not opened by ParserBlackBox. The problem
    is that there were two constructors, one which accepted a file
    name and did an fopen, the other which accepted a FILE and did
    not do an fopen. There is now an extra member variable which
    remembers whether ParserBlackBox did the open or not.

    Suggested by Mike Percy (mpercy@scires.com).

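    The ownership flag can be sketched in C++. FileHolder is a
    hypothetical class, not the PBlackBox.h code. It only shows the idea
    of remembering who opened the FILE so that the destructor closes a
    file only when this object opened it.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical class illustrating the MR21 fix (not the PBlackBox.h code):
// remember whether this object opened the FILE, and close only files it owns.
class FileHolder {
public:
    explicit FileHolder(const char *fileName)
        : file_(std::fopen(fileName, "r")), openedHere_(true) {}
    explicit FileHolder(std::FILE *f)
        : file_(f), openedHere_(false) {}
    ~FileHolder()
    {
        if (openedHere_ && file_ != nullptr)
            std::fclose(file_);       // never closes a caller's FILE
    }
    FileHolder(const FileHolder &) = delete;
    FileHolder &operator=(const FileHolder &) = delete;

    std::FILE *file() const { return file_; }
    bool ownsFile() const { return openedHere_; }
private:
    std::FILE *file_;
    bool openedHere_;
};
```

    Passing an already-open stream such as stdin leaves it open when the
    holder is destroyed. A holder constructed from a file name closes the
    file it opened.
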
#236. (Changed in MR21) tmake now reports down pointer problem

    When ASTBase::tmake attempts to update the down pointer of
    an AST it checks to see if the down pointer is NULL. If it
    is not NULL it does not do the update and returns NULL.
    An attempt to overwrite a non-NULL down pointer is almost always
    the result of a user error, and silently returning NULL can lead
    to hard-to-find problems during tree construction.

    With this change, the routine calls a virtual function
    reportOverwriteOfDownPointer() which calls panic to
    report the problem. Users who want the old behavior can
    redefine the virtual function in their AST class.

    Suggested by Sinan Karasu (sinan.karasu@boeing.com)

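    The virtual reporting hook can be sketched in C++. Node and QuietNode
    are hypothetical, heavily simplified classes and are not ASTBase. The
    default hook aborts in the spirit of calling panic. A derived class
    overrides the hook to get back the old, silent behavior.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical, simplified AST node illustrating the MR21 hook; the real
// ASTBase::tmake() and reportOverwriteOfDownPointer() are in pccts/h.
class Node {
public:
    Node() : down_(nullptr) {}
    virtual ~Node() {}

    // Attach a child list. Refuses to overwrite an existing down pointer:
    // reports the problem through a virtual hook and returns nullptr.
    Node *setDown(Node *child)
    {
        if (down_ != nullptr) {
            reportOverwriteOfDownPointer();
            return nullptr;
        }
        down_ = child;
        return this;
    }
    Node *down() const { return down_; }

protected:
    // Default: treat the overwrite as fatal, much as MR21 calls panic.
    virtual void reportOverwriteOfDownPointer()
    {
        std::fprintf(stderr, "attempt to overwrite AST down pointer\n");
        std::abort();
    }

private:
    Node *down_;
};

// A user class that restores the old, silent behavior.
class QuietNode : public Node {
protected:
    void reportOverwriteOfDownPointer() override {}
};
```
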
#235. (Changed in MR21) Made ANTLRParser::resynch() virtual

    Suggested by Jerry Evans (jerry@swsl.co.uk).

#234. (Changed in MR21) Implicit int for function return value

    ATokenBuffer:bufferSize() did not specify a type for the
    return value.

    Reported by Hai Vo-Ba (hai@fc.hp.com).

#233. (Changed in MR20) Converted to MSVC 6.0

    Due to external circumstances I have had to convert to MSVC 6.0.
    The MSVC 5.0 project files (.dsw and .dsp) have been retained as
    xxx50.dsp and xxx50.dsw. The MSVC 6.0 files are named xxx60.dsp
    and xxx60.dsw (where xxx is the name of the related directory/project).

#232. (Changed in MR20) Make setwd bit vectors protected in parser.h

    The access for the setwd array in the parser header was not
    specified. As a result, it would depend on the code which
    preceded it. In MR20 it will always have access "protected".

    Reported by Piotr Eljasiak (eljasiak@zt.gdansk.tpsa.pl).

#231. (Changed in MR20) Error in token buffer debug code.

    When token buffer debugging is selected via the pre-processor
    symbol DEBUG_TOKENBUFFER there was an erroneous check in
    AParser.cpp. The corrected test reads:

        #ifdef DEBUG_TOKENBUFFER
            if (i >= inputTokens->bufferSize() ||
                inputTokens->minTokens() < LLk )     /* MR20 Was "<=" */
        ...
        #endif

    Reported by David Wigg (wiggjd@sbu.ac.uk).

#230. (Changed in MR20) Fixed problem with #define for -gd option

    There was an error in setting zzTRACE_RULES for the -gd (trace) option.

    Reported by Gary Funck (gary@intrepid.com).

#229. (Changed in MR20) Additional "const" for literals

    "const" was added to the token name literal table.
    "const" was added to some panic() and similar routines.

#228. (Changed in MR20) dlg crashes on "()"

    The following token definition will cause DLG to crash.

        #token "()"

    When there is a syntax error in a regular expression
    many of the dlg routines return a structure which has
    null pointers. When callers access these null pointers,
    dlg crashes.

    I have attempted to fix the more common cases.

    Reported by Mengue Olivier (dolmen@bigfoot.com).

#227. (Changed in MR20) Array overwrite

    Steveh Hand (sassth@unx.sas.com) reported a problem which
    was traced to a temporary array which was not properly
    resized for deeply nested blocks. This has been fixed.

#226. (Changed in MR20) -pedantic conformance

    G. Hobbelt (i_a@mbh.org) and THM made many, many minor
    changes to create prototypes for all the functions and
    bring antlr, dlg, and sorcerer into conformance with
    the gcc -pedantic option.

    This may require users to add pccts/h/pcctscfg.h to some
    files or makefiles in order to have __USE_PROTOS defined.

#225. (Changed in MR20) AST stack adjustment in C mode

    The fix in #214 for AST stack adjustment in C mode missed
    some cases.

    Reported with fix by Ger Hobbelt (i_a@mbh.org).

#224. (Changed in MR20) LL(1) and LL(2) with #pragma approx

    This may set a record for the oldest, most trivial, lexical
    error in pccts. The regular expressions for LL(1) and LL(2)
    lacked an escape for the left and right parentheses.

    Reported by Ger Hobbelt (i_a@mbh.org).

#223. (Changed in MR20) Addition of IBM_VISUAL_AGE directory

    Build files for antlr, dlg, and sorcerer under IBM Visual Age
    have been contributed by Anton Sergeev (ags@mlc.ru). They have
    been placed in the pccts/IBM_VISUAL_AGE directory.

#222. (Changed in MR20) Replace __STDC__ with __USE_PROTOS

    Most occurrences of __STDC__ replaced with __USE_PROTOS due to
    complaints from several users.

#221. (Changed in MR20) Added #include for DLexerBase.h to PBlackBox.

    Added #include for DLexerBase.h to PBlackBox.

#220. (Changed in MR19) strcat arguments reversed in #pred parse

    The arguments to strcat were reversed when creating a print
    name for a hash table entry for use with the #pred feature.

    Problem diagnosed and fix reported by Scott Harrington
    (seh4@ix.netcom.com).

#219. (Changed in MR19) C Mode routine zzfree_ast

    Changes to reduce use of recursion for AST trees with only right
    links or only left links in the C mode routine zzfree_ast.

    Implemented by SAKAI Kiyotaka (ksakai@isr.co.jp).

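    The general technique behind this item and #243 (looping along one
    link and recursing only on the other) can be sketched in C++. The AST
    struct and free_ast_sketch are hypothetical and are not the ast.c
    code. In this sketch only sibling ("right") links are walked
    iteratively, while child ("down") links still recurse.

```cpp
#include <cassert>

// Hypothetical AST node with child ("down") and sibling ("right") links.
struct AST {
    AST *down;
    AST *right;
};

// Free a tree, returning the number of nodes released. Sibling lists are
// walked with a loop, so only the depth of "down" nesting uses stack frames;
// a long right-linked list costs no extra recursion.
long free_ast_sketch(AST *t)
{
    long freed = 0;
    while (t != nullptr) {
        AST *next = t->right;           // save before the node is deleted
        freed += free_ast_sketch(t->down);
        delete t;
        ++freed;
        t = next;
    }
    return freed;
}
```

    A right-linked list of 100,000 nodes is released without 100,000
    nested calls, which is the case the change targets.
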
#218. (Changed in MR19) Changes to support unsigned char in C mode

    Changes to antlr.h and err.h to fix omissions in use of zzchar_t.

    Implemented by SAKAI Kiyotaka (ksakai@isr.co.jp).

#217. (Changed in MR19) Error message when dlg -i and -CC options selected

    *** This change was rescinded by item #257 ***

    The parsers generated by pccts in C++ mode are not able to support the
    interactive lexer option (except, perhaps, when using the deferred fetch
    parser option (Item #216)).

    DLG now warns when both -i and -CC are selected.

    This warning was suggested by David Venditti (07751870267-0001@t-online.de).

#216. (Changed in MR19) Defer token fetch for C++ mode

    Implemented by Volker H. Simonis (simonis@informatik.uni-tuebingen.de)

    Normally, pccts keeps the lookahead token buffer completely filled.
    This requires max(k,ck) tokens of lookahead. For some applications
    this can cause deadlock problems. For example, there may be cases
    when the parser can't tell when the input has been completely consumed
    until the parse is complete, but the parse can't be completed because
    the input routines are waiting for additional tokens to fill the
    lookahead buffer.

    When the ANTLRParser class is built with the pre-processor option
    ZZDEFER_FETCH defined, the fetch of new tokens by consume() is deferred
    until LA(i) or LT(i) is called.

    To test whether this option has been built into the ANTLRParser class
    use "isDeferFetchEnabled()".

    Using the -gd trace option with the default tracein() and traceout()
    routines will defeat the effort to defer the fetch because the
    trace routines print out information about the lookahead token at
    the start of the rule.

    Because the tracein and traceout routines are virtual it is
    easy to redefine them in your parser:

        class MyParser {
        <<
            virtual void tracein(ANTLRChar * ruleName)
                { fprintf(stderr,"Entering: %s\n", ruleName); }
            virtual void traceout(ANTLRChar * ruleName)
                { fprintf(stderr,"Leaving: %s\n", ruleName); }
        >>
            ... rules ...
        }

    The originals for those routines are in pccts/h/AParser.cpp.

    The deferred fetch option requires use of the dlg option -i
    (interactive lexer).

    This is experimental. The interaction with guess mode (syntactic
    predicates) is not known.

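    The deferred fetch idea can be modelled in a few lines of C++.
    LazyLookahead is a hypothetical class and not the ANTLRParser code.
    Tokens come from a caller-supplied callback, and consume() never reads
    input. A token that is consumed before anyone looks at it is fetched
    and discarded only when the next LA(i) call needs input.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical model of the ZZDEFER_FETCH idea (not the ANTLRParser code):
// consume() never reads input; tokens are fetched only when LA(i) needs them.
class LazyLookahead {
public:
    explicit LazyLookahead(std::function<int()> fetch)
        : fetch_(std::move(fetch)), pendingSkips_(0) {}

    // 1-based lookahead, like LA(i) in pccts.
    int LA(std::size_t i)
    {
        for (; pendingSkips_ > 0; --pendingSkips_)
            fetch_();                   // read and drop tokens consumed unseen
        while (buf_.size() < i)
            buf_.push_back(fetch_());
        return buf_[i - 1];
    }

    void consume()
    {
        if (!buf_.empty())
            buf_.pop_front();
        else
            ++pendingSkips_;            // fetch it later, only if needed
    }

private:
    std::function<int()> fetch_;
    std::deque<int> buf_;
    std::size_t pendingSkips_;
};
```

    In this model a parser that stops after its final consume() never
    blocks waiting for a token it will not examine. That is the deadlock
    the option is meant to avoid.
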
#215. (Changed in MR19) Addition of reset() to DLGLexerBase

    There was no obvious way to reset the lexer for reuse. The
    reset() method now does this.

    Suggested by David Venditti (07751870267-0001@t-online.de).

#214. (Changed in MR19) C mode: Adjust AST stack pointer at exit

    In C mode the AST stack pointer needs to be reset if there will
    be multiple calls to the ANTLRx macros.

    Reported with fix by Paul D. Smith (psmith@baynetworks.com).

#213. (Changed in MR18) Fatal error with -mrhoistk (k>1 hoisting)

    When rearranging code I forgot to un-comment a critical line of
    code that handles hoisting of predicates with k>1 lookahead. This
    is now fixed.

    Reported by Reinier van den Born (reinier@vnet.ibm.com).

#212. (Changed in MR17) Mac related changes by Kenji Tanaka

    Kenji Tanaka (kentar@osa.att.ne.jp) has made a number of changes for
    Macintosh users.

    a. The following Macintosh MPW files aid in installing pccts on Mac:

            pccts/MPW_Read_Me

            pccts/install68K.mpw
            pccts/installPPC.mpw

            pccts/antlr/antlr.r
            pccts/antlr/antlr68K.make
            pccts/antlr/antlrPPC.make

            pccts/dlg/dlg.r
            pccts/dlg/dlg68K.make
            pccts/dlg/dlgPPC.make

            pccts/sorcerer/sor.r
            pccts/sorcerer/sor68K.make
            pccts/sorcerer/sorPPC.make

       They completely replace the previous Mac installation files.

    b. The most significant change is to the MAC_FILE_CREATOR symbol
       in pcctscfg.h:

        old: #define MAC_FILE_CREATOR 'MMCC'   /* Metrowerks C/C++ Text files */
        new: #define MAC_FILE_CREATOR 'CWIE'   /* Metrowerks C/C++ Text files */

    c. Added calls to special_fopen_actions() where necessary.

#211. (Changed in MR16a) C++ style comment in dlg

    This has been fixed.

#210. (Changed in MR16a) Sor accepts \r\n, \r, or \n for end-of-line

    A user requested that Sorcerer be changed to accept other forms
    of end-of-line.

#209. (Changed in MR16) Name of files changed.

        Old: CHANGES_FROM_1.33
        New: CHANGES_FROM_133.txt

        Old: KNOWN_PROBLEMS
        New: KNOWN_PROBLEMS.txt

#208. (Changed in MR16) Change in use of pccts #include files

    There were problems with MS DevStudio when mixing Sorcerer and
    PCCTS in the same source file. The problem is caused by the
    redefinition of setjmp in the MS header file setjmp.h. In
    setjmp.h the pre-processor symbol setjmp was redefined to be
    _setjmp. A later "#include PCCTS_SETJMP_H", whose operand is
    macro-expanded, then resulted in an attempt to #include
    <_setjmp.h>. I'm not sure whether this is a bug or a feature.
    In any case, I decided to fix it by avoiding the use of
    pre-processor symbols in #include statements altogether. This
    has the added benefit of making pre-compiled headers work again.

    I've replaced statements:

        old: #include PCCTS_SETJMP_H
        new: #include "pccts_setjmp.h"

    Where pccts_setjmp.h contains:

        #ifndef __PCCTS_SETJMP_H__
        #define __PCCTS_SETJMP_H__

        #ifdef PCCTS_USE_NAMESPACE_STD
        #include <csetjmp>
        #else
        #include <setjmp.h>
        #endif

        #endif

    A similar change has been made for other standard header files
    required by pccts and sorcerer: stdlib.h, stdarg.h, stdio.h, etc.

    Reported by Jeff Vincent (JVincent@novell.com) and Dale Davis
    (DalDavis@spectrace.com).

#207. (Changed in MR16) dlg reports an invalid range for: [\0x00-\0xff]

    Prior to MR16, dlg reported that this was an invalid range.

    Diagnosed by Piotr Eljasiak (eljasiak@no-spam.zt.gdansk.tpsa.pl):

        I think this problem is not specific to unsigned chars
        because dlg reports no error for the range [\0x00-\0xfe].

        I've found that information on range is kept in field
        letter (unsigned char) of Attrib struct. Unfortunately
        the letter value internally is for some reason increased
        by 1, so \0xff is represented here as 0.

        That's why dlg complains about the range [\0x00-\0xff] in
        dlg_p.g:

            if ($$.letter > $2.letter) {
                error("invalid range  ", zzline);
            }

    The fix is:

        if ($$.letter > $2.letter && 255 != $$2.letter) {
            error("invalid range  ", zzline);
        }

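    The wrap-around in the diagnosis can be reproduced in a few lines of
    C++. This hypothetical sketch models only the "+1 in an unsigned
    char" encoding and the pre-MR16 comparison. It does not reproduce the
    dlg Attrib structure or the MR16 fix.

```cpp
#include <cassert>

// Hypothetical model of the encoding described above: dlg keeps a range
// endpoint in an unsigned char field as (character + 1), so \0xff wraps to 0.
inline unsigned char encodeLetter(unsigned c)
{
    return static_cast<unsigned char>(c + 1);
}

// The pre-MR16 check, applied to the encoded values.
inline bool rangeRejectedOld(unsigned char lo, unsigned char hi)
{
    return lo > hi;
}
```

    Encoded, \0x00 becomes 1 and \0xff becomes 0, so the old test rejects
    [\0x00-\0xff]. The upper bound \0xfe encodes to 255, so
    [\0x00-\0xfe] is accepted. This matches the behavior reported above.
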
#206. (Changed in MR16) Free zzFAILtext in ANTLRParser destructor

    The ANTLRParser destructor now frees zzFAILtext.

    Problem and fix reported by Manfred Kogler (km@cast.uni-linz.ac.at).

#205. (Changed in MR16) DLGStringReset argument now const

    Changed: void DLGStringReset(DLGChar *s) {...}
    To:      void DLGStringReset(const DLGChar *s) {...}

    Suggested by Dale Davis (daldavis@spectrace.com)

#204. (Changed in MR15a) Change __WATCOM__ to __WATCOMC__ in pcctscfg.h

    Reported by Oleg Dashevskii (olegdash@my-dejanews.com).

#203. (Changed in MR15) Addition of sorcerer to distribution kit

    I have finally caved in to popular demand. The pccts 1.33mr15
    kit will include sorcerer. The separate sorcerer kit will be
    discontinued.

#202. (Changed in MR15) Organization of MS Dev Studio Projects in Kit

    Previously there was one workspace that contained projects for
    all three parts of pccts: antlr, dlg, and sorcerer. Now each
    part (and directory) has its own workspace/project and there
    is an additional workspace/project to build a library from the
    .cpp files in the pccts/h directory.

    The library build will create pccts_debug.lib or pccts_release.lib
    according to the configuration selected.

    If you don't want to build pccts 1.33MR15 you can download a
    ready-to-run kit for win32 from http://www.polhode.com/win32.zip.
    The ready-to-run kit for win32 includes executables, a pre-built
    static library for the .cpp files in the pccts/h directory, and a
    sample application.

    You will need to define the environment variable PCCTS to point to
    the root of the pccts directory hierarchy.

#201. (Changed in MR15) Several fixes by K.J. Cummings (cummings@peritus.com)

    Generation of SETJMP rather than SETJMP_H in gen.c.

    (Sor B19) Declaration of ref_vars_inits for ref_var_inits in
    pccts/sorcerer/sorcerer.h.

#200. (Changed in MR15) Remove operator=() in AToken.h

    User reported that WatCom couldn't handle use of
    explicit operator =(). Replaced with an equivalent
    using a cast operator.

#199. (Changed in MR15) Don't allow use of empty #tokclass

    Change antlr.g to disallow empty #tokclass sets.

    Reported by Manfred Kogler (km@cast.uni-linz.ac.at).

#198. Revised ANSI C grammar due to efforts by Manuel Kessler

    Manuel Kessler (mlkessler@cip.physik.uni-wuerzburg.de)

        Allow trailing ... in function parameter lists.
        Add bit fields.
        Allow old-style function declarations.
        Support cv-qualified pointers.
        Better checking of combinations of type specifiers.
        Release of memory for local symbols on scope exit.
        Allow input file name on command line as well as by redirection.

    and other miscellaneous tweaks.

    This is not part of the pccts distribution kit. It must be
    downloaded separately from:

        http://www.polhode.com/ansi_mr15.zip

932 #197. (Changed in MR14) Resetting the lookahead buffer of the parser
933
934 Explanation and fix by Sinan Karasu (sinan.karasu@boeing.com)
935
936 Consider the code used to prime the lookahead buffer LA(i)
937 of the parser when init() is called:
938
939 void
940 ANTLRParser::
941 prime_lookahead()
942 {
943 int i;
944 for(i=1;i<=LLk; i++) consume();
945 dirty=0;
946 //lap = 0; // MR14 - Sinan Karasu (sinan.karusu@boeing.com)
947 //labase = 0; // MR14
948 labase=lap; // MR14
949 }
950
951 When the parser is instantiated, lap=0,labase=0 is set.
952
953 The "for" loop runs LLk times. In consume(), lap = lap + 1 (mod LLk) is
954 computed. Therefore, lap (before the loop) == lap (after the loop).
955
956 Now the only problem comes in when one does an init() of the parser
957 after an Eof has been seen. At that time, lap could be non zero.
958 Assume it was lap==1. Now we do a prime_lookahead(). If LLk is 2,
959 then
960
961 consume()
962 {
963 NLA = inputTokens->getToken()->getType();
964 dirty--;
965 lap = (lap+1)&(LLk-1);
966 }
967
968 or expanding NLA,
969
970 token_type[lap&(LLk-1)] = inputTokens->getToken()->getType();
971 dirty--;
972 lap = (lap+1)&(LLk-1);
973
974 so now we prime locations 1 and 0 (that is, 1 and 2 modulo LLk). Formerly,
975 prime_lookahead then set lap=0 and labase=0, so the next token would be read
976 from location 0, NOT from location 1 as it should have been.
977
978 This was never caught before because, when a parser has just been
979 instantiated, lap and labase are 0 and the offending assignments are
980 effectively no-ops: the for loop wraps lap back around to 0.
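
To make the index arithmetic concrete, here is a small C++ model of the circular lookahead buffer. It is a sketch under stated assumptions, not the runtime code: LookaheadModel, its members and the `fixed` flag are hypothetical names, and consume() is reduced to the three lines quoted above (so it never advances labase). Starting from lap == labase == 1 with LLk == 2, the old reset makes LA(1) return the second primed token, while labase = lap returns the first.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of the parser's circular lookahead buffer.
// LLk must be a power of two so that "& (LLk-1)" is the same as "mod LLk".
struct LookaheadModel {
    int LLk;
    std::vector<int> token_type;  // circular buffer of token types
    int lap = 0;                  // slot that the next consume() fills
    int labase = 0;               // slot that holds LA(1)
    std::vector<int> input;       // stand-in for inputTokens
    std::size_t next = 0;         // position of the next input token

    LookaheadModel(int k, std::vector<int> tokens)
        : LLk(k), token_type(static_cast<std::size_t>(k), 0),
          input(std::move(tokens)) {
        assert(k > 0 && (k & (k - 1)) == 0);
    }

    // Simplified consume(): store the next token type and advance lap.
    void consume() {
        token_type[lap & (LLk - 1)] = input.at(next++);
        lap = (lap + 1) & (LLk - 1);
    }

    // fixed == false mimics the pre-MR14 reset (lap = labase = 0);
    // fixed == true mimics the MR14 change (labase = lap).
    void prime_lookahead(bool fixed) {
        for (int i = 1; i <= LLk; i++) consume();
        if (fixed) {
            labase = lap;
        } else {
            lap = 0;
            labase = 0;
        }
    }

    // LA(i): the i-th lookahead token, counted from labase.
    int LA(int i) const { return token_type[(labase + i - 1) & (LLk - 1)]; }
};
```

With input tokens 10 and 20 and a starting state of lap == labase == 1, prime_lookahead(false) gives LA(1) == 20, while prime_lookahead(true) gives LA(1) == 10 and LA(2) == 20. From a freshly constructed model (lap == labase == 0) both versions agree, which is why the problem went unnoticed.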
981
982 #196. (Changed in MR14) Problems with "(alpha)? beta" guess
983
984 Consider the following syntactic predicate in a grammar
985 with 2 tokens of lookahead (k=2 or ck=2):
986
987 rule : ( alpha )? beta ;
988 alpha : S t ;
989 t : T U
990 | T
991 ;
992 beta : S t Z ;
993
994 When antlr computes the prediction expression with one token
995 of lookahead for alts 1 and 2 of rule t it finds an ambiguity.
996
997 Because the grammar has a lookahead of 2 it tries to compute
998 two tokens of lookahead for alts 1 and 2 of t. Alt 1 clearly
999 has a lookahead of (T U). Alt 2 is one token long so antlr
1000 tries to compute the follow set of alt 2, which means finding
1001 the things which can follow rule t in the context of (alpha)?.
1002 This cannot be computed, because alpha is only part of a rule,
1003 and antlr can't tell what part of beta is matched by alpha and
1004 what part remains to be matched. Thus it is impossible for antlr
1005 to properly determine the follow set of rule t.
1006
1007 Prior to 1.33MR14, the follow of (alpha)? was computed as
1008 FIRST(beta) as a result of the internal representation of
1009 guess blocks.
1010
1011 With MR14 the follow set will be the empty set for that context.
1012
1013 Normally, one expects a rule appearing in a guess block to also
1014 appear elsewhere. When the follow context for this other use
1015 is "ored" with the empty set, the result is simply the context
1016 from the other use, which is a reasonable follow context. However,
1017 if there is *no* other use of the rule, or it is used in a different
1018 manner, then the follow context will be inaccurate. It was
1019 inaccurate before MR14 as well, but it will now be inaccurate in a
1020 different way.
1021
1022 For the example given earlier, a reasonable way to rewrite the
1023 grammar is:
1024
1025 rule : ( alpha )? beta ;
1026 alpha : S t ;
1027 t : T U
1028 | T
1029 ;
1030 beta : alpha Z ;
1031
1032 If there are no other uses of the rule appearing in the guess
1033 block, antlr will generate a test for EOF, a workaround for
1034 representing a null set in the lookahead tests.
1035
1036 If you encounter such a problem you can use the -alpha option
1037 to get additional information:
1038
1039 line 2: error: not possible to compute follow set for alpha
1040 in an "(alpha)? beta" block.
1041
1042 With the antlr -alpha command line option the following information
1043 is inserted into the generated file:
1044
1045 #if 0
1046
1047 Trace of references leading to attempt to compute the follow set of
1048 alpha in an "(alpha)? beta" block. It is not possible for antlr to
1049 compute this follow set because it is not known what part of beta has
1050 already been matched by alpha and what part remains to be matched.
1051
1052 Rules which make use of the incorrect follow set will also be incorrect
1053
1054 1 #token T alpha/2 line 7 brief.g
1055 2 end alpha alpha/3 line 8 brief.g
1056 2 end (...)? block at start/1 line 2 brief.g
1057
1058 #endif
1059
1060 At the moment, with the -alpha option selected, the program marks
1061 any rules which appear in the trace back chain (above) as rules with
1062 possible problems computing the follow set.
1063
1064 Reported by Greg Knapen (gregory.knapen@bell.ca).
1065
1066 #195. (Changed in MR14) #line directive not at column 1
1067
1068 Under certain circumstances a predicate test could generate
1069 a #line directive which was not at column 1.
1070
1071 Reported with fix by David Kågedal (davidk@lysator.liu.se)
1072 (http://www.lysator.liu.se/~davidk/).
1073
1074 #194. (Changed in MR14) (C Mode only) Demand lookahead with #tokclass
1075
1076 In C mode with the demand lookahead option there is a bug in the
1077 code which handles matches for #tokclass (zzsetmatch and
1078 zzsetmatch_wsig).
1079
1080 The bug causes the lookahead pointer to get out of synchronization
1081 with the current token pointer.
1082
1083 The problem was reported with a fix by Ger Hobbelt (hobbelt@axa.nl).
1084
1085 #193. (Changed in MR14) Use of PCCTS_USE_NAMESPACE_STD
1086
1087 The file pcctscfg.h now contains the following definitions:
1088
1089 #ifdef PCCTS_USE_NAMESPACE_STD
1090 #define PCCTS_STDIO_H <cstdio>
1091 #define PCCTS_STDLIB_H <cstdlib>
1092 #define PCCTS_STDARG_H <cstdarg>
1093 #define PCCTS_SETJMP_H <csetjmp>
1094 #define PCCTS_STRING_H <cstring>
1095 #define PCCTS_ASSERT_H <cassert>
1096 #define PCCTS_ISTREAM_H <istream>
1097 #define PCCTS_IOSTREAM_H <iostream>
1098 #define PCCTS_NAMESPACE_STD namespace std {}; using namespace std;
1099 #else
1100 #define PCCTS_STDIO_H <stdio.h>
1101 #define PCCTS_STDLIB_H <stdlib.h>
1102 #define PCCTS_STDARG_H <stdarg.h>
1103 #define PCCTS_SETJMP_H <setjmp.h>
1104 #define PCCTS_STRING_H <string.h>
1105 #define PCCTS_ASSERT_H <assert.h>
1106 #define PCCTS_ISTREAM_H <istream.h>
1107 #define PCCTS_IOSTREAM_H <iostream.h>
1108 #define PCCTS_NAMESPACE_STD
1109 #endif
1110
1111 The runtime support in pccts/h uses these pre-processor symbols
1112 consistently.
1113
1114 Also, antlr and dlg have been changed to generate code which uses
1115 these pre-processor symbols rather than having the names of the
1116 #include files hard-coded in the generated code.
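
Because the header names are macros, a source file can pick the C or C++ form of a standard header with a computed #include, which standard C and C++ pre-processors accept. The sketch below is self-contained, so it repeats two of the definitions above instead of including pcctscfg.h; token_length is a hypothetical function used only to show that the same code compiles in either configuration.

```cpp
// Normally these come from pcctscfg.h; two of them are repeated here so
// that the sketch compiles on its own.
#ifdef PCCTS_USE_NAMESPACE_STD
#define PCCTS_STRING_H <cstring>
#define PCCTS_NAMESPACE_STD namespace std {}; using namespace std;
#else
#define PCCTS_STRING_H <string.h>
#define PCCTS_NAMESPACE_STD
#endif

#include <cassert>
#include PCCTS_STRING_H   // expands to <cstring> or <string.h>

PCCTS_NAMESPACE_STD       // "using namespace std;" only in the first case

// Hypothetical helper: strlen is reachable unqualified in both configurations.
size_t token_length(const char *text) {
    assert(text != 0);
    return strlen(text);
}
```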
1117
1118 This required the addition of #include "pcctscfg.h" to a number of
1119 files in pccts/h.
1120
1121 It appears that this sometimes causes problems for MSVC 5 in
1122 combination with the "automatic" option for pre-compiled headers.
1123 In such cases disable the "automatic" pre-compiled headers option.
1124
1125 Suggested by Hubert Holin (Hubert.Holin@Bigfoot.com).
1126
1127 #192. (Changed in MR14) Change setText() to accept "const ANTLRChar *"
1128
1129 Changed ANTLRToken::setText(ANTLRChar *) to setText(const ANTLRChar *).
1130 This allows literal strings to be used to initialize tokens. Since
1131 the usual token implementation (ANTLRCommonToken) makes a copy of the
1132 input string, this was an unnecessary limitation.
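
In standard C++ a string literal has type const char[N], so it cannot be passed to a plain `char *` parameter (C++11 removed the old deprecated conversion). TokenSketch below is a hypothetical stand-in, not ANTLRCommonToken, and it uses char rather than ANTLRChar; like the token implementation described above it copies the text, so accepting a const pointer costs nothing.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for a token class that copies its text.
class TokenSketch {
    std::string text_;
public:
    // Taking "const char *" lets callers pass string literals directly.
    void setText(const char *s) { text_ = (s != 0) ? s : ""; }
    const char *getText() const { return text_.c_str(); }
};
```

For example, `t.setText("ID")` compiles with this signature, whereas a `setText(char *)` overload would reject the literal.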
1133
1134 Suggested by Bob McWhirter (bob@netwrench.com).
1135
1136 #191. (Changed in MR14) HP/UX aCC compiler compatibility problem
1137
1138 Needed to explicitly declare zzINF_DEF_TOKEN_BUFFER_SIZE and
1139 zzINF_BUFFER_TOKEN_CHUNK_SIZE as ints in pccts/h/AParser.cpp.
1140
1141 Reported by David Cook (dcook@bmc.com).
1142
1143 #190. (Changed in MR14) IBM OS/2 CSet compiler compatibility problem
1144
1145 Name conflict with "_cs" in pccts/h/ATokenBuffer.cpp
1146
1147 Reported by David Cook (dcook@bmc.com).
1148
1149 #189. (Changed in MR14) -gxt switch in C mode
1150
1151 The -gxt switch in C mode didn't work because of incorrect
1152 initialization.
1153
1154 Reported by Sinan Karasu (sinan@boeing.com).
1155
1156 #188. (Changed in MR14) Added pccts/h/DLG_stream_input.h
1157
1158 This is a DLG stream class based on C++ istreams.
1159
1160 Contributed by Hubert Holin (Hubert.Holin@Bigfoot.com).
1161
1162 #187. (Changed in MR14) Rename config.h to pcctscfg.h
1163
1164 The PCCTS configuration file has been renamed from config.h to
1165 pcctscfg.h. The problem with the original name is that it led
1166 to name collisions when pccts parsers were combined with other
1167 software.
1168
1169 All of the runtime support routines in pccts/h/* have been
1170 changed to use the new name. Existing software can continue
1171 to use pccts/h/config.h, whose entire contents are now the single
1172 line: #include "pcctscfg.h"
1173
1174 I don't have a record of the user who suggested this.
1175
1176 #186. (Changed in MR14) Pre-processor symbol DllExportPCCTS class modifier
1177
1178 Classes in the C++ runtime support routines are now declared:
1179
1180 class DllExportPCCTS className ....
1181
1182 By default, the pre-processor symbol is defined as the empty
1183 string. This is for use by MSVC++ users to create DLL classes.
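
A sketch of how such a class modifier is commonly wired up. The empty default follows the note; the MSVC branch uses that compiler's `__declspec(dllexport)` extension. The BUILDING_PCCTS_DLL switch and ExampleRuntimeClass are hypothetical names, not part of pccts.

```cpp
#include <cassert>

// Hypothetical switch: when building the runtime as a DLL with MSVC,
// export the classes; otherwise the modifier expands to nothing.
#ifndef DllExportPCCTS
#if defined(_MSC_VER) && defined(BUILDING_PCCTS_DLL)
#define DllExportPCCTS __declspec(dllexport)
#else
#define DllExportPCCTS
#endif
#endif

// The modifier goes between "class" and the class name.
class DllExportPCCTS ExampleRuntimeClass {
public:
    int answer() const { return 42; }
};
```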
1184
1185 Suggested by Manfred Kogler (km@cast.uni-linz.ac.at).
1186
1187 #185. (Changed in MR14) Option to not use PCCTS_AST base class for ASTBase
1188
1189 Normally, the ASTBase class is derived from PCCTS_AST which contains
1190 functions useful to Sorcerer. If these are not necessary then the
1191 user can define the pre-processor symbol "PCCTS_NOT_USING_SOR" which
1192 will cause the ASTBase class to replace references to PCCTS_AST with
1193 references to ASTBase where necessary.
1194
1195 The class ASTDoublyLinkedBase will contain a pure virtual function
1196 shallowCopy() that was formerly defined in class PCCTS_AST.
1197
1198 Suggested by Bob McWhirter (bob@netwrench.com).
1199
1200 #184. (Changed in MR14) Grammars with no tokens generate invalid tokens.h
1201
1202 Reported by Hubert Holin (Hubert.Holin@bigfoot.com).
1203
1204 #183. (Changed in MR14) -f to specify file with names of grammar files
1205
1206 In DEC/VMS it is difficult to specify very long command lines.
1207 The -f option allows one to place the names of the grammar files
1208 in a data file in order to bypass limitations of the DEC/VMS
1209 command language interpreter.
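
The note does not describe the layout of the file given to -f. As an illustration only, the sketch below assumes the names are separated by spaces or newlines and collects them in order; readGrammarFileNames is a hypothetical helper, not antlr's code.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: read grammar file names separated by spaces or newlines.
std::vector<std::string> readGrammarFileNames(std::istream &in) {
    std::vector<std::string> names;
    std::string name;
    while (in >> name) names.push_back(name);
    return names;
}
```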
1210
1211 Addition supplied by Bernard Giroud (b_giroud@decus.ch).
1212
1213 #182. (Changed in MR14) Output directory option for DEC/VMS
1214
1215 Fix some problems with the -o option under DEC/VMS.
1216
1217 Fix supplied by Bernard Giroud (b_giroud@decus.ch).
1218
1219 #181. (Changed in MR14) Allow chars > 127 in DLGStringInput::nextChar()
1220
1221 Changed DLGStringInput to cast the character using (unsigned char)
1222 so that languages with character codes greater than 127 work
1223 without changes.
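
Why the cast matters: where plain char is signed, a byte such as 0xE9 reads back as a negative value, which cannot be used directly as an index into a 256-entry table and can be confused with an end-of-input marker such as -1. Converting through unsigned char first always yields 0..255. StringInputSketch is a hypothetical stand-in, not DLGStringInput itself, and its -1 end marker is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for a string-based character source.
class StringInputSketch {
    const char *p_;
public:
    static const int kEndOfInput = -1;   // assumed end-of-input marker

    explicit StringInputSketch(const char *s) : p_(s) {}

    // Returns the next character as a value in 0..255, or kEndOfInput.
    int nextChar() {
        if (*p_ == '\0') return kEndOfInput;
        return static_cast<unsigned char>(*p_++);
    }
};
```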
1224
1225 Suggested by Manfred Kogler (km@cast.uni-linz.ac.at).
1226
1227 #180. (Added in MR14) ANTLRParser::getEofToken()
1228
1229 Added "ANTLRToken ANTLRParser::getEofToken() const" to match the
1230 setEofToken routine.
1231
1232 Requested by Manfred Kogler (km@cast.uni-linz.ac.at).
1233
1234 #179. (Fixed in MR14) Memory leak for BufFileInput subclass of DLGInputStream
1235
1236 The BufFileInput class described in Item #142 neglected to release
1237 the allocated buffer when an instance was destroyed.
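
The general shape of such a fix is a destructor that releases what the constructor allocated. BufferedInputSketch is a hypothetical class, not BufFileInput; the liveBuffers() counter exists only so the release can be observed, and copying is disabled so two objects never free the same buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical buffered input class that owns a heap buffer.
class BufferedInputSketch {
    char *buf_;
    std::size_t size_;
public:
    explicit BufferedInputSketch(std::size_t capacity)
        : buf_(new char[capacity]), size_(capacity) { ++liveBuffers(); }

    // Release the buffer when the object is destroyed.
    ~BufferedInputSketch() {
        delete[] buf_;
        --liveBuffers();
    }

    // Copying would lead to a double delete[], so forbid it.
    BufferedInputSketch(const BufferedInputSketch &) = delete;
    BufferedInputSketch &operator=(const BufferedInputSketch &) = delete;

    std::size_t size() const { return size_; }

    // Number of buffers currently allocated (for illustration only).
    static int &liveBuffers() {
        static int count = 0;
        return count;
    }
};
```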
1238
1239 Reported by Manfred Kogler (km@cast.uni-linz.ac.at).
1240
1241 #178. (Fixed in MR14) Bug in "(alpha)? beta" guess blocks first sets
1242
1243 In 1.33 vanilla, and all maintenance releases prior to MR14
1244 there is a bug in the handling of guess blocks which use the
1245 "long" form:
1246
1247 (alpha)? beta
1248
1249 inside a (...)*, (...)+, or {...} block.
1250
1251 This problem does *not* apply to the case where beta is omitted
1252 or when the syntactic predicate is on the leading edge of an
1253 alternative.
1254
1255 The problem is that both alpha and beta are stored in the
1256 syntax diagram, and that some analysis routines would fail
1257 to skip the alpha portion when it was not on the leading edge.
1258 Consider the following grammar with -ck 2:
1259
1260 r : ( (A)? B )* C D
1261
1262 | A B /* forces -ck 2 computation for old antlr */
1263 /* reports ambig for alts 1 & 2 */
1264
1265 | B C /* forces -ck 2 computation for new antlr */
1266 /* reports ambig for alts 1 & 3 */
1267 ;
1268
1269 The prediction expression for the first alternative should be
1270 LA(1)={B C} LA(2)={B C D}, but previous versions of antlr
1271 would compute the prediction expression as LA(1)={A C} LA(2)={B D}.
1272
1273 Reported by Arpad Beszedes (beszedes@inf.u-szeged.hu) who provided
1274 a very clear example of the problem and identified the probable cause.
1275
1276 #177. (Changed in MR14) #tokdefs and #token with regular expression
1277
1278 In MR13 the change described by Item #162 caused an existing
1279 feature of antlr to fail. Prior to the change it was possible
1280 to give regular expression definitions and actions to tokens
1281 which were defined via the #tokdefs directive.
1282
1283 This now works again.
1284
1285 Reported by Manfred Kogler (km@cast.uni-linz.ac.at).
1286
1287 #176. (Changed in MR14) Support for #line in antlr source code
1288
1289 Note: this was implemented by Arpad Beszedes (beszedes@inf.u-szeged.hu).
1290
1291 In 1.33MR14 it is possible for a pre-processor to generate #line
1292 directives in the antlr source and have those line numbers and file
1293 names used in antlr error messages and in the #line directives
1294 generated by antlr.
1295
1296 The #line directive may appear in the following forms:
1297
1298 #line ll "sss" xx xx ...
1299
1300 where ll represents a line number, "sss" represents the name of a file
1301 enclosed in quotation marks, and xx are arbitrary integers.
1302
1303 The following form (without "line") is not supported at the moment:
1304
1305 # ll "sss" xx xx ...
1306
1307 The result:
1308
1309 zzline
1310
1311 is replaced with ll from the # or #line directive
1312
1313 FileStr[CurFile]
1314
1315 is updated with the contents of the string (if any)
1316 following the line number
1317
1318 Note
1319 ----
1320 The file-name string following the line number can be a complete
1321 name with a directory-path. Antlr generates the output files from
1322 the input file name (by replacing the extension from the file-name
1323 with .c or .cpp).
1324
1325 If the input file (or the file-name from the line-info) contains
1326 a path:
1327
1328 "../grammar.g"
1329
1330 the generated source code will be placed in "../grammar.cpp" (i.e.
1331 in the parent directory). This is inconvenient in some cases
1332 (even the -o switch cannot be used), so the path information is
1333 removed from the #line directive. Thus, if the line-info was
1334
1335 #line 2 "../grammar.g"
1336
1337 then the current file-name will become "grammar.g"
1338
1339 In this way, the source code generated from the grammar file
1340 will always be placed in the current directory, unless the -o switch
1341 is used.
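
A small sketch of the two steps described above: reading the line number and file name from a `#line ll "sss" ...` directive, then dropping the directory part so that output lands in the current directory. The function and struct names are hypothetical, and treating `\` as a separator in addition to `/` is an assumption of the sketch. Like antlr MR14, it rejects the `# ll "sss"` form.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: drop everything up to the last path separator.
std::string stripDirectory(const std::string &path) {
    std::string::size_type pos = path.find_last_of("/\\");
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

struct LineInfo {
    int line = 0;
    std::string file;
};

// Parse `#line ll "sss" ...`; returns false if the text is not in that form.
bool parseLineDirective(const std::string &text, LineInfo &out) {
    std::istringstream in(text);
    std::string keyword;
    if (!(in >> keyword) || keyword != "#line") return false;
    if (!(in >> out.line)) return false;
    in >> std::ws;
    if (in.peek() == '"') {
        in.get();                     // skip the opening quote
        std::string name;
        std::getline(in, name, '"');  // read up to the closing quote
        out.file = stripDirectory(name);
    }
    return true;
}
```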
1342
1343 #175. (Changed in MR14) Bug when guess block appears at start of (...)*
1344
1345 In 1.33 vanilla and all maintenance releases prior to 1.33MR14
1346 there is a bug when a guess block appears at the start of a (...)*.
1347 Consider the following k=1 (ck=1) grammar:
1348
1349 rule :
1350 ( (STAR)? ZIP )* ID ;
1351
1352 Prior to 1.33MR14, the generated code resembled:
1353
1354 ...
1355 zzGUESS_BLOCK
1356 while ( 1 ) {
1357 if ( !(LA(1)==STAR) ) break;
1358 zzGUESS
1359 if ( !zzrv ) {
1360 zzmatch(STAR);
1361 zzCONSUME;
1362 zzGUESS_DONE
1363 zzmatch(ZIP);
1364 zzCONSUME;
1365 ...
1366
1367 Note that the routine uses STAR for the prediction expression
1368 rather than ZIP. With 1.33MR14 the generated code resembles:
1369
1370 ...
1371 while ( 1 ) {
1372 if ( !(LA(1)==ZIP) ) break;
1373 ...
1374
1375 This problem existed only with (...)* blocks and was caused
1376 by the slightly more complicated graph which represents (...)*
1377 blocks. This caused the analysis routine to compute the first
1378 set for the alpha part of the "(alpha)? beta" rather than the
1379 beta part.
1380
1381 Both (...)+ and {...} blocks handled the guess block correctly.
1382
1383 Reported by Arpad Beszedes (beszedes@inf.u-szeged.hu) who provided
1384 a very clear example of the problem and identified the probable cause.
1385
1386 #174. (Changed in MR14) Bug when action precedes syntactic predicate
1387
1388 In 1.33 vanilla, and all maintenance releases prior to MR14,
1389 there was a bug when a syntactic predicate was immediately
1390 preceded by an action. Consider the following -ck 2 grammar:
1391
1392 rule :
1393 <<int i;>>
1394 (alpha)? beta C
1395 | A B
1396 ;
1397
1398 alpha : A ;
1399 beta : A B;
1400
1401 Prior to MR14, the code generated for the first alternative
1402 resembled:
1403
1404 ...
1405 zzGUESS
1406 if ( !zzrv && LA(1)==A && LA(2)==A) {
1407 alpha();
1408 zzGUESS_DONE
1409 beta();
1410 zzmatch(C);
1411 zzCONSUME;
1412 } else {
1413 ...
1414
1415 The prediction expression (i.e. LA(1)==A && LA(2)==A) is clearly
1416 wrong because LA(2) should be matched to B (first[2] of beta is {B}).
1417
1418 With 1.33MR14 the prediction expression is:
1419
1420 ...
1421 if ( !zzrv && LA(1)==A && LA(2)==B) {
1422 alpha();
1423 zzGUESS_DONE
1424 beta();
1425 zzmatch(C);
1426 zzCONSUME;
1427 } else {
1428 ...
1429
1430 This will only affect grammars in which alpha is shorter
1431 than max(k,ck) and there is an action immediately preceding
1432 the syntactic predicate.
1433
1434 This problem was reported by Arpad Beszedes
1435 (beszedes@inf.u-szeged.hu) who provided a very clear example
1436 of the problem and identified the presence of the init-action
1437 as the likely culprit.
1438
1439 #173. (Changed in MR13a) -glms for Microsoft style filenames with -gl
1440
1441 With the -gl option antlr generates #line directives using the
1442 exact name of the input files specified on the command line.
1443 An oddity of the Microsoft C and C++ compilers is that they
1444 don't accept file names in #line directives containing "\"
1445 even though these are names from the native file system.
1446
1447 With the -glms option, each "\" in file names appearing in #line
1448 directives is replaced with a "/" in order to conform to
1449 Microsoft compiler requirements.
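
The substitution itself is a simple character replacement. This sketch uses hypothetical helper names and is not antlr's own code; it shows the directive text such a rewrite would produce for a native Windows path.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: rewrite a native Windows path for use in #line.
std::string toLineDirectiveName(std::string name) {
    std::replace(name.begin(), name.end(), '\\', '/');
    return name;
}

// Hypothetical helper: build the directive text for a given line.
std::string lineDirective(int line, const std::string &file) {
    return "#line " + std::to_string(line) + " \"" +
           toLineDirectiveName(file) + "\"";
}
```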
1450
1451 Reported by Erwin Achermann (erwin.achermann@switzerland.org).
1452
1453 #172. (Changed in MR13) \r\n in antlr source counted as one line
1454
1455 Some MS software uses \r\n to indicate a new line. Antlr
1456 now recognizes this in counting lines.
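
One way to count lines so that \r\n is counted once. countLines is a hypothetical function. Counting a lone \r as a line break as well is an assumption of this sketch, not something the note says antlr does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count line breaks, treating "\r\n" as a single break.
// Assumption: a lone '\r' is also counted as a break.
int countLines(const std::string &text) {
    int lines = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\n') {
            ++lines;
        } else if (text[i] == '\r') {
            ++lines;
            if (i + 1 < text.size() && text[i + 1] == '\n') ++i;  // skip the '\n' of "\r\n"
        }
    }
    return lines;
}
```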
1457
1458 Reported by Edward L. Hepler (elh@ece.vill.edu).
1459
1460 #171. (Changed in MR13) #tokclass L..U now allowed
1461
1462 The following is now allowed:
1463
1464 #tokclass ABC { A..B C }
1465
1466 Reported by Dave Watola (dwatola@amtsun.jpl.nasa.gov)
1467
1468 #170. (Changed in MR13) Suppression for predicates with lookahead depth >1
1469
1470 In MR12 the capability for suppression of predicates with lookahead
1471 depth=1 was introduced. With MR13 this has been extended to
1472 predicates with lookahead depth > 1 and released for use by users
1473 on an experimental basis.
1474
1475 Consider the following grammar with -ck 2 and the predicate in rule
1476 "a" with depth 2:
1477
1478 r1 : (ab)* "@"
1479 ;
1480
1481 ab : a
1482 | b
1483 ;
1484
1485 a : (A B)? => <<p(LATEXT(2))>>? A B C
1486 ;
1487
1488 b : A B C
1489 ;
1490
1491 Normally, the predicate would be hoisted into rule r1 in order to
1492 determine whether to call rule "ab". However it should *not* be
1493 hoisted because, even if p is false, there is a valid alternative
1494 in rule b. With "-mrhoistk on" the predicate will be suppressed.
1495
1496 If "-info p" command line option is present the following information
1497 will appear in the generated code:
1498
1499 while ( (LA(1)==A)
1500 #if 0
1501
1502 Part (or all) of predicate with depth > 1 suppressed by alternative
1503 without predicate
1504
1505 pred << p(LATEXT(2))>>?
1506 depth=k=2 ("=>" guard) rule a line 8 t1.g
1507 tree context:
1508 (root = A
1509 B
1510 )
1511
1512 The token sequence which is suppressed: ( A B )
1513 The sequence of references which generate that sequence of tokens:
1514
1515 1 to ab r1/1 line 1 t1.g
1516 2 ab ab/1 line 4 t1.g
1517 3 to b ab/2 line 5 t1.g
1518 4 b b/1 line 11 t1.g
1519 5 #token A b/1 line 11 t1.g
1520 6 #token B b/1 line 11 t1.g
1521
1522 #endif
1523
1524 A slightly more complicated example:
1525
1526 r1 : (ab)* "@"
1527 ;
1528
1529 ab : a
1530 | b
1531 ;
1532
1533 a : (A B)? => <<p(LATEXT(2))>>? (A B | D E)
1534 ;
1535
1536 b : <<q(LATEXT(2))>>? D E
1537 ;
1538
1539
1540 In this case, the sequence (D E) in rule "a" which lies behind
1541 the guard is used to suppress the predicate with context (D E)
1542 in rule b.
1543
1544 while ( (LA(1)==A || LA(1)==D)
1545 #if 0
1546
1547 Part (or all) of predicate with depth > 1 suppressed by alternative
1548 without predicate
1549
1550 pred << q(LATEXT(2))>>?
1551 depth=k=2 rule b line 11 t2.g
1552 tree context:
1553 (root = D
1554 E
1555 )
1556
1557 The token sequence which is suppressed: ( D E )
1558 The sequence of references which generate that sequence of tokens:
1559
1560 1 to ab r1/1 line 1 t2.g
1561 2 ab ab/1 line 4 t2.g
1562 3 to a ab/1 line 4 t2.g
1563 4 a a/1 line 8 t2.g
1564 5 #token D a/1 line 8 t2.g
1565 6 #token E a/1 line 8 t2.g
1566
1567 #endif
1568 &&
1569 #if 0
1570
1571 pred << p(LATEXT(2))>>?
1572 depth=k=2 ("=>" guard) rule a line 8 t2.g
1573 tree context:
1574 (root = A
1575 B
1576 )
1577
1578 #endif
1579
1580 (! ( LA(1)==A && LA(2)==B ) || p(LATEXT(2)) ) {
1581 ab();
1582 ...
1583
1584 #169. (Changed in MR13) Predicate test optimization for depth=1 predicates
1585
1586 When MR12 generated a test of a predicate which had depth 1
1587 it would use the depth >1 routines, resulting in correct but
1588 inefficient behavior. In MR13, a bit test is used.
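
For intuition about why a bit test is cheaper: with depth 1 the question is only whether LA(1) belongs to a set of token types. If that set is stored as a bit vector, the answer needs one shift, one mask and one index. The layout below (eight token types per byte) and the class name are assumptions for illustration, not a description of antlr's internal set format.

```cpp
#include <cassert>
#include <vector>

// Hypothetical token set: bit t is 1 if token type t is in the set.
class TokenBitSet {
    std::vector<unsigned char> bits_;
public:
    explicit TokenBitSet(unsigned maxToken) : bits_(maxToken / 8 + 1, 0) {}

    void add(unsigned tok) {
        assert(tok / 8 < bits_.size());
        bits_[tok >> 3] |= static_cast<unsigned char>(1u << (tok & 7));
    }

    // Depth-1 membership test: a single bit lookup, no loop.
    bool contains(unsigned tok) const {
        return tok / 8 < bits_.size() &&
               (bits_[tok >> 3] & (1u << (tok & 7))) != 0;
    }
};
```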
1589
1590 #168. (Changed in MR13) Token expressions in context guards
1591
1592 The token expressions appearing in context guards such as:
1593
1594 (A B)? => <<test(LT(1))>>? someRule
1595
1596 are computed during an early phase of antlr processing. As
1597 a result, prior to MR13, complex expressions such as:
1598
1599 ~B
1600 L..U
1601 ~L..U
1602 TokClassName
1603 ~TokClassName
1604
1605 were not computed properly. This resulted in incorrect
1606 context being computed for such expressions.
1607
1608 In MR13 these context guards are verified for proper semantics
1609 in the initial phase and then re-evaluated after complex token
1610 expressions have been computed in order to produce the correct
1611 behavior.
1612
1613 Reported by Arpad Beszedes (beszedes@inf.u-szeged.hu).
1614
1615 #167. (Changed in MR13) ~L..U
1616
1617 Prior to MR13, the complement of a token range was
1618 not properly computed.
1619
1620 #166. (Changed in MR13) token expression L..U
1621
1622 The token U was represented as an unsigned char, restricting
1623 the use of L..U to cases where U was assigned a token number
1624 less than 256. This is corrected in MR13.
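
Items #167 and #166 both concern token ranges. The sketch below stores token numbers as plain unsigned ints, so a range may extend past 255, and it computes ~L..U as the defined tokens minus the range. kMaxTokens, the assumption that token numbers run from 1 to maxToken, and the function names are hypothetical; this is not antlr's set code. For contrast, converting token number 300 to unsigned char yields 44 (300 mod 256), which is the kind of truncation #166 describes.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical upper bound on token numbers for this sketch.
constexpr std::size_t kMaxTokens = 512;
using TokenSet = std::bitset<kMaxTokens>;

// L..U: every token number from lo to hi inclusive. The bounds are
// unsigned ints, so token numbers of 256 and above are kept intact.
TokenSet tokenRange(unsigned lo, unsigned hi) {
    assert(lo <= hi && hi < kMaxTokens);
    TokenSet s;
    for (unsigned t = lo; t <= hi; ++t) s.set(t);
    return s;
}

// ~L..U: the defined token numbers 1..maxToken that are NOT in L..U.
// Complementing the raw bitset would also add numbers that name no token.
TokenSet complementRange(unsigned lo, unsigned hi, unsigned maxToken) {
    return tokenRange(1, maxToken) & ~tokenRange(lo, hi);
}
```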
1625
1626 #165. (Changed in MR13) option -newAST
1627
1628 To create ASTs from an ANTLRTokenPtr antlr usually calls
1629 "new AST(ANTLRTokenPtr)". This option generates a call
1630 to "newAST(ANTLRTokenPtr)" instead. This allows a user
1631 to define a parser member function to create an AST object.
1632
1633 Similar changes for ASTBase::tmake and ASTBase::link were not
1634 thought necessary since they do not create AST objects, only
1635 use existing ones.
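
A self-contained sketch of the kind of factory this option makes possible. Token, TokenPtr, AST and MyParser are hypothetical stand-ins: in a real grammar the argument would be an ANTLRTokenPtr and the node type the user's AST class, and the exact declaration antlr expects is not given in this note. This parser keeps ownership of every node it creates so that they can be released together.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for ANTLRTokenPtr and the user's AST class.
struct Token { int type; };
typedef Token *TokenPtr;

struct AST {
    explicit AST(TokenPtr t) : token(t) {}
    TokenPtr token;
};

// Hypothetical parser. With -newAST, generated code calls newAST(tok)
// instead of "new AST(tok)", so the parser controls node creation.
// This version keeps ownership of every node so they are freed together.
class MyParser {
    std::vector<std::unique_ptr<AST>> nodes_;
public:
    AST *newAST(TokenPtr tok) {
        nodes_.push_back(std::unique_ptr<AST>(new AST(tok)));
        return nodes_.back().get();
    }
    std::size_t nodeCount() const { return nodes_.size(); }
};
```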
1636
1637 #164. (Changed in MR13) Unused variable _astp
1638
1639 For many compilations, we have lived with warnings about
1640 the unused variable _astp. It turns out that this variable
1641 can *never* be used because the code which references it was
1642 commented out.
1643
1644 This investigation was sparked by a note from Erwin Achermann
1645 (erwin.achermann@switzerland.org).
1646
1647 #163. (Changed in MR13) Incorrect makefiles for testcpp examples
1648
1649 All the examples in pccts/testcpp/* had incorrect definitions
1650 in the makefiles for the symbol "CCC". Instead of CCC=CC they
1651 had CC=$(CCC).
1652
1653 There was an additional problem in testcpp/1/test.g due to the
1654 change in ANTLRToken::getText() to a const member function
1655 (Item #137).
1656
1657 Reported by Maurice Mass (maas@cuci.nl).
1658
1659 #162. (Changed in MR13) Combining #token with #tokdefs
1660
1661 When it became possible to change the print-name of a
1662 #token (Item #148) it became useful to give a #token
1663 statement whose only purpose was to give a print name
1664 to the #token. Prior to this change this could not be
1665 combined with the #tokdefs feature.
1666
1667 #161. (Changed in MR13) Switch -gxt inhibits generation of tokens.h
1668
1669 #160. (Changed in MR13) Omissions in list of names for remap.h
1670
1671 When a user selects the -gp option antlr creates a list
1672 of macros in remap.h to rename some of the standard
1673 antlr routines from zzXXX to userprefixXXX.
1674
1675 There were a number of omissions from the remap.h name
1676 list related to the new trace facility. This was reported,
1677 along with a fix, by Bernie Solomon (bernard@ug.eds.com).
1678
1679 #159. (Changed in MR13) Violations of classic C rules
1680
1681 There were a number of violations of classic C style in
1682 the distribution kit. This was reported, along with fixes,
1683 by Bernie Solomon (bernard@ug.eds.com).
1684
1685 #158. (Changed in MR13) #header causes problem for pre-processors
1686
1687 A user who runs the C pre-processor on antlr source suggested
1688 that another syntax be allowed. With MR13, directives
1689 such as #header, #pragma, etc. may be written as "\#header",
1690 "\#pragma", etc. For escaping pre-processor directives inside
1691 a #header use something like the following:
1692
1693 \#header
1694 <<
1695 \#include <stdio.h>
1696 >>
1697
1698 #157. (Fixed in MR13) Empty error sets for rules with infinite recursion
1699
1700 When the first set for a rule cannot be computed due to infinite
1701 left recursion and it is the only alternative for a block then
1702 the error set for the block would be empty. This would result
1703 in a fatal error.
1704
1705 Reported by Darin Creason (creason@genedax.com)
1706
1707 #156. (Changed in MR13) DLGLexerBase::getToken() now public
1708
1709 #155. (Changed in MR13) Context behind predicates can suppress
1710
1711 With -mrhoist enabled the context behind a guarded predicate can
1712 be used to suppress other predicates. Consider the following grammar:
1713
1714 r0 : (r1)+;
1715
1716 r1 : rp
1717 | rq
1718 ;
1719 rp : <<p(LATEXT(1))>>? B ;
1720 rq : (A)? => <<q(LATEXT(1))>>? (A|B);
1721
1722 In earlier versions both predicates "p" and "q" would be hoisted into
1723 rule r0. With MR12c predicate p is suppressed because the context which
1724 follows predicate q includes "B" which can "cover" predicate "p". In
1725 other words, in trying to decide in r0 whether to call r1, it doesn't
1726 really matter whether p is false or true because, either way, there is
1727 a valid choice within r1.
1728
1729 #154. (Changed in MR13) Making hoist suppression explicit using <<nohoist>>
1730
1731 A common error, even among experienced pccts users, is to code
1732 an init-action to inhibit hoisting rather than a leading action.
1733 An init-action does not inhibit hoisting.
1734
1735 This was coded:
1736
1737 rule1 : <<;>> rule2
1738
1739 This is what was meant:
1740
1741 rule1 : <<;>> <<;>> rule2
1742
1743 With MR13, the user can code:
1744
1745 rule1 : <<;>> <<nohoist>> rule2
1746
1747 The following will give an error message:
1748
1749 rule1 : <<nohoist>> rule2
1750
1751 If the <<nohoist>> appears as an init-action rather than a leading
1752 action an error message is issued. The meaning of an init-action
1753 containing "nohoist" is unclear: does it apply to just one
1754 alternative or to all alternatives?
1755
1756
1757
1758
1759
1760
1761
1762