Commit | Line | Data |
---|---|---|
878ddf1f | 1 | CHANGES FROM 1.31\r |
2 | \r | |
3 | This file contains the migration of PCCTS from 1.31 in the order that\r | |
4 | changes were made. 1.32b7 is the last beta before full 1.32.\r | |
5 | Terence Parr, Parr Research Corporation 1995.\r | |
6 | \r | |
7 | \r | |
8 | ======================================================================\r | |
9 | 1.32b1\r | |
10 | Added Russell Quong to banner, changed banner for output slightly\r | |
11 | Fixed it so that you have before / after actions for C++ in class def\r | |
12 | Fixed bug in optimizer that made it sometimes forget to set internal\r | |
13 | token pointers. Only showed up when a {...} was in the "wrong spot".\r | |
14 | \r | |
15 | ======================================================================\r | |
16 | 1.32b2\r | |
17 | Added fixes by Dave Seidel for PC compilers in 32 bit mode (config.h\r | |
18 | and set.h).\r | |
19 | \r | |
20 | ======================================================================\r | |
21 | 1.32b3\r | |
22 | Fixed hideous bug in code generator for wildcard and for ~token op.\r | |
23 | \r | |
24 | from Dave Seidel\r | |
25 | \r | |
26 | Added pcnames.bat\r | |
27 | 1. in antlr/main.c: change strcasecmp() to stricmp()\r | |
28 | \r | |
29 | 2. in dlg/output.c: use DLEXER_C instead of "DLexer.C"\r | |
30 | \r | |
31 | 3. in h/PBlackBox.h: use <iostream.h> instead of <stream.h>\r | |
32 | \r | |
33 | ======================================================================\r | |
34 | 1.32b4\r | |
35 | When the -ft option was used, any path prefix screwed up\r | |
36 | the gate on the .h files\r | |
37 | \r | |
38 | Fixed yet another bug due to the optimizer.\r | |
39 | \r | |
40 | The exception handling thing was a bit wacko:\r | |
41 | \r | |
42 | a : ( A B )? A B\r | |
43 | | A C\r | |
44 | ;\r | |
45 | exception ...\r | |
46 | \r | |
47 | caused an exception if "A C" was the input. In other words,\r | |
48 | it found that A C didn't match the (A B)? pred and caused\r | |
49 | an exception rather than trying the next alt. All I did\r | |
50 | was to change the zzmatch_wsig() macros.\r | |
51 | \r | |
52 | Fixed some problems in gen.c relating to the name of token\r | |
53 | class bit sets in the output.\r | |
54 | \r | |
55 | Added the tremendously cool generalized predicate. For the\r | |
56 | moment, I'll give this brief description.\r | |
57 | \r | |
58 | a : <<predicate>>? blah\r | |
59 | | foo\r | |
60 | ;\r | |
61 | \r | |
62 | This implies that (assuming blah and foo are syntactically\r | |
63 | ambiguous) "predicate" indicates the semantic validity of\r | |
64 | applying "blah". If "predicate" is false, "foo" is attempted.\r | |
65 | \r | |
66 | Previously, you had to say:\r | |
67 | \r | |
68 | a : <<LA(1)==ID ? predicate : 1>>? ID\r | |
69 | | ID\r | |
70 | ;\r | |
71 | \r | |
72 | Now, you can simply use "predicate" without the ?: operator\r | |
73 | if you turn on ANTLR command line option: "-prc on". This\r | |
74 | tells ANTLR to compute that all by itself. It computes n\r | |
75 | tokens of lookahead where LT(n) or LATEXT(n) is the farthest\r | |
76 | ahead you look.\r | |
77 | \r | |
78 | If you give a predicate using "-prc on" that is followed\r | |
79 | by a construct that can recognize more than one n-sequence,\r | |
80 | you will get a warning from ANTLR. For example,\r | |
81 | \r | |
82 | a : <<isTypeName(LT(1)->getText())>>? (ID|INT)\r | |
83 | ;\r | |
84 | \r | |
85 | This is wrong because the predicate will be applied to INTs\r | |
86 | as well as ID's. You should use this syntax to make\r | |
87 | the predicate more specific:\r | |
88 | \r | |
89 | a : (ID)? => <<isTypeName(LT(1)->getText())>>? (ID|INT)\r | |
90 | ;\r | |
91 | \r | |
92 | which says "don't apply the predicate unless ID is the\r | |
93 | current lookahead context".\r | |
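The effect of the context guard can be sketched in plain C++. This is only an illustration of the "LA(1)==ID ? predicate : 1" idea described above, not code that ANTLR generates; SketchTokenType, isTypeName(), unguarded() and guarded() are hypothetical names, and the "size_t" check is an assumed stand-in for a real symbol table lookup.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical token types for this sketch only.
enum SketchTokenType { ID, INT };

// Assumed stand-in for a symbol table lookup.
static bool isTypeName(const char *text)
{
    return std::strcmp(text, "size_t") == 0;
}

// Without a context guard the predicate is evaluated for every
// lookahead token, so an INT is rejected just because it is not a type name.
bool unguarded(SketchTokenType /*la1*/, const char *text)
{
    return isTypeName(text);
}

// With the guard, the predicate is consulted only when LA(1) is an ID;
// any other lookahead passes, as in "LA(1)==ID ? predicate : 1".
bool guarded(SketchTokenType la1, const char *text)
{
    return la1 == ID ? isTypeName(text) : true;
}
```

With these definitions, guarded(INT, "42") lets the INT through while unguarded(INT, "42") rejects it, which is the mismatch the warning is meant to flag.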
94 | \r | |
95 | You cannot currently have anything in the "(context)? =>"\r | |
96 | except sequences such as:\r | |
97 | \r | |
98 | ( LPAREN ID | LPAREN SCOPE )? => <<pred>>?\r | |
99 | \r | |
100 | I haven't tested this THAT much, but it does work for the\r | |
101 | C++ grammar.\r | |
102 | \r | |
103 | ======================================================================\r | |
104 | 1.32b5\r | |
105 | \r | |
106 | Added getLine() to the ANTLRTokenBase and DLGBasedToken classes;\r | |
107 | left line() in place for backward compatibility.\r | |
108 | ----\r | |
109 | Removed SORCERER_TRANSFORM from the ast.h stuff.\r | |
110 | -------\r | |
111 | Fixed bug in code gen of ANTLR such that nested syn preds work more\r | |
112 | efficiently now. The ANTLRTokenBuffer was getting very large\r | |
113 | with nested predicates.\r | |
114 | ------\r | |
115 | Memory leak is now gone from ANTLRTokenBuf; all tokens are deleted.\r | |
116 | For backward compatibility reasons, you have to say parser->deleteTokens()\r | |
117 | or mytokenbuffer->deleteTokens() but later it will be the default mode.\r | |
118 | Say this after the parser is constructed. E.g.,\r | |
119 | \r | |
120 | ParserBlackBox<DLGLexer, MyParser, ANTLRToken> p(stdin);\r | |
121 | p.parser()->deleteTokens();\r | |
122 | p.parser()->start_symbol();\r | |
123 | \r | |
124 | \r | |
125 | ==============================\r | |
126 | 1.32b6\r | |
127 | \r | |
128 | Changed so that deleteTokens() will do a delete ((ANTLRTokenBase *))\r | |
129 | on the ptr. This gets the virtual destructor.\r | |
130 | \r | |
131 | Fixed some weird things in the C++ header files (a few return types).\r | |
132 | \r | |
133 | Made the AST routines correspond to the book and SORCERER stuff.\r | |
134 | \r | |
135 | New token stuff: See testcpp/14/test.g\r | |
136 | \r | |
137 | ANTLR accepts a #pragma gc_tokens which says\r | |
138 | [1] Generate label = copy(LT(1)) instead of label=LT(1) for\r | |
139 | all labeled token references.\r | |
140 | [2] User now has to define ANTLRTokenPtr (as a class or a typedef\r | |
141 | to just a pointer) as well as the ANTLRToken class itself.\r | |
142 | See the example.\r | |
143 | \r | |
144 | To delete tokens in token buffer, use deleteTokens() message on parser.\r | |
145 | \r | |
146 | All tokens that fall off the ANTLRTokenBuffer get deleted\r | |
147 | which is what currently happens when deleteTokens() message\r | |
148 | has been sent to token buffer.\r | |
149 | \r | |
150 | We always generate ANTLRTokenPtr instead of 'ANTLRToken *' now.\r | |
151 | Then if no pragma set, ANTLR generates a\r | |
152 | \r | |
153 | class ANTLRToken;\r | |
154 | typedef ANTLRToken *ANTLRTokenPtr;\r | |
155 | \r | |
156 | in each file.\r | |
157 | \r | |
158 | Made a warning for x:rule_ref <<$x>>; still no warning for $i's, however.\r | |
159 | class BB {\r | |
160 | \r | |
161 | a : x:b y:A <<$x\r | |
162 | $y>>\r | |
163 | ;\r | |
164 | \r | |
165 | b : B;\r | |
166 | \r | |
167 | }\r | |
168 | generates\r | |
169 | Antlr parser generator Version 1.32b6 1989-1995\r | |
170 | test.g, line 3: error: There are no token ptrs for rule references: '$x'\r | |
171 | \r | |
172 | ===================\r | |
173 | 1.32b7:\r | |
174 | \r | |
175 | [With respect to token object garbage collection (GC), 1.32b7\r | |
176 | backtracks from 1.32b6, but results in better and less intrusive GC.\r | |
177 | This is the last beta version before full 1.32.]\r | |
178 | \r | |
179 | BIGGEST CHANGES:\r | |
180 | \r | |
181 | o The "#pragma gc_tokens" is no longer used.\r | |
182 | \r | |
183 | o .C files are now .cpp files (hence, makefiles will have to\r | |
184 | be changed; or you can rerun genmk). This is a good move,\r | |
185 | but causes some backward incompatibility problems. You can\r | |
186 | avoid this by changing CPP_FILE_SUFFIX to ".C" in pccts/h/config.h.\r | |
187 | \r | |
188 | o The token object class hierarchy has been flattened to include\r | |
189 | only three classes: ANTLRAbstractToken, ANTLRCommonToken, and\r | |
190 | ANTLRCommonNoRefCountToken. The common token now does garbage\r | |
191 | collection via ref counting.\r | |
192 | \r | |
193 | o "Smart" pointers are now used for garbage collection. That is,\r | |
194 | ANTLRTokenPtr is used instead of "ANTLRToken *".\r | |
195 | \r | |
196 | o The antlr.1 man page has been cleaned up slightly.\r | |
197 | \r | |
198 | o The SUN C++ compiler now complains less about C++ support code.\r | |
199 | \r | |
200 | o Grammars which subclass ANTLRCommonToken must wrap all token\r | |
201 | pointer references in mytoken(token_ptr). This is the only\r | |
202 | serious backward incompatibility. See below.\r | |
203 | \r | |
204 | \r | |
205 | MINOR CHANGES:\r | |
206 | \r | |
207 | --------------------------------------------------------\r | |
208 | 1 deleteTokens()\r | |
209 | \r | |
210 | The deleteTokens() message to the parser or token buffer has been changed\r | |
211 | to one of:\r | |
212 | \r | |
213 | void noGarbageCollectTokens() { inputTokens->noGarbageCollectTokens(); }\r | |
214 | void garbageCollectTokens() { inputTokens->garbageCollectTokens(); }\r | |
215 | \r | |
216 | The token buffer deletes all non-referenced tokens by default now.\r | |
217 | \r | |
218 | --------------------------------------------------------\r | |
219 | 2 makeToken()\r | |
220 | \r | |
221 | The makeToken() message returns a new type. The function should look\r | |
222 | like:\r | |
223 | \r | |
224 | virtual ANTLRAbstractToken *makeToken(ANTLRTokenType tt,\r | |
225 | ANTLRChar *txt,\r | |
226 | int line)\r | |
227 | {\r | |
228 | ANTLRAbstractToken *t = new ANTLRCommonToken(tt,txt);\r | |
229 | t->setLine(line);\r | |
230 | return t;\r | |
231 | }\r | |
232 | \r | |
233 | --------------------------------------------------------\r | |
234 | 3 TokenType\r | |
235 | \r | |
236 | Changed TokenType-> ANTLRTokenType (often forces changes in AST defs due\r | |
237 | to #[] constructor called to AST(tokentype, string)).\r | |
238 | \r | |
239 | --------------------------------------------------------\r | |
240 | 4 AST()\r | |
241 | \r | |
242 | You must define AST(ANTLRTokenPtr t) now in your AST class definition.\r | |
243 | You might also have to include ATokPtr.h above the definition; e.g.,\r | |
244 | if AST is defined in a separate file, such as AST.h, it's a good idea\r | |
245 | to include ATOKPTR_H (ATokPtr.h). For example,\r | |
246 | \r | |
247 | #include ATOKPTR_H\r | |
248 | class AST : public ASTBase {\r | |
249 | protected:\r | |
250 | ANTLRTokenPtr token;\r | |
251 | public:\r | |
252 | AST(ANTLRTokenPtr t) { token = t; }\r | |
253 | void preorder_action() {\r | |
254 | char *s = token->getText();\r | |
255 | printf(" %s", s);\r | |
256 | }\r | |
257 | };\r | |
258 | \r | |
259 | Note the use of smart pointers rather than "ANTLRToken *".\r | |
260 | \r | |
261 | --------------------------------------------------------\r | |
262 | 5 SUN C++\r | |
263 | \r | |
264 | From robertb oakhill.sps.mot.com Bob Bailey. Changed ANTLR C++ output\r | |
265 | to avoid an error in Sun C++ 3.0.1. Made the return value\r | |
266 | structs created to hold multiple return values "public".\r | |
267 | \r | |
268 | --------------------------------------------------------\r | |
269 | 6 genmk\r | |
270 | \r | |
271 | Fixed genmk so that target List.* is not included anymore. It's\r | |
272 | called SList.* anyway.\r | |
273 | \r | |
274 | --------------------------------------------------------\r | |
275 | 7 \r vs \n\r | |
276 | \r | |
277 | Scott Vorthmann <vorth cmu.edu> fixed antlr.g in ANTLR so that \r\r | |
278 | is allowed as the return character as well as \n.\r | |
279 | \r | |
280 | --------------------------------------------------------\r | |
281 | 8 Exceptions\r | |
282 | \r | |
283 | Fixed a bug in exceptions attached to labeled token/tokclass references:\r | |
284 | ANTLR didn't generate code for them. This didn't work:\r | |
285 | \r | |
286 | a : "help" x:ID\r | |
287 | ;\r | |
288 | exception[x]\r | |
289 | catch MismatchedToken : <<printf("eh?\n");>>\r | |
290 | \r | |
291 | Now ANTLR generates (which is kinda big, but necessary):\r | |
292 | \r | |
293 | if ( !_match_wsig(ID) ) {\r | |
294 | if ( guessing ) goto fail;\r | |
295 | _signal=MismatchedToken;\r | |
296 | switch ( _signal ) {\r | |
297 | case MismatchedToken :\r | |
298 | printf("eh?\n");\r | |
299 | _signal = NoSignal;\r | |
300 | break;\r | |
301 | default :\r | |
302 | goto _handler;\r | |
303 | }\r | |
304 | }\r | |
305 | \r | |
306 | which implies that you can recover and continue parsing after a missing/bad\r | |
307 | token reference.\r | |
308 | \r | |
309 | --------------------------------------------------------\r | |
310 | 9 genmk\r | |
311 | \r | |
312 | genmk now correctly uses config file for CPP_FILE_SUFFIX stuff.\r | |
313 | \r | |
314 | --------------------------------------------------------\r | |
315 | 10 general cleanup / PURIFY\r | |
316 | \r | |
317 | Anthony Green <green vizbiz.com> suggested a bunch of good general\r | |
318 | clean up things for the code; he also suggested a few things to\r | |
319 | help out the "PURIFY" memory allocation checker.\r | |
320 | \r | |
321 | --------------------------------------------------------\r | |
322 | 11 $-variable references.\r | |
323 | \r | |
324 | Manuel ORNATO indicated that a $-variable outside of a rule caused\r | |
325 | ANTLR to crash. I fixed this.\r | |
326 | \r | |
327 | 12 Tom Moog suggestion\r | |
328 | \r | |
329 | Fail action of semantic predicate needs "{}" envelope. FIXED.\r | |
330 | \r | |
331 | 13 references to LT(1).\r | |
332 | \r | |
333 | I have enclosed all assignments such as:\r | |
334 | \r | |
335 | _t22 = (ANTLRTokenPtr)LT(1);\r | |
336 | \r | |
337 | in "if ( !guessing )" so that during backtracking the reference count\r | |
338 | for token objects is not increased.\r | |
339 | \r | |
340 | \r | |
341 | TOKEN OBJECT GARBAGE COLLECTION\r | |
342 | \r | |
343 | 1 INTRODUCTION\r | |
344 | \r | |
345 | The class ANTLRCommonToken is now garbage collected through a "smart"\r | |
346 | pointer called ANTLRTokenPtr using reference counting. Any token\r | |
347 | object not referenced by your grammar actions is destroyed by the\r | |
348 | ANTLRTokenBuffer when it must make room for more token objects.\r | |
349 | Referenced tokens are then destroyed in your parser when local\r | |
350 | ANTLRTokenPtr objects are deleted. For example,\r | |
351 | \r | |
352 | a : label:ID ;\r | |
353 | \r | |
354 | would be converted to something like:\r | |
355 | \r | |
356 | void yourclass::a(void)\r | |
357 | {\r | |
358 | zzRULE;\r | |
359 | ANTLRTokenPtr label=NULL; // used to be ANTLRToken *label;\r | |
360 | zzmatch(ID);\r | |
361 | label = (ANTLRTokenPtr)LT(1);\r | |
362 | consume();\r | |
363 | ...\r | |
364 | }\r | |
365 | \r | |
366 | When the "label" object is destroyed (it's just a pointer to your\r | |
367 | input token object LT(1)), it decrements the reference count on the\r | |
368 | object created for the ID. If the count goes to zero, the object\r | |
369 | pointed to by label is deleted.\r | |
370 | \r | |
371 | To correctly manage the garbage collection, you should use\r | |
372 | ANTLRTokenPtr instead of "ANTLRToken *". Most ANTLR support code\r | |
373 | (visible to the user) has been modified to use the smart pointers.\r | |
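The reference counting described above can be sketched with a pair of small classes. This is a minimal illustration under assumptions, not the PCCTS implementation: SketchToken and SketchTokenPtr are hypothetical stand-ins for ANTLRCommonToken and ANTLRTokenPtr, and the static "live" counter exists only so the effect can be observed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a reference-counted token object.
class SketchToken {
public:
    static int live;   // number of token objects currently alive
    int refs;          // number of smart pointers referring to this token
    SketchToken() : refs(0) { ++live; }
    ~SketchToken() { --live; }
private:
    SketchToken(const SketchToken &);             // not copyable
    SketchToken &operator=(const SketchToken &);
};
int SketchToken::live = 0;

// Hypothetical stand-in for a smart pointer that counts references.
class SketchTokenPtr {
    SketchToken *p;
    void attach(SketchToken *t) { p = t; if (p) ++p->refs; }
    void detach() { if (p && --p->refs == 0) delete p; p = NULL; }
public:
    SketchTokenPtr(SketchToken *t = NULL) { attach(t); }
    SketchTokenPtr(const SketchTokenPtr &o) { attach(o.p); }
    ~SketchTokenPtr() { detach(); }
    SketchTokenPtr &operator=(const SketchTokenPtr &o)
    {
        if (p != o.p) { detach(); attach(o.p); }
        return *this;
    }
    SketchToken *operator->() const { return p; }
};

// Two smart pointers share one token; the count is read before they
// go out of scope, after which the token is deleted.
int refsWithTwoHolders()
{
    SketchTokenPtr label(new SketchToken);
    SketchTokenPtr copy(label);
    return label->refs;
}
```

Once refsWithTwoHolders() returns, both local pointers have been destroyed and SketchToken::live is back to zero, which mirrors what happens to "label" at the end of a rule function.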
374 | \r | |
375 | ***************************************************************\r | |
376 | Remember that any local objects that you create are not deleted when a\r | |
377 | longjmp() is executed. Unfortunately, the syntactic predicates (...)?\r | |
378 | use setjmp()/longjmp(). There are some situations when a few tokens\r | |
379 | will "leak".\r | |
380 | ***************************************************************\r | |
381 | \r | |
382 | 2 DETAILS\r | |
383 | \r | |
384 | o The default is to perform token object garbage collection.\r | |
385 | You may use parser->noGarbageCollectTokens() to turn off\r | |
386 | garbage collection.\r | |
387 | \r | |
388 | \r | |
389 | o The type ANTLRTokenPtr is always defined now (automatically).\r | |
390 | If you do not wish to use smart pointers, you will have to\r | |
391 | redefine ANTLRTokenPtr by subclassing, changing the header\r | |
392 | file or changing ANTLR's code generation (easy enough to\r | |
393 | do in gen.c).\r | |
394 | \r | |
395 | o If you don't use ParserBlackBox, the new initialization sequence is:\r | |
396 | \r | |
397 | ANTLRTokenPtr aToken = new ANTLRToken;\r | |
398 | scan.setToken(mytoken(aToken));\r | |
399 | \r | |
400 | where mytoken(aToken) gets an ANTLRToken * from the smart pointer.\r | |
401 | \r | |
402 | o Define C++ preprocessor symbol DBG_REFCOUNTTOKEN to see a bunch of\r | |
403 | debugging stuff for reference counting if you suspect something.\r | |
404 | \r | |
405 | \r | |
406 | 3 WHY DO I HAVE TO TYPECAST ALL MY TOKEN POINTERS NOW??????\r | |
407 | \r | |
408 | If you subclass ANTLRCommonToken and then attempt to refer to one of\r | |
409 | your token members via a token pointer in your grammar actions, the\r | |
410 | C++ compiler will complain that your token object does not have that\r | |
411 | member. For example, if you used to do this\r | |
412 | \r | |
413 | <<\r | |
414 | class ANTLRToken : public ANTLRCommonToken {\r | |
415 | int muck;\r | |
416 | ...\r | |
417 | };\r | |
418 | >>\r | |
419 | \r | |
420 | class Foo {\r | |
421 | a : t:ID << t->muck = ...; >> ;\r | |
422 | }\r | |
423 | \r | |
424 | Now, you must change the t->muck reference to:\r | |
425 | \r | |
426 | a : t:ID << mytoken(t)->muck = ...; >> ;\r | |
427 | \r | |
428 | in order to downcast 't' to be an "ANTLRToken *" not the\r | |
429 | "ANTLRAbstractToken *" resulting from ANTLRTokenPtr::operator->().\r | |
430 | The macro is defined as:\r | |
431 | \r | |
432 | /*\r | |
433 | * Since you cannot redefine operator->() to return one of the user's\r | |
434 | * token object types, we must down cast. This is a drag. Here's\r | |
435 | * a macro that helps. template: "mytoken(a-smart-ptr)->myfield".\r | |
436 | */\r | |
437 | #define mytoken(tp) ((ANTLRToken *)(tp.operator->()))\r | |
438 | \r | |
439 | You have to use macro mytoken(grammar-label) now because smart\r | |
440 | pointers are not specific to a parser's token objects. In other\r | |
441 | words, the ANTLRTokenPtr class has a pointer to a generic\r | |
442 | ANTLRAbstractToken not your ANTLRToken; the ANTLR support code must\r | |
443 | use smart pointers too, but be able to work with any kind of\r | |
444 | ANTLRToken. Sorry about this, but it's C++'s fault not mine. Some\r | |
445 | nebulous future version of the C++ compilers should obviate the need\r | |
446 | to downcast smart pointers with runtime type checking (and by allowing\r | |
447 | different return type of overridden functions).\r | |
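A self-contained sketch of why the downcast is needed follows. The classes AbstractTok, TokPtr and MyTok and the macro mytok() are hypothetical stand-ins for ANTLRAbstractToken, ANTLRTokenPtr, the user's ANTLRToken and mytoken(); reference counting is left out to keep the example short.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the generic base class the support code knows about.
class AbstractTok {
public:
    virtual ~AbstractTok() {}
};

// Stand-in for the smart pointer: operator->() can only return the
// base type, because the pointer class is shared by all parsers.
class TokPtr {
    AbstractTok *p;
public:
    TokPtr(AbstractTok *t = NULL) : p(t) {}
    AbstractTok *operator->() const { return p; }
};

// Stand-in for the user's token subclass with an extra member.
class MyTok : public AbstractTok {
public:
    int muck;
    MyTok() : muck(0) {}
};

// Same shape as the mytoken() macro: downcast the base pointer.
#define mytok(tp) ((MyTok *)((tp).operator->()))

int setAndReadMuck()
{
    MyTok *raw = new MyTok;
    TokPtr t(raw);
    // t->muck = 7;        // would not compile: AbstractTok has no 'muck'
    mytok(t)->muck = 7;    // compiles: the macro downcasts to MyTok *
    int v = raw->muck;
    delete raw;
    return v;
}
```

The cast is unchecked, so it is only safe when every token handed to the pointer really is of the user's token class.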
448 | \r | |
449 | A way to have backward compatible code is to shut off the token object\r | |
450 | garbage collection; i.e., use parser->noGarbageCollectTokens() and\r | |
451 | change the definition of ANTLRTokenPtr (that's why you get source code\r | |
452 | <wink>).\r | |
453 | \r | |
454 | \r | |
455 | PARSER EXCEPTION HANDLING\r | |
456 | \r | |
457 | I've noticed some weird stuff with the exception handling. I intend\r | |
458 | to give this top priority for the "book release" of ANTLR.\r | |
459 | \r | |
460 | ==========\r | |
461 | 1.32 Full Release\r | |
462 | \r | |
463 | o Changed Token class hierarchy to be (Thanks to Tom Moog):\r | |
464 | \r | |
465 | ANTLRAbstractToken\r | |
466 | ANTLRRefCountToken\r | |
467 | ANTLRCommonToken\r | |
468 | ANTLRNoRefCountCommonToken\r | |
469 | \r | |
470 | o Added virtual panic() to ANTLRAbstractToken. Made ANTLRParser::panic()\r | |
471 | virtual also.\r | |
472 | \r | |
473 | o Cleaned up the dup() stuff in AST hierarchy to use shallowCopy() to\r | |
474 | make node copies. John Farr at Medtronic suggested this. I.e.,\r | |
475 | if you want to use dup() with either ANTLR or SORCERER or -transform\r | |
476 | mode with SORCERER, you must define shallowCopy() as:\r | |
477 | \r | |
478 | virtual PCCTS_AST *shallowCopy()\r | |
479 | {\r | |
480 | AST *p = new AST(*this); // copy this node; the links are cleared below\r | |
481 | p->setDown(NULL);\r | |
482 | p->setRight(NULL);\r | |
483 | return p;\r | |
484 | }\r | |
485 | \r | |
486 | or\r | |
487 | \r | |
488 | virtual PCCTS_AST *shallowCopy()\r | |
489 | {\r | |
490 | return new AST(*this);\r | |
491 | }\r | |
492 | \r | |
493 | if you have defined a copy constructor such as\r | |
494 | \r | |
495 | AST(const AST &t) // shallow copy constructor\r | |
496 | {\r | |
497 | token = t.token;\r | |
498 | iconst = t.iconst;\r | |
499 | setDown(NULL);\r | |
500 | setRight(NULL);\r | |
501 | }\r | |
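How a dup() routine might build on shallowCopy() can be sketched as follows. This is an assumption about the general approach, not the PCCTS source: Node, value, destroy() and dupCopiesWholeTree() are hypothetical names, and only the down/right child-sibling layout is taken from the text above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tree node using a child/sibling (down/right) layout.
class Node {
    Node *down_;
    Node *right_;
public:
    int value;
    explicit Node(int v) : down_(NULL), right_(NULL), value(v) {}
    // Shallow copy constructor: copies the payload, not the links.
    Node(const Node &t) : down_(NULL), right_(NULL), value(t.value) {}
    virtual ~Node() {}

    Node *down() const { return down_; }
    Node *right() const { return right_; }
    void setDown(Node *n) { down_ = n; }
    void setRight(Node *n) { right_ = n; }

    // Copies one node only; subclasses override to copy their own fields.
    virtual Node *shallowCopy() const { return new Node(*this); }

    // Duplicates the whole tree by shallow-copying each node and
    // recursing into children and siblings.
    Node *dup() const
    {
        Node *c = shallowCopy();
        if (down_)  c->setDown(down_->dup());
        if (right_) c->setRight(right_->dup());
        return c;
    }
};

// Frees a whole tree.
void destroy(Node *n)
{
    if (!n) return;
    destroy(n->down());
    destroy(n->right());
    delete n;
}

bool dupCopiesWholeTree()
{
    Node *root = new Node(1);
    Node *child = new Node(2);
    child->setRight(new Node(3));
    root->setDown(child);

    Node *copy = root->dup();
    bool ok = copy != root
           && copy->value == 1
           && copy->down() != child
           && copy->down()->value == 2
           && copy->down()->right()->value == 3
           && copy->right() == NULL;

    destroy(root);
    destroy(copy);
    return ok;
}
```

Keeping the per-node copy in a virtual shallowCopy() means the tree walk never needs to know which AST subclass it is copying.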
502 | \r | |
503 | o Added a warning when -CC and -gk are used together. This is broken,\r | |
504 | hence a warning is appropriate.\r | |
505 | \r | |
506 | o Added warning when #-stuff is used w/o -gt option.\r | |
507 | \r | |
508 | o Updated MPW installation.\r | |
509 | \r | |
510 | o "Miller, Philip W." <MILLERPW f1groups.fsd.jhuapl.edu> suggested\r | |
511 | that genmk use RENAME_OBJ_FLAG and RENAME_EXE_FLAG instead of\r | |
512 | hardcoding "-o" in genmk.c.\r | |
513 | \r | |
514 | o made all exit() calls use EXIT_SUCCESS or EXIT_FAILURE.\r | |
515 | \r | |
516 | ===========================================================================\r | |
517 | 1.33\r | |
518 | \r | |
519 | EXIT_FAILURE and EXIT_SUCCESS were not always defined. I had to modify\r | |
520 | a bunch of files to use PCCTS_EXIT_XXX, which forces a new version. Sorry\r | |
521 | about that.\r | |
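One way to guard against missing EXIT_SUCCESS and EXIT_FAILURE definitions is a pair of wrapper macros with fallbacks. The sketch below only illustrates that idea; the SKETCH_EXIT_* names and the fallback values 0 and 1 are assumptions and are not taken from the PCCTS headers.

```cpp
#include <cassert>
#include <cstdlib>

// Use the standard macro when the platform provides it,
// otherwise fall back to a conventional value (assumed here).
#ifdef EXIT_SUCCESS
#define SKETCH_EXIT_SUCCESS EXIT_SUCCESS
#else
#define SKETCH_EXIT_SUCCESS 0
#endif

#ifdef EXIT_FAILURE
#define SKETCH_EXIT_FAILURE EXIT_FAILURE
#else
#define SKETCH_EXIT_FAILURE 1
#endif

// Hypothetical helper: maps a success flag to the wrapped exit code.
int exitCodeFor(bool ok)
{
    return ok ? SKETCH_EXIT_SUCCESS : SKETCH_EXIT_FAILURE;
}
```

Callers would then write exit(exitCodeFor(ok)) or use the wrapper macros directly, so no source file depends on the standard names being present.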
522 | \r |