Commit | Line | Data |
---|---|---|
3eb9473e | 1 | =======================================================================\r |
2 | List of Implemented Fixes and Changes for Maintenance Releases of PCCTS\r | |
3 | \r | |
4 | \r | |
5 | For a summary of the most significant changes see CHANGES_SUMMARY.TXT\r | |
6 | \r | |
7 | =======================================================================\r | |
8 | \r | |
9 | DISCLAIMER\r | |
10 | \r | |
11 | The software and these notes are provided "as is". They may include\r | |
12 | typographical or technical errors and their authors disclaim all\r | |
13 | liability of any kind or nature for damages due to error, fault,\r | |
14 | defect, or deficiency regardless of cause. All warranties of any\r | |
15 | kind, either express or implied, including, but not limited to, the\r | |
16 | implied warranties of merchantability and fitness for a particular\r | |
17 | purpose are disclaimed.\r | |
18 | \r | |
19 | \r | |
20 | -------------------------------------------------------\r | |
21 | Note: Items #153 to #1 are now in a separate file named\r | |
22 | CHANGES_FROM_133_BEFORE_MR13.txt\r | |
23 | -------------------------------------------------------\r | |
24 | \r | |
25 | #261. (Changed in MR19) Defer token fetch for C++ mode\r | |
26 | \r | |
27 | Item #216 has been revised to indicate that use of the defer fetch\r | |
28 | option (ZZDEFER_FETCH) requires dlg option -i.\r | |
29 | \r | |
30 | #260. (MR22) Raise default lex buffer size from 8,000 to 32,000 bytes.\r | |
31 | \r | |
32 | ZZLEXBUFSIZE is the size (in bytes) of the buffer used by dlg \r | |
33 | generated lexers. The default value has been raised to 32,000 and\r | |
34 | the value used by antlr, dlg, and sorcerer has also been raised to\r | |
35 | 32,000.\r | |
36 | \r | |
37 | #259. (MR22) Default function arguments in C++ mode.\r | |
38 | \r | |
39 | If a rule is declared:\r | |
40 | \r | |
41 | rr [int i = 0] : ....\r | |
42 | \r | |
43 | then the declaration generated by pccts resembles:\r | |
44 | \r | |
45 | void rr(int i = 0);\r | |
46 | \r | |
47 | however, the definition must omit the default argument:\r | |
48 | \r | |
49 | void rr(int i) {...}\r | |
50 | \r | |
51 | In the past the default value was not omitted. In MR22\r | |
52 | the generated code resembles:\r | |
53 | \r | |
54 | void rr(int i /* = 0 */ ) {...}\r | |
55 | \r | |
56 | Implemented by Volker H. Simonis (simonis@informatik.uni-tuebingen.de)\r | |
57 | \r | |
58 | #258. (MR22) Using a base class for your parser\r | |
59 | \r | |
60 | In item #102 (MR10) the class statement was extended to allow one\r | |
61 | to specify a base class other than ANTLRParser for the generated\r | |
62 | parser. It turned out that this was less than useful because\r | |
63 | the constructor still specified ANTLRParser as the base class.\r | |
64 | \r | |
65 | The class statement now uses the first identifier appearing after\r | |
66 | the ":" as the name of the base class. For example:\r | |
67 | \r | |
68 | class MyParser : public FooParser {\r | |
69 | \r | |
70 | Generates in MyParser.h:\r | |
71 | \r | |
72 | class MyParser : public FooParser {\r | |
73 | \r | |
74 | Generates in MyParser.cpp something that resembles:\r | |
75 | \r | |
76 | MyParser::MyParser(ANTLRTokenBuffer *input) :\r | |
77 | FooParser(input,1,0,0,4)\r | |
78 | {\r | |
79 | token_tbl = _token_tbl;\r | |
80 | traceOptionValueDefault=1; // MR10 turn trace ON\r | |
81 | }\r | |
82 | \r | |
83 | The base class constructor must have a signature similar to\r | |
84 | that of ANTLRParser.\r | |
85 | \r | |
86 | #257. (MR21a) Removed dlg statement that -i has no effect in C++ mode.\r | |
87 | \r | |
88 | This was incorrect.\r | |
89 | \r | |
90 | #256. (MR21a) Malformed syntax graph causes crash after error message.\r | |
91 | \r | |
92 | In the past, certain kinds of errors in the very first grammar\r | |
93 | element could cause the construction of a malformed graph \r | |
94 | representing the grammar. This would eventually result in a\r | |
95 | fatal internal error. The code has been changed to be more\r | |
96 | resistant to this particular error.\r | |
97 | \r | |
98 | #255. (MR21a) ParserBlackBox(FILE* f) \r | |
99 | \r | |
100 | This constructor set openByBlackBox to the wrong value.\r | |
101 | \r | |
102 | Reported by Kees Bakker (kees_bakker@tasking.nl).\r | |
103 | \r | |
104 | #254. (MR21a) Reporting syntax error at end-of-file\r | |
105 | \r | |
106 | When there was a syntax error at the end-of-file the syntax\r | |
107 | error routine would substitute "<eof>" for the programmer's\r | |
108 | end-of-file symbol. This substitution is now done only when\r | |
109 | the programmer does not define his own end-of-file symbol\r | |
110 | or the symbol begins with the character "@".\r | |
111 | \r | |
112 | Reported by Kees Bakker (kees_bakker@tasking.nl).\r | |
113 | \r | |
114 | #253. (MR21) Generation of block preamble (-preamble and -preamble_first)\r | |
115 | \r | |
116 | The antlr option -preamble causes antlr to insert the code\r | |
117 | BLOCK_PREAMBLE at the start of each rule and block. It does\r | |
118 | not insert code before rule references, token references, or\r | |
119 | actions. By properly defining the macro BLOCK_PREAMBLE the\r | |
120 | user can generate code which is specific to the start of blocks.\r | |
121 | \r | |
122 | The antlr option -preamble_first is similar, but inserts the\r | |
123 | code BLOCK_PREAMBLE_FIRST(PreambleFirst_123) where the symbol\r | |
124 | PreambleFirst_123 is equivalent to the first set defined by\r | |
125 | the #FirstSetSymbol described in Item #248.\r | |
126 | \r | |
127 | I have not investigated how these options interact with guess\r | |
128 | mode (syntactic predicates).\r | |
129 | \r | |
130 | #252. (MR21) Check for null pointer in trace routine\r | |
131 | \r | |
132 | When some trace options are used with a parser that was generated\r | |
133 | without the trace enabled, the current rule name may be a\r | |
134 | NULL pointer. A guard was added to check for this in\r | |
135 | restoreState.\r | |
136 | \r | |
137 | Reported by Douglas E. Forester (dougf@projtech.com).\r | |
138 | \r | |
139 | #251. (MR21) Changes to #define zzTRACE_RULES\r | |
140 | \r | |
141 | The macro zzTRACE_RULES was being used to pass information to\r | |
142 | AParser.h. If this preprocessor symbol was not properly\r | |
143 | set the first time AParser.h was #included, the declaration\r | |
144 | of zzTRACEdata would be omitted (it is used by the -gd option).\r | |
145 | Subsequent #includes of AParser.h would be skipped because of \r | |
146 | the #ifdef guard, so the declaration of zzTracePrevRuleName would\r | |
147 | never be made. The result was that proper compilation was very \r | |
148 | order dependent.\r | |
149 | \r | |
150 | The declaration of zzTRACEdata was made unconditional and the\r | |
151 | problem of removing unused declarations will be left to optimizers.\r | |
152 | \r | |
153 | Diagnosed by Douglas E. Forester (dougf@projtech.com).\r | |
154 | \r | |
155 | #250. (MR21) Option for EXPERIMENTAL change to error sets for blocks\r | |
156 | \r | |
157 | The antlr option -mrblkerr turns on an experimental feature\r | |
158 | which is supposed to provide more accurate syntax error messages\r | |
159 | for k=1, ck=1 grammars. When used with k>1 or ck>1 grammars the\r | |
160 | behavior should be no worse than the current behavior.\r | |
161 | \r | |
162 | There is no problem with the matching of elements or the computation\r | |
163 | of prediction expressions in pccts. The task is only one of listing\r | |
164 | the most appropriate tokens in the error message. The error sets used\r | |
165 | in pccts error messages are approximations of the exact error set when\r | |
166 | optional elements in (...)* or (...)+ are involved. While entirely\r | |
167 | correct, the error messages are sometimes not 100% accurate. \r | |
168 | \r | |
169 | There is also a minor philosophical issue. For example, suppose the\r | |
170 | grammar expects the token to be an optional A followed by Z, and it \r | |
171 | is X. X, of course, is neither A nor Z, so an error message is appropriate.\r | |
172 | Is it appropriate to say "Expected Z" ? It is correct, it is accurate,\r | |
173 | but it is not complete. \r | |
174 | \r | |
175 | When k>1 or ck>1 the problem of providing the exactly correct\r | |
176 | list of tokens for the syntax error messages ends up becoming\r | |
177 | equivalent to evaluating the prediction expression for the\r | |
178 | alternatives twice. However, for k=1 ck=1 grammars the prediction\r | |
179 | expression can be computed easily and evaluated cheaply, so I\r | |
180 | decided to try implementing it to satisfy a particular application.\r | |
181 | This application uses the error set in an interactive command language\r | |
182 | to provide prompts which list the alternatives available at that\r | |
183 | point in the parser. The user can then enter additional tokens to\r | |
184 | complete the command line. To do this required more accurate error \r | |
185 | sets than previously provided by pccts.\r | |
186 | \r | |
187 | In some cases the default pccts behavior may lead to more robust error\r | |
188 | recovery or clearer error messages than having the exact set of tokens.\r | |
189 | This is because (a) features like -ge allow the use of symbolic names for\r | |
190 | certain sets of tokens, so having extra tokens may simply obscure things\r | |
191 | and (b) the error set is used to resynchronize the parser, so a good\r | |
192 | choice is sometimes more important than having the exact set.\r | |
193 | \r | |
194 | Consider the following example:\r | |
195 | \r | |
196 | Note: All example code has been abbreviated\r | |
197 | to the absolute minimum in order to make the\r | |
198 | examples concise.\r | |
199 | \r | |
200 | star1 : (A)* Z;\r | |
201 | \r | |
202 | The generated code resembles:\r | |
203 | \r | |
204 | old new (with -mrblkerr)\r | |
205 | ------------- --------------------\r | |
206 | for (;;) { for (;;) {\r | |
207 | match(A); match(A);\r | |
208 | } }\r | |
209 | match(Z); if (! A and ! Z) then\r | |
210 | FAIL(...{A,Z}...);\r | |
211 | }\r | |
212 | match(Z);\r | |
213 | \r | |
214 | \r | |
215 | With input X\r | |
216 | old message: Found X, expected Z\r | |
217 | new message: Found X, expected A, Z\r | |
218 | \r | |
219 | For the example:\r | |
220 | \r | |
221 | star2 : (A|B)* Z;\r | |
222 | \r | |
223 | old new (with -mrblkerr)\r | |
224 | ------------- --------------------\r | |
225 | for (;;) { for (;;) {\r | |
226 | if (!A and !B) break; if (!A and !B) break;\r | |
227 | if (...) { if (...) {\r | |
228 | <same ...> <same ...>\r | |
229 | } }\r | |
230 | else { else {\r | |
231 | FAIL(...{A,B,Z}...) FAIL(...{A,B}...);\r | |
232 | } }\r | |
233 | } }\r | |
234 | match(Z); if (! A and ! B and !Z) then\r | |
235 | FAIL(...{A,B,Z}...);\r | |
236 | }\r | |
237 | match(Z);\r | |
238 | \r | |
239 | With input X\r | |
240 | old message: Found X, expected Z\r | |
241 | new message: Found X, expected A, B, Z\r | |
242 | With input A X\r | |
243 | old message: Found X, expected Z\r | |
244 | new message: Found X, expected A, B, Z\r | |
245 | \r | |
246 | This includes the choice of looping back to the\r | |
247 | star block.\r | |
248 | \r | |
249 | The code for plus blocks:\r | |
250 | \r | |
251 | plus1 : (A)+ Z;\r | |
252 | \r | |
253 | The generated code resembles:\r | |
254 | \r | |
255 | old new (with -mrblkerr)\r | |
256 | ------------- --------------------\r | |
257 | do { do {\r | |
258 | match(A); match(A);\r | |
259 | } while (A) } while (A)\r | |
260 | match(Z); if (! A and ! Z) then\r | |
261 | FAIL(...{A,Z}...);\r | |
262 | }\r | |
263 | match(Z);\r | |
264 | \r | |
265 | With input A X\r | |
266 | old message: Found X, expected Z\r | |
267 | new message: Found X, expected A, Z\r | |
268 | \r | |
269 | This includes the choice of looping back to the\r | |
270 | plus block.\r | |
271 | \r | |
272 | For the example:\r | |
273 | \r | |
274 | plus2 : (A|B)+ Z;\r | |
275 | \r | |
276 | old new (with -mrblkerr)\r | |
277 | ------------- --------------------\r | |
278 | do { do {\r | |
279 | if (A) { <same>\r | |
280 | match(A); <same>\r | |
281 | } else if (B) { <same>\r | |
282 | match(B); <same>\r | |
283 | } else { <same>\r | |
284 | if (cnt > 1) break; <same>\r | |
285 | FAIL(...{A,B,Z}...) FAIL(...{A,B}...);\r | |
286 | } }\r | |
287 | cnt++; <same>\r | |
288 | } }\r | |
289 | \r | |
290 | match(Z); if (! A and ! B and !Z) then\r | |
291 | FAIL(...{A,B,Z}...);\r | |
292 | }\r | |
293 | match(Z);\r | |
294 | \r | |
295 | With input X\r | |
296 | old message: Found X, expected A, B, Z\r | |
297 | new message: Found X, expected A, B\r | |
298 | With input A X\r | |
299 | old message: Found X, expected Z\r | |
300 | new message: Found X, expected A, B, Z\r | |
301 | \r | |
302 | This includes the choice of looping back to the\r | |
303 | plus block.\r | |
304 | \r | |
305 | #249. (MR21) Changes for DEC/VMS systems\r | |
306 | \r | |
307 |