=======================================================================
List of Implemented Fixes and Changes for Maintenance Releases of PCCTS


    For a summary of the most significant changes see CHANGES_SUMMARY.TXT

=======================================================================

                               DISCLAIMER

    The software and these notes are provided "as is".  They may include
    typographical or technical errors and their authors disclaim all
    liability of any kind or nature for damages due to error, fault,
    defect, or deficiency regardless of cause.  All warranties of any
    kind, either express or implied, including, but not limited to, the
    implied warranties of merchantability and fitness for a particular
    purpose are disclaimed.


    -------------------------------------------------------
    Note:  Items #153 to #1 are now in a separate file named
           CHANGES_FROM_133_BEFORE_MR13.txt
    -------------------------------------------------------

#261. (Changed in MR19) Defer token fetch for C++ mode

    Item #216 has been revised to indicate that use of the defer fetch
    option (ZZDEFER_FETCH) requires dlg option -i.

#260. (MR22) Raise default lex buffer size from 8,000 to 32,000 bytes.

    ZZLEXBUFSIZE is the size (in bytes) of the buffer used by dlg
    generated lexers.  The default value has been raised to 32,000 and
    the value used by antlr, dlg, and sorcerer has also been raised to
    32,000.

#259. (MR22) Default function arguments in C++ mode.

    If a rule is declared:

        rr [int i = 0] : ....

    then the declaration generated by pccts resembles:

        void rr(int i = 0);

    however, the definition must omit the default argument:

        void rr(int i) {...}

    In the past the default value was not omitted.  In MR22
    the generated code resembles:

        void rr(int i /* = 0 */ ) {...}

    Implemented by Volker H. Simonis (simonis@informatik.uni-tuebingen.de)
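
    The C++ rule behind this change can be seen in a small standalone
    sketch.  DemoParser and lastArg are hypothetical names used only
    for illustration; they are not generated by pccts.  The default
    value may appear on the declaration but not again on the
    out-of-class definition, so it survives there only as a comment.

```cpp
#include <cassert>

// Hypothetical stand-in for a generated parser class; only the shape of
// the declaration and the definition matters here.
class DemoParser {
public:
    int lastArg = -1;        // records the argument seen by rr()
    void rr(int i = 0);      // the declaration carries the default value
};

// The definition repeats the parameter but not the default. Writing
// "int i = 0" here as well would be rejected by a C++ compiler, so the
// default is kept only as a comment, in the style MR22 emits.
void DemoParser::rr(int i /* = 0 */) {
    lastArg = i;
}
```

    Calling rr() with no argument uses the default from the
    declaration; calling rr(7) overrides it.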

#258. (MR22) Using a base class for your parser

    In item #102 (MR10) the class statement was extended to allow one
    to specify a base class other than ANTLRParser for the generated
    parser.  It turned out that this was less than useful because
    the constructor still specified ANTLRParser as the base class.

    The class statement now uses the first identifier appearing after
    the ":" as the name of the base class.  For example:

        class MyParser : public FooParser {

    Generates in MyParser.h:

        class MyParser : public FooParser {

    Generates in MyParser.cpp something that resembles:

        MyParser::MyParser(ANTLRTokenBuffer *input) :
            FooParser(input,1,0,0,4)
        {
            token_tbl = _token_tbl;
            traceOptionValueDefault=1;    // MR10 turn trace ON
        }

    The base class constructor must have a signature similar to
    that of ANTLRParser.
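
    The requirement on the base class constructor can be sketched with
    stand-in classes.  TokenBufferStub and ParserBaseStub below are
    hypothetical and are NOT the real PCCTS classes, and the parameter
    names are assumptions: the notes above show only that five
    arguments (input,1,0,0,4) are passed.  The point is that the
    intermediate class FooParser must accept the same argument list the
    generated constructor forwards.

```cpp
#include <cassert>

// Hypothetical stand-ins; not the real ANTLRTokenBuffer / ANTLRParser.
struct TokenBufferStub {};

class ParserBaseStub {
public:
    // Parameter names are assumed for illustration only.
    ParserBaseStub(TokenBufferStub *input, int k, int useInfLook,
                   int demandLook, int bsetSize)
        : input_(input), k_(k), bsetSize_(bsetSize) {
        (void)useInfLook;   // unused in this sketch
        (void)demandLook;   // unused in this sketch
    }
    int k() const { return k_; }
    int bsetSize() const { return bsetSize_; }
    TokenBufferStub *input() const { return input_; }
private:
    TokenBufferStub *input_;
    int k_;
    int bsetSize_;
};

// User-written base class: its constructor mirrors the base signature so
// that the generated constructor's argument list still compiles.
class FooParser : public ParserBaseStub {
public:
    FooParser(TokenBufferStub *input, int k, int useInfLook,
              int demandLook, int bsetSize)
        : ParserBaseStub(input, k, useInfLook, demandLook, bsetSize) {}
};

// Shape of the generated parser: it forwards to FooParser rather than
// to the library base class.
class MyParser : public FooParser {
public:
    explicit MyParser(TokenBufferStub *input)
        : FooParser(input, 1, 0, 0, 4) {}
};
```

    If FooParser declared a constructor with a different parameter
    list, the forwarding call in MyParser would fail to compile.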

#257. (MR21a) Removed dlg statement that -i has no effect in C++ mode.

    This was incorrect.

#256. (MR21a) Malformed syntax graph causes crash after error message.

    In the past, certain kinds of errors in the very first grammar
    element could cause the construction of a malformed graph
    representing the grammar.  This would eventually result in a
    fatal internal error.  The code has been changed to be more
    resistant to this particular error.

#255. (MR21a) ParserBlackBox(FILE* f)

    This constructor set openByBlackBox to the wrong value.

    Reported by Kees Bakker (kees_bakker@tasking.nl).

#254. (MR21a) Reporting syntax error at end-of-file

    When there was a syntax error at the end-of-file the syntax
    error routine would substitute "<eof>" for the programmer's
    end-of-file symbol.  This substitution is now done only when
    the programmer does not define his own end-of-file symbol
    or the symbol begins with the character "@".

    Reported by Kees Bakker (kees_bakker@tasking.nl).
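
    The substitution rule can be written as a small decision function.
    This is a sketch only: eofDisplayName is a hypothetical helper, not
    a routine in the pccts sources, and treating an empty string as
    "not defined" is an assumption made for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: choose the text shown for end-of-file in a
// syntax error message, following the rule described in item #254.
std::string eofDisplayName(const char *userEofSymbol) {
    // No user-defined symbol (a null or empty name is assumed to mean
    // "not defined"), or a symbol beginning with '@': use "<eof>".
    if (userEofSymbol == nullptr || userEofSymbol[0] == '\0' ||
        userEofSymbol[0] == '@') {
        return "<eof>";
    }
    // Otherwise the programmer's own end-of-file symbol is kept.
    return userEofSymbol;
}
```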

#253. (MR21) Generation of block preamble (-preamble and -preamble_first)

    The antlr option -preamble causes antlr to insert the code
    BLOCK_PREAMBLE at the start of each rule and block.  It does
    not insert code before rule references, token references, or
    actions.  By properly defining the macro BLOCK_PREAMBLE the
    user can generate code which is specific to the start of blocks.

    The antlr option -preamble_first is similar, but inserts the
    code BLOCK_PREAMBLE_FIRST(PreambleFirst_123) where the symbol
    PreambleFirst_123 is equivalent to the first set defined by
    the #FirstSetSymbol described in Item #248.

    I have not investigated how these options interact with guess
    mode (syntactic predicates).
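
    A minimal sketch of how a user-supplied BLOCK_PREAMBLE might be
    used.  The counter blockEntries and the function demoRule are
    hypothetical, and demoRule is a hand-written imitation of generated
    code for a rule such as "r : A (B)* ;".  The exact position at
    which antlr places the macro inside a block is an assumption here.

```cpp
#include <cassert>

static int blockEntries = 0;   // hypothetical counter, for illustration

// The user defines BLOCK_PREAMBLE; in this sketch it counts how many
// rules and blocks have been entered.
#define BLOCK_PREAMBLE ++blockEntries;

// Hand-written imitation of generated code for:  r : A (B)* ;
void demoRule(int repetitions) {
    BLOCK_PREAMBLE                  // start of the rule
    /* match(A): no preamble is inserted before a token reference */
    {
        BLOCK_PREAMBLE              // start of the (B)* block
        for (int i = 0; i < repetitions; ++i) {
            /* match(B) */
        }
    }
}
```

    With this definition one call to demoRule enters two blocks (the
    rule itself and the loop block) regardless of how many times the
    loop body runs.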

#252. (MR21) Check for null pointer in trace routine

    When some trace options are used when the parser is generated
    without the trace enabled, the current rule name may be a
    NULL pointer.  A guard was added to check for this in
    restoreState.

    Reported by Douglas E. Forester (dougf@projtech.com).

#251. (MR21) Changes to #define zzTRACE_RULES

    The macro zzTRACE_RULES was being used to pass information to
    AParser.h.  If this preprocessor symbol was not properly
    set the first time AParser.h was #included, the declaration
    of zzTRACEdata would be omitted (it is used by the -gd option).
    Subsequent #includes of AParser.h would be skipped because of
    the #ifdef guard, so the declaration of zzTracePrevRuleName would
    never be made.  The result was that proper compilation was very
    order dependent.

    The declaration of zzTRACEdata was made unconditional and the
    problem of removing unused declarations will be left to optimizers.

    Diagnosed by Douglas E. Forester (dougf@projtech.com).

#250. (MR21) Option for EXPERIMENTAL change to error sets for blocks

    The antlr option -mrblkerr turns on an experimental feature
    which is supposed to provide more accurate syntax error messages
    for k=1, ck=1 grammars.  When used with k>1 or ck>1 grammars the
    behavior should be no worse than the current behavior.

    There is no problem with the matching of elements or the computation
    of prediction expressions in pccts.  The task is only one of listing
    the most appropriate tokens in the error message.  The error sets used
    in pccts error messages are approximations of the exact error set when
    optional elements in (...)* or (...)+ are involved.  While entirely
    correct, the error messages are sometimes not 100% accurate.

    There is also a minor philosophical issue.  For example, suppose the
    grammar expects the token to be an optional A followed by Z, and it
    is X.  X, of course, is neither A nor Z, so an error message is
    appropriate.  Is it appropriate to say "Expected Z" ?  It is correct,
    it is accurate, but it is not complete.

    When k>1 or ck>1 the problem of providing the exactly correct
    list of tokens for the syntax error messages ends up becoming
    equivalent to evaluating the prediction expression for the
    alternatives twice.  However, for k=1 ck=1 grammars the prediction
    expression can be computed easily and evaluated cheaply, so I
    decided to try implementing it to satisfy a particular application.
    This application uses the error set in an interactive command language
    to provide prompts which list the alternatives available at that
    point in the parser.  The user can then enter additional tokens to
    complete the command line.  To do this required more accurate error
    sets than previously provided by pccts.

    In some cases the default pccts behavior may lead to more robust error
    recovery or clearer error messages than having the exact set of tokens.
    This is because (a) features like -ge allow the use of symbolic names for
    certain sets of tokens, so having extra tokens may simply obscure things
    and (b) the error set is used to resynchronize the parser, so a good
    choice is sometimes more important than having the exact set.

    Consider the following example:

        Note:  All example code has been abbreviated
        to the absolute minimum in order to make the
        examples concise.

        star1 : (A)* Z;

    The generated code resembles:

           old                 new (with -mrblkerr)
        -------------         --------------------
        for (;;) {            for (;;) {
            match(A);           match(A);
        }                     }
        match(Z);             if (! A and ! Z) then
                                FAIL(...{A,Z}...);
                              }
                              match(Z);


        With input X
            old message: Found X, expected Z
            new message: Found X, expected A, Z
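
    The difference between the two columns can be reproduced with a
    tiny hand-written recognizer.  This is a sketch, not pccts output:
    parseStar1 is a hypothetical function, tokens are single
    characters, and '$' is an assumed marker for end of input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical miniature of a k=1 recognizer for:  star1 : (A)* Z ;
// Returns "" on success, otherwise an error message in the style of
// the notes. The mrblkerr flag selects the "new" error set.
std::string parseStar1(const std::string &input, bool mrblkerr) {
    std::size_t pos = 0;
    // Current lookahead token; '$' is an assumed end-of-input marker.
    auto la = [&]() -> char {
        return pos < input.size() ? input[pos] : '$';
    };

    while (la() == 'A') ++pos;              // (A)*

    // New behavior: check against the full set {A,Z} before match(Z),
    // so the message also lists the option of looping back.
    if (mrblkerr && la() != 'A' && la() != 'Z') {
        return std::string("Found ") + la() + ", expected A, Z";
    }
    // Old behavior: only the failing match(Z) reports an error.
    if (la() != 'Z') {
        return std::string("Found ") + la() + ", expected Z";
    }
    return "";
}
```

    Both variants accept and reject exactly the same inputs; only the
    list of expected tokens in the message differs.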

    For the example:

        star2 : (A|B)* Z;

           old                      new (with -mrblkerr)
        -------------              --------------------
        for (;;) {                 for (;;) {
          if (!A and !B) break;      if (!A and !B) break;
          if (...) {                 if (...) {
            <same ...>                 <same ...>
          }                          }
          else {                     else {
            FAIL(...{A,B,Z}...)        FAIL(...{A,B}...);
          }                          }
        }                          }
        match(Z);                  if (! A and ! B and !Z) then
                                       FAIL(...{A,B,Z}...);
                                   }
                                   match(Z);

        With input X
            old message: Found X, expected Z
            new message: Found X, expected A, B, Z
        With input A X
            old message: Found X, expected Z
            new message: Found X, expected A, B, Z

            This includes the choice of looping back to the
            star block.

    The code for plus blocks:

        plus1 : (A)+ Z;

    The generated code resembles:

           old                 new (with -mrblkerr)
        -------------         --------------------
        do {                  do {
          match(A);             match(A);
        } while (A)           } while (A)
        match(Z);             if (! A and ! Z) then
                                FAIL(...{A,Z}...);
                              }
                              match(Z);

        With input A X
            old message: Found X, expected Z
            new message: Found X, expected A, Z

            This includes the choice of looping back to the
            plus block.

    For the example:

        plus2 : (A|B)+ Z;

           old                      new (with -mrblkerr)
        -------------              --------------------
        do {                       do {
          if (A) {                   <same>
            match(A);                <same>
          } else if (B) {            <same>
            match(B);                <same>
          } else {                   <same>
            if (cnt > 1) break;      <same>
            FAIL(...{A,B,Z}...)        FAIL(...{A,B}...);
          }                          }
          cnt++;                     <same>
        }                          }

        match(Z);                  if (! A and ! B and !Z) then
                                       FAIL(...{A,B,Z}...);
                                   }
                                   match(Z);

        With input X
            old message: Found X, expected A, B, Z
            new message: Found X, expected A, B
        With input A X
            old message: Found X, expected Z
            new message: Found X, expected A, B, Z

            This includes the choice of looping back to the
            plus block.
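
    The plus2 case has two distinct failure points, which a small
    recognizer makes visible.  As before this is a sketch under stated
    assumptions: parsePlus2 is hypothetical, tokens are single
    characters, '$' marks end of input, and cnt counts completed
    iterations starting from zero rather than following the generated
    code's own counter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical miniature of a k=1 recognizer for:  plus2 : (A|B)+ Z ;
// Returns "" on success, otherwise an error message in the style of
// the notes. The mrblkerr flag selects the "new" error sets.
std::string parsePlus2(const std::string &input, bool mrblkerr) {
    std::size_t pos = 0;
    // Current lookahead token; '$' is an assumed end-of-input marker.
    auto la = [&]() -> char {
        return pos < input.size() ? input[pos] : '$';
    };
    auto fail = [&](const char *expected) {
        return std::string("Found ") + la() + ", expected " + expected;
    };

    int cnt = 0;                            // completed iterations
    for (;;) {
        if (la() == 'A' || la() == 'B') {   // match(A) or match(B)
            ++pos;
            ++cnt;
            continue;
        }
        if (cnt >= 1) break;                // (...)+ satisfied, leave loop
        // Failure on the mandatory first iteration: Z is not a valid
        // choice here, so the new error set drops it.
        return fail(mrblkerr ? "A, B" : "A, B, Z");
    }

    // Failure after the loop: the new error set also lists A and B,
    // the choice of looping back to the plus block.
    if (mrblkerr && la() != 'A' && la() != 'B' && la() != 'Z') {
        return fail("A, B, Z");
    }
    if (la() != 'Z') return fail("Z");      // old behavior: match(Z)
    return "";
}
```

    The first failure point shrinks its set under -mrblkerr while the
    second one grows, which matches the four messages listed above.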

#249. (MR21) Changes for DEC/VMS systems
