=======================================================================
List of Implemented Fixes and Changes for Maintenance Releases of PCCTS


 For a summary of the most significant changes see CHANGES_SUMMARY.TXT

=======================================================================

                               DISCLAIMER

 The software and these notes are provided "as is".  They may include
 typographical or technical errors and their authors disclaim all
 liability of any kind or nature for damages due to error, fault,
 defect, or deficiency regardless of cause.  All warranties of any
 kind, either express or implied, including, but not limited to, the
 implied warranties of merchantability and fitness for a particular
 purpose are disclaimed.


        -------------------------------------------------------
        Note:  Items #153 to #1 are now in a separate file named
                CHANGES_FROM_133_BEFORE_MR13.txt
        -------------------------------------------------------

#261. (Changed in MR19) Defer token fetch for C++ mode

    Item #216 has been revised to indicate that use of the defer fetch
    option (ZZDEFER_FETCH) requires dlg option -i.

#260. (MR22) Raise default lex buffer size from 8,000 to 32,000 bytes.

    ZZLEXBUFSIZE is the size (in bytes) of the buffer used by dlg
    generated lexers.  The default value has been raised to 32,000 and
    the value used by antlr, dlg, and sorcerer has also been raised to
    32,000.
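
    A build that needs a different size can presumably still define
    ZZLEXBUFSIZE itself.  The sketch below shows the usual #ifndef
    pattern for such an overridable default.  It assumes the pccts
    headers supply the value only when the macro is not already
    defined; lexBuffer and lexBufferCapacity are hypothetical names
    used for illustration, not pccts identifiers.

```cpp
#include <cstddef>

// Assumption: a default is supplied only if the build has not already
// chosen a size (for example with -DZZLEXBUFSIZE=65536).
#ifndef ZZLEXBUFSIZE
#define ZZLEXBUFSIZE 32000   // MR22 default; earlier releases used 8000
#endif

static_assert(ZZLEXBUFSIZE > 0, "lex buffer size must be positive");

// Hypothetical stand-in for the buffer used by a dlg generated lexer.
static char lexBuffer[ZZLEXBUFSIZE];

// Reports the capacity (in bytes) that was actually compiled in.
std::size_t lexBufferCapacity() { return sizeof(lexBuffer); }
```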

#259. (MR22) Default function arguments in C++ mode.

    If a rule is declared:

            rr [int i = 0] : ....

    then the declaration generated by pccts resembles:

            void rr(int i = 0);

    however, the definition must omit the default argument:

            void rr(int i) {...}

    In the past the default value was not omitted.  In MR22
    the generated code resembles:

            void rr(int i /* = 0 */ ) {...}

    Implemented by Volker H. Simonis (simonis@informatik.uni-tuebingen.de)
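
    The C++ rule behind this change can be seen in a small stand-alone
    sketch.  DemoParser is a hypothetical stand-in for a generated
    parser class, and rr returns int here (the generated rule returns
    void) only so that the effect of the default is observable.

```cpp
// Hypothetical stand-in for a generated parser class.
class DemoParser {
public:
    // Declaration: the only place the default value may appear.
    int rr(int i = 0);
};

// Definition: repeating "= 0" here would be rejected by the compiler,
// so the default survives only as a comment, as in the MR22 output.
int DemoParser::rr(int i /* = 0 */ ) {
    return i + 1;
}
```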

#258. (MR22)  Using a base class for your parser

    In item #102 (MR10) the class statement was extended to allow one
    to specify a base class other than ANTLRParser for the generated
    parser.  It turned out that this was less than useful because
    the constructor still specified ANTLRParser as the base class.

    The class statement now uses the first identifier appearing after
    the ":" as the name of the base class.  For example:

        class MyParser : public FooParser {

    Generates in MyParser.h:

            class MyParser : public FooParser {

    Generates in MyParser.cpp something that resembles:

            MyParser::MyParser(ANTLRTokenBuffer *input) :
                                         FooParser(input,1,0,0,4)
            {
                token_tbl = _token_tbl;
                traceOptionValueDefault=1;    // MR10 turn trace ON
            }

    The base class constructor must have a signature similar to
    that of ANTLRParser.
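
    A minimal sketch of what "a signature similar to that of
    ANTLRParser" means for FooParser follows.  The ANTLRTokenBuffer and
    ANTLRParser classes below are empty stand-ins, not the real pccts
    classes, and the parameter names (k, useInfLook, demandLook,
    bsetSize) are guesses made for readability; only the argument
    count and order are taken from the generated call shown above.

```cpp
#include <cassert>

// Stand-ins only; the real classes live in the pccts support headers.
class ANTLRTokenBuffer {};

class ANTLRParser {
public:
    ANTLRParser(ANTLRTokenBuffer* input, int k, int useInfLook,
                int demandLook, int bsetSize)
        : input_(input), k_(k), bsetSize_(bsetSize) {
        (void)useInfLook;
        (void)demandLook;
    }
    virtual ~ANTLRParser() {}
    ANTLRTokenBuffer* input() const { return input_; }
    int lookahead() const { return k_; }
    int setSize() const { return bsetSize_; }
private:
    ANTLRTokenBuffer* input_;
    int k_;
    int bsetSize_;
};

// User-written intermediate base class.  It accepts the same five
// arguments and forwards them unchanged.
class FooParser : public ANTLRParser {
public:
    FooParser(ANTLRTokenBuffer* input, int k, int useInfLook,
              int demandLook, int bsetSize)
        : ANTLRParser(input, k, useInfLook, demandLook, bsetSize) {}
};

// Shape of the constructor antlr emits for
// "class MyParser : public FooParser {".
class MyParser : public FooParser {
public:
    explicit MyParser(ANTLRTokenBuffer* input)
        : FooParser(input, 1, 0, 0, 4) {
        assert(input != nullptr);
    }
};
```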

#257. (MR21a) Removed dlg statement that -i has no effect in C++ mode.

    This was incorrect.

#256. (MR21a) Malformed syntax graph causes crash after error message.

    In the past, certain kinds of errors in the very first grammar
    element could cause the construction of a malformed graph
    representing the grammar.  This would eventually result in a
    fatal internal error.  The code has been changed to be more
    resistant to this particular error.

#255. (MR21a) ParserBlackBox(FILE* f)

    This constructor set openByBlackBox to the wrong value.

    Reported by Kees Bakker (kees_bakker@tasking.nl).

#254. (MR21a) Reporting syntax error at end-of-file

    When there was a syntax error at the end-of-file the syntax
    error routine would substitute "<eof>" for the programmer's
    end-of-file symbol.  This substitution is now done only when
    the programmer does not define his own end-of-file symbol
    or the symbol begins with the character "@".

    Reported by Kees Bakker (kees_bakker@tasking.nl).
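
    The substitution rule can be written as a small function.  This is
    an illustration of the rule as stated above, not the code in the
    pccts error routine; eofDisplayName is a hypothetical name, and
    using an empty string for "no symbol defined" is an assumption of
    the sketch.

```cpp
#include <string>

// Returns the text to print for end-of-file in a syntax error message.
// userEofName is the programmer's end-of-file symbol, or "" if none
// was defined (assumed convention for this sketch).
std::string eofDisplayName(const std::string& userEofName) {
    if (userEofName.empty() || userEofName[0] == '@') {
        return "<eof>";          // fall back to the built-in text
    }
    return userEofName;          // keep the programmer's own symbol
}
```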

#253. (MR21) Generation of block preamble (-preamble and -preamble_first)

    The antlr option -preamble causes antlr to insert the code
    BLOCK_PREAMBLE at the start of each rule and block.  It does
    not insert code before rule references, token references, or
    actions.  By properly defining the macro BLOCK_PREAMBLE the
    user can generate code which is specific to the start of blocks.

    The antlr option -preamble_first is similar, but inserts the
    code BLOCK_PREAMBLE_FIRST(PreambleFirst_123) where the symbol
    PreambleFirst_123 is equivalent to the first set defined by
    the #FirstSetSymbol described in Item #248.

    I have not investigated how these options interact with guess
    mode (syntactic predicates).
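
    One possible definition of BLOCK_PREAMBLE is sketched below.  The
    function statement() is a hand-written imitation of a generated
    rule, not real antlr output, and preambleLog is a hypothetical
    helper.  Whether antlr emits a semicolon after the macro is not
    stated above, so this definition carries its own.

```cpp
#include <string>
#include <vector>

// Hypothetical log of block entries, used only for this illustration.
static std::vector<std::string> preambleLog;

// User-supplied definition: record the enclosing function each time a
// rule or block is entered.
#define BLOCK_PREAMBLE preambleLog.push_back(__func__);

// Imitation of a generated rule containing one nested (...) block.
void statement() {
    BLOCK_PREAMBLE          // start of the rule
    {
        BLOCK_PREAMBLE      // start of the inner block
    }
}
```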

#252. (MR21) Check for null pointer in trace routine

    If some trace options are used with a parser that was generated
    without trace enabled, the current rule name may be a
    NULL pointer.  A guard was added to check for this in
    restoreState.

    Reported by Douglas E. Forester (dougf@projtech.com).

#251. (MR21) Changes to #define zzTRACE_RULES

    The macro zzTRACE_RULES was being used to pass information to
    AParser.h.  If this preprocessor symbol was not properly
    set the first time AParser.h was #included, the declaration
    of zzTRACEdata would be omitted (it is used by the -gd option).
    Subsequent #includes of AParser.h would be skipped because of
    the #ifdef guard, so the declaration of zzTracePrevRuleName would
    never be made.  The result was that proper compilation depended
    heavily on the order of #includes.

    The declaration of zzTRACEdata was made unconditional and the
    problem of removing unused declarations will be left to optimizers.

    Diagnosed by Douglas E. Forester (dougf@projtech.com).

#250. (MR21) Option for EXPERIMENTAL change to error sets for blocks

    The antlr option -mrblkerr turns on an experimental feature
    which is supposed to provide more accurate syntax error messages
    for k=1, ck=1 grammars.  When used with k>1 or ck>1 grammars the
    behavior should be no worse than the current behavior.

    There is no problem with the matching of elements or the computation
    of prediction expressions in pccts.  The task is only one of listing
    the most appropriate tokens in the error message.  The error sets used
    in pccts error messages are approximations of the exact error set when
    optional elements in (...)* or (...)+ are involved.  While entirely
    correct, the error messages are sometimes not 100% accurate.

    There is also a minor philosophical issue.  For example, suppose the
    grammar expects the token to be an optional A followed by Z, and it
    is X.  X, of course, is neither A nor Z, so an error message is appropriate.
    Is it appropriate to say "Expected Z" ?  It is correct, it is accurate,
    but it is not complete.

    When k>1 or ck>1 the problem of providing the exactly correct
    list of tokens for the syntax error messages ends up becoming
    equivalent to evaluating the prediction expression for the
    alternatives twice.  However, for k=1 ck=1 grammars the prediction
    expression can be computed easily and evaluated cheaply, so I
    decided to try implementing it to satisfy a particular application.
    This application uses the error set in an interactive command language
    to provide prompts which list the alternatives available at that
    point in the parser.  The user can then enter additional tokens to
    complete the command line.  This required more accurate error
    sets than pccts previously provided.

    In some cases the default pccts behavior may lead to more robust error
    recovery or clearer error messages than having the exact set of tokens.
    This is because (a) features like -ge allow the use of symbolic names for
    certain sets of tokens, so having extra tokens may simply obscure things
    and (b) the error set is used to resynchronize the parser, so a good
    choice is sometimes more important than having the exact set.

    Consider the following example:

            Note:  All example code has been abbreviated
            to the absolute minimum in order to make the
            examples concise.

        star1 : (A)* Z;

    The generated code resembles:

           old                new (with -mrblkerr)
        -------------         --------------------
        for (;;) {            for (;;) {
            match(A);           match(A);
        }                     }
        match(Z);             if (! A and ! Z) {
                                FAIL(...{A,Z}...);
                              }
                              match(Z);


        With input X
            old message: Found X, expected Z
            new message: Found X, expected A, Z
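
    The difference for star1 can be reproduced with a hand-written
    recognizer.  This is a sketch of the behavior described above, not
    antlr output.  Tokens are single characters, '$' stands for
    end-of-input, and parseStar1Old, parseStar1New and fail are
    hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Builds a message in the style shown above.
static std::string fail(char found, const std::string& expected) {
    return std::string("Found ") + found + ", expected " + expected;
}

// star1 : (A)* Z;   error set as reported without -mrblkerr.
// Returns "" on success, otherwise the error message.
std::string parseStar1Old(const std::string& input) {
    std::size_t pos = 0;
    while (pos < input.size() && input[pos] == 'A') ++pos;    // (A)*
    char la = pos < input.size() ? input[pos] : '$';
    if (la != 'Z') return fail(la, "Z");                       // match(Z)
    return "";
}

// Same rule with the extra check described for -mrblkerr: on leaving
// the loop, report every token that could have continued the parse.
std::string parseStar1New(const std::string& input) {
    std::size_t pos = 0;
    while (pos < input.size() && input[pos] == 'A') ++pos;    // (A)*
    char la = pos < input.size() ? input[pos] : '$';
    if (la != 'A' && la != 'Z') return fail(la, "A, Z");       // new check
    if (la != 'Z') return fail(la, "Z");                       // match(Z)
    return "";
}
```

    Both versions accept and reject exactly the same inputs; only the
    text of the message differs.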

    For the example:

        star2 : (A|B)* Z;

           old                      new (with -mrblkerr)
        -------------               --------------------
        for (;;) {                  for (;;) {
          if (!A and !B) break;       if (!A and !B) break;
          if (...) {                  if (...) {
            <same ...>                  <same ...>
          }                           }
          else {                      else {
            FAIL(...{A,B,Z}...);        FAIL(...{A,B}...);
          }                           }
        }                           }
        match(Z);                   if (! A and ! B and ! Z) {
                                        FAIL(...{A,B,Z}...);
                                    }
                                    match(Z);

        With input X
            old message: Found X, expected Z
            new message: Found X, expected A, B, Z
        With input A X
            old message: Found X, expected Z
            new message: Found X, expected A, B, Z

            This includes the choice of looping back to the
            star block.

    The code for plus blocks:

        plus1 : (A)+ Z;

    The generated code resembles:

           old                  new (with -mrblkerr)
        -------------           --------------------
        do {                    do {
          match(A);               match(A);
        } while (A)             } while (A)
        match(Z);               if (! A and ! Z) {
                                  FAIL(...{A,Z}...);
                                }
                                match(Z);

        With input A X
            old message: Found X, expected Z
            new message: Found X, expected A, Z

            This includes the choice of looping back to the
            plus block.

    For the example:

        plus2 : (A|B)+ Z;

           old                      new (with -mrblkerr)
        -------------               --------------------
        do {                        do {
          if (A) {                    <same>
            match(A);                 <same>
          } else if (B) {             <same>
            match(B);                 <same>
          } else {                    <same>
            if (cnt > 1) break;       <same>
            FAIL(...{A,B,Z}...);        FAIL(...{A,B}...);
          }                           }
          cnt++;                      <same>
        }                           }

        match(Z);                   if (! A and ! B and ! Z) {
                                        FAIL(...{A,B,Z}...);
                                    }
                                    match(Z);

        With input X
            old message: Found X, expected A, B, Z
            new message: Found X, expected A, B
        With input A X
            old message: Found X, expected Z
            new message: Found X, expected A, B, Z

            This includes the choice of looping back to the
            plus block.

#249. (MR21) Changes for DEC/VMS systems