Commit | Line | Data |
---|---|---|
878ddf1f | 1 | \r |
2 | =======================================================\r | |
3 | Known Problems In PCCTS - Last revised 14 November 1998\r | |
4 | =======================================================\r | |
5 | \r | |
6 | #17. The dlg fix for handling characters up to 255 is incorrect.\r | |
7 | \r | |
8 | See item #207.\r | |
9 | \r | |
10 | Reported by Frank Hartmann.\r | |
11 | \r | |
12 | #16. A note about "&&" predicates (Mike Dimmick)\r | |
13 | \r | |
14 | Mike Dimmick has pointed out a potential pitfall in the use of the\r | |
15 | "&&" style predicate. Consider:\r | |
16 | \r | |
17 | r0: (g)? => <<P>>? r1\r | |
18 | | ...\r | |
19 | ;\r | |
20 | r1: A | B;\r | |
21 | \r | |
22 | If the context guard g is not a subset of the lookahead context for r1\r | |
23 | (in other words g is neither A nor B) then the code may execute r1 \r | |
24 | even when the lookahead context is not satisfied. This is an error\r | |
25 | by the person coding the grammar, and the error should be reported to\r | |
26 | the user, but it isn't. Some examples I've run seem to\r | |
27 | indicate that such an error actually results in the rule becoming\r | |
28 | unreachable.\r | |
29 | \r | |
30 | When g is properly coded the code is correct; the problem is when g\r | |
31 | is not properly coded.\r | |
32 | \r | |
33 | A second problem reported by Mike Dimmick is that the test for a\r | |
34 | failed validation predicate is equivalent to a test on the predicate\r | |
35 | alone. In other words, if the "&&" has not been hoisted then it may\r | |
36 | falsely report a validation error.\r | |
37 | \r | |
38 | #15. (Changed in MR23) Warning for LT(i), LATEXT(i) in token match actions\r | |
39 | \r | |
40 | A bug (or at least an oddity) is that a reference to LT(1), LA(1),\r | |
41 | or LATEXT(1) in an action which immediately follows a token match\r | |
42 | in a rule refers to the token matched, not the token which is in\r | |
43 | the lookahead buffer. Consider:\r | |
44 | \r | |
45 | r : abc <<action alpha>> D <<action beta>> E;\r | |
46 | \r | |
47 | In this case LT(1) in action alpha will refer to the next token in\r | |
48 | the lookahead buffer ("D"), but LT(1) in action beta will refer to\r | |
49 | the token matched by D - the preceding token.\r | |
50 | \r | |
51 | A warning has been added which warns users about this when an action\r | |
52 | following a token match contains a reference to LT(1), LA(1), or LATEXT(1).\r | |
53 | \r | |
54 | This behavior should be changed, but it appears in too many programs\r | |
55 | now. Another problem, perhaps more significant, is that the obvious\r | |
56 | fix (moving the consume() call to before the action) could change the \r | |
57 | order in which input is requested and output appears in existing programs.\r | |
58 | \r | |
59 | This problem was reported, along with a fix, by Benjamin Mandel\r | |
60 | (beny@sd.co.il). However, I felt that changing the behavior was too\r | |
61 | dangerous for existing code.\r | |
62 | \r | |
63 | #14. Parsing bug in dlg\r | |
64 | \r | |
65 | THM: I have been unable to reproduce this problem.\r | |
66 | \r | |
67 | Reported by Rick Howard, Mijenix Corporation (rickh@mijenix.com).\r | |
68 | \r | |
69 | The regular expression parser (in rexpr.c) fails while\r | |
70 | trying to parse the following regular expression:\r | |
71 | \r | |
72 | {[a-zA-Z]:}(\\\\[a-zA-Z0-9]*)+\r | |
73 | \r | |
74 | See my comment in the following excerpt from rexpr.c:\r | |
75 | \r | |
76 | /*\r | |
77 | * <regExpr> ::= <andExpr> ( '|' {<andExpr>} )*\r | |
78 | *\r | |
79 | * Return -1 if syntax error\r | |
80 | * Return 0 if none found\r | |
81 | * Return 1 if a regExpr was found\r | |
82 | */\r | |
83 | static\r | |
84 | regExpr(g)\r | |
85 | GraphPtr g;\r | |
86 | {\r | |
87 | Graph g1, g2;\r | |
88 | \r | |
89 | if ( andExpr(&g1) == -1 )\r | |
90 | {\r | |
91 | return -1;\r | |
92 | }\r | |
93 | \r | |
94 | while ( token == '|' )\r | |
95 | {\r | |
96 | int a;\r | |
97 | next();\r | |
98 | a = andExpr(&g2);\r | |
99 | if ( a == -1 ) return -1; /* syntax error below */\r | |
100 | else if ( !a ) return 1; /* empty alternative */\r | |
101 | g1 = BuildNFA_AorB(g1, g2);\r | |
102 | }\r | |
103 | \r | |
104 | if ( token!='\0' ) return -1;\r | |
105 | *****\r | |
106 | ***** It appears to fail here because token is 125 - the closing '}'\r | |
107 | ***** If I change it to:\r | |
108 | ***** if ( token!='\0' && token!='}' && token!= ')' ) return -1;\r | |
109 | *****\r | |
110 | ***** It succeeds, but I'm not sure this is the correct approach.\r | |
111 | *****\r | |
112 | *g = g1;\r | |
113 | return 1;\r | |
114 | }\r | |
115 | \r | |
116 | #13. dlg reports an invalid range for: [\0x00-\0xff]\r | |
117 | \r | |
118 | Diagnosed by Piotr Eljasiak (eljasiak@no-spam.zt.gdansk.tpsa.pl):\r | |
119 | \r | |
120 | Fixed in MR16.\r | |
121 | \r | |
122 | #12. Strings containing comment actions\r | |
123 | \r | |
124 | Sequences that look like C style comments appearing in string\r | |
125 | literals are improperly parsed by antlr/dlg.\r | |
126 | \r | |
127 | << fprintf(out," /* obsolete */ ");\r | |
128 | \r | |
129 | For this case use:\r | |
130 | \r | |
131 | << fprintf(out," \/\* obsolete \*\/ ");\r | |
132 | \r | |
133 | Reported by K.J. Cummings (cummings@peritus.com).\r | |
134 | \r | |
135 | #11. User hook for deallocation of variables on guess fail\r | |
136 | \r | |
137 | The mechanism outlined in Item #108 works only for\r | |
138 | heap allocated variables.\r | |
139 | \r | |
140 | #10. Label re-initialization in ( X {y:Y} )*\r | |
141 | \r | |
142 | If a label assignment is optional and appears in a\r | |
143 | (...)* or (...)+ block it will not be reset to NULL\r | |
144 | when it is skipped by a subsequent iteration.\r | |
145 | \r | |
146 | Consider the example:\r | |
147 | \r | |
148 | ( X { y:Y })* Z\r | |
149 | \r | |
150 | with input:\r | |
151 | \r | |
152 | X Y X Z\r | |
153 | \r | |
154 | The first time through the block Y will be matched and\r | |
155 | y will be set to point to the token. On the second\r | |
156 | iteration of the (...)* block there is no match for Y.\r | |
157 | But y will not be reset to NULL, as the user might\r | |
158 | expect; it will contain a reference to the Y that was\r | |
159 | matched in the first iteration.\r | |
160 | \r | |
161 | The work-around is to manually reset y:\r | |
162 | \r | |
163 | ( X << y = NULL; >> { y:Y } )* Z\r | |
164 | \r | |
165 | or\r | |
166 | \r | |
167 | ( X ( y:Y | << y = NULL; >> /* epsilon */ ) )* Z\r | |
168 | \r | |
169 | Reported by Jeff Vincent (JVincent@novell.com).\r | |
170 | \r | |
171 | #9. PCCTSAST.h PCCTSAST::setType() is a noop\r | |
172 | \r | |
173 | #8. #tokdefs with ~Token and .\r | |
174 | \r | |
175 | THM: I have been unable to reproduce this problem.\r | |
176 | \r | |
177 | When antlr uses #tokdefs to define tokens the fields of\r | |
178 | #errclass and #tokclass do not get properly defined.\r | |
179 | When it subsequently attempts to take the complement of\r | |
180 | the set of tokens (using ~Token or .) it can refer to\r | |
181 | tokens which don't have names, generating a fatal error.\r | |
182 | \r | |
183 | #7. DLG crashes on some invalid inputs\r | |
184 | \r | |
185 | THM: In MR20 I have fixed the most common cases.\r | |
186 | \r | |
187 | The following token definition will cause DLG to crash.\r | |
188 | \r | |
189 | #token "()"\r | |
190 | \r | |
191 | Reported by Mengue Olivier (dolmen@bigfoot.com).\r | |
192 | \r | |
193 | #6. On MS systems \n\r is treated as two new lines\r | |
194 | \r | |
195 | Fixed.\r | |
196 | \r | |
197 | #5. Token expressions in #tokclass\r | |
198 | \r | |
199 | #errclass does not support TOK1..TOK2 or ~TOK syntax.\r | |
200 | #tokclass does not support ~TOKEN syntax\r | |
201 | \r | |
202 | A workaround for #errclass TOK1..TOK2 is to use a\r | |
203 | #tokclass.\r | |
204 | \r | |
205 | Reported by Dave Watola (dwatola@amtsun.jpl.nasa.gov)\r | |
206 | \r | |
207 | #4. A #tokdefs must appear "early" in the grammar file.\r | |
208 | \r | |
209 | The "early" section of the grammar file is the only\r | |
210 | place where the following directives may appear:\r | |
211 | \r | |
212 | #header\r | |
213 | #first\r | |
214 | #tokdefs\r | |
215 | #parser\r | |
216 | \r | |
217 | Any other kind of statement signifies the end of the\r | |
218 | "early" section.\r | |
219 | \r | |
220 | #3. Use of PURIFY macro for C++ mode\r | |
221 | \r | |
222 | Item #93 of the CHANGES_FROM_1.33 describes the use of\r | |
223 | the PURIFY macro to zero arguments to be passed by\r | |
224 | upward inheritance.\r | |
225 | \r | |
226 | #define PURIFY(r, s) memset((char *) &(r), '\0', (s));\r | |
227 | \r | |
228 | This may not be the right thing to do for C++ objects that\r | |
229 | have constructors. Reported by Bonny Rais (bonny@werple.net.au).\r | |
230 | \r | |
231 | For those cases one should #define PURIFY to be an empty macro\r | |
232 | in the #header or #first actions.\r | |
233 | \r | |
234 | #2. Fixed in 1.33MR10 - See CHANGES_FROM_1.33 Item #80.\r | |
235 | \r | |
236 | #1. The quality of support for systems with 8.3 file names leaves\r | |
237 | much to be desired. Since the kit is distributed using the\r | |
238 | long file names and the make file uses long file names it requires\r | |
239 | some effort to generate. This will probably not be changed due\r | |
240 | to the large number of systems already written using the long\r | |
241 | file names.\r |
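
As an illustration of item #10, the following C sketch hand-simulates the loop ( X { y:Y } )* Z on the input X Y X Z. It is not the code antlr generates; the token codes, `sample_input`, and `run_loop` are hypothetical names introduced only for this example. The flag stands in for the first work-around, << y = NULL; >>.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes for the example grammar ( X { y:Y } )* Z */
enum { TOK_X, TOK_Y, TOK_Z };

/* The input from item #10: X Y X Z */
const int sample_input[4] = { TOK_X, TOK_Y, TOK_X, TOK_Z };

/*
 * Walk the (...)* block by hand and return whatever the label y holds
 * once the loop exits. When reset_each_iteration is non-zero, y is
 * cleared at the top of every iteration, as in the work-around.
 */
const int *run_loop(const int *tokens, size_t n, int reset_each_iteration)
{
    const int *y = NULL;
    size_t i = 0;

    assert(tokens != NULL);

    while (i < n && tokens[i] == TOK_X) {
        i++;                        /* match X */
        if (reset_each_iteration)
            y = NULL;               /* << y = NULL; >> */
        if (i < n && tokens[i] == TOK_Y) {
            y = &tokens[i];         /* y:Y */
            i++;
        }
        /* If Y was skipped and no reset ran, y keeps its old value. */
    }
    return y;
}
```

Calling `run_loop(sample_input, 4, 0)` returns a pointer to the Y matched in the first iteration, even though the second iteration matched no Y. Calling it with the flag set returns NULL, which is what most users would expect.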
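
Item #15 comes down to ordering: the action after a token match runs before the matched token is consumed. The C sketch below models only that ordering, assuming a simple array-backed token stream. `Stream`, `lt1`, `consume`, `match_then_act`, and `action_beta_sees` are hypothetical names, not part of the PCCTS runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes for the tail of: r : abc <<alpha>> D <<beta>> E; */
enum { TOK_D, TOK_E, TOK_EOF };

typedef struct {
    const int *tokens;   /* token codes still to be parsed */
    size_t pos;          /* index of the current lookahead token */
} Stream;

const int sample_tokens[3] = { TOK_D, TOK_E, TOK_EOF };

/* Stand-in for LT(1): the token at the current position. */
int lt1(const Stream *s) { return s->tokens[s->pos]; }

/* Stand-in for consume(): advance past the current token. */
void consume(Stream *s) { s->pos++; }

/*
 * Match `expected`, run the action, then consume. The return value is
 * what LT(1) looked like from inside the action.
 */
int match_then_act(Stream *s, int expected)
{
    int seen_in_action;

    assert(lt1(s) == expected);   /* token match */
    seen_in_action = lt1(s);      /* action body runs before consume() */
    consume(s);
    return seen_in_action;
}

/* What does action beta, placed right after D, observe as LT(1)? */
int action_beta_sees(void)
{
    Stream s = { sample_tokens, 0 };
    return match_then_act(&s, TOK_D);
}
```

Under these assumptions `action_beta_sees()` returns `TOK_D`, the token just matched, not `TOK_E`. Moving `consume(s)` above the action body would make the action see `TOK_E`, which is the change item #15 describes as too risky for existing programs.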