Tools/Source/TianoTools/Pccts/KNOWN_PROBLEMS.txt

   1
   2     =======================================================
   3     Known Problems In PCCTS - Last revised 14 November 1998
   4     =======================================================
   5
   6 #17. The dlg fix for handling characters up to 255 is incorrect.
   7
   8     See item #207.
   9
  10     Reported by Frank Hartmann.
  11
  12 #16. A note about "&&" predicates (Mike Dimmick)
  13
  14     Mike Dimmick has pointed out a potential pitfall in the use of the
  15     "&&" style predicate.  Consider:
  16
  17          r0: (g)? => <<P>>?  r1
  18              | ...
  19              ;
  20          r1: A | B;
  21
  22     If the context guard g is not a subset of the lookahead context for r1
  23     (in other words g is neither A nor B) then the code may execute r1
  24     even when the lookahead context is not satisfied.  This is an error
  25     by the person coding the grammar, and the error should be reported to
  26     the user, but it isn't.  Some examples I've run seem to
  27     indicate that such an error actually results in the rule becoming
  28     unreachable.
  29
  30     When g is properly coded the code is correct; the problem arises when g
  31     is not properly coded.
  32
  33     A second problem reported by Mike Dimmick is that the test for a
  34     failed validation predicate is equivalent to a test on the predicate
  35     alone.  In other words, if the "&&" has not been hoisted then it may
  36     falsely report a validation error.
  37
  38 #15. (Changed in MR23) Warning for LT(i), LATEXT(i) in token match actions
  39
  40     A bug (or at least an oddity) is that a reference to LT(1), LA(1),
  41     or LATEXT(1) in an action which immediately follows a token match
  42     in a rule refers to the token matched, not the token which is in
  43     the lookahead buffer.  Consider:
  44
  45         r : abc <<action alpha>> D <<action beta>> E;
  46
  47     In this case LT(1) in action alpha will refer to the next token in
  48     the lookahead buffer ("D"), but LT(1) in action beta will refer to
  49     the token matched by D - the preceding token.
  50
  51     A warning has been added which warns users about this when an action
  52     following a token match contains a reference to LT(1), LA(1), or LATEXT(1).
  53
  54     This behavior should be changed, but it appears in too many programs
  55     now.  Another problem, perhaps more significant, is that the obvious
  56     fix (moving the consume() call to before the action) could change the
  57     order in which input is requested and output appears in existing programs.
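
The two orderings can be sketched with a small hand-written model. This is
not PCCTS-generated code: ToyStream, lt1() and run_rule() are hypothetical
names, and rule abc is assumed to match the single token A.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a parser's token stream; not PCCTS code.
struct ToyStream {
    std::vector<std::string> tokens;
    std::size_t pos;

    ToyStream() : pos(0) {}

    // LT(1) analogue: the token at the current position.
    const std::string& lt1() const { return tokens[pos]; }
    void consume() { ++pos; }
};

// Models  r : abc <<action alpha>> D <<action beta>> E;
// Returns what LT(1) yields inside alpha and inside beta.
// consume_before_action == false models the behavior described above;
// true models the "obvious fix" of consuming before the action runs.
std::vector<std::string> run_rule(bool consume_before_action) {
    ToyStream s;
    s.tokens.push_back("A");
    s.tokens.push_back("D");
    s.tokens.push_back("E");
    s.tokens.push_back("EOF");
    std::vector<std::string> seen;

    s.consume();                              // abc matched A and consumed it
    seen.push_back(s.lt1());                  // action alpha

    if (consume_before_action) s.consume();   // match D, consume first
    seen.push_back(s.lt1());                  // action beta
    if (!consume_before_action) s.consume();  // match D, consume afterwards

    s.consume();                              // match E
    return seen;
}

// Usage: the current behavior versus the proposed fix.
void lt1_demo() {
    std::vector<std::string> current = run_rule(false);
    std::vector<std::string> changed = run_rule(true);
    assert(current[0] == "D" && current[1] == "D");
    assert(changed[0] == "D" && changed[1] == "E");
}
```

In this model action beta sees D under the current ordering and E under the
changed ordering, which is the difference existing programs would observe.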
  58
  59     This problem was reported, along with a fix, by Benjamin Mandel
  60     (beny@sd.co.il).  However, I felt that changing the behavior was too
  61     dangerous for existing code.
  62
  63 #14. Parsing bug in dlg
  64
  65     THM: I have been unable to reproduce this problem.
  66
  67     Reported by Rick Howard, Mijenix Corporation (rickh@mijenix.com).
  68
  69     The regular expression parser (in rexpr.c) fails while
  70     trying to parse the following regular expression:
  71
  72             {[a-zA-Z]:}(\\\\[a-zA-Z0-9]*)+
  73
  74     See my comment in the following excerpt from rexpr.c:
  75
  76     /*
  77      * <regExpr>        ::= <andExpr> ( '|' {<andExpr>} )*
  78      *
  79      * Return -1 if syntax error
  80      * Return  0 if none found
  81      * Return  1 if a regExrp was found
  82      */
  83         static
  84         regExpr(g)
  85         GraphPtr g;
  86         {
  87             Graph g1, g2;
  88
  89             if ( andExpr(&g1) == -1 )
  90             {
  91                 return -1;
  92             }
  93
  94             while ( token == '|' )
  95             {
  96                 int a;
  97                 next();
  98                 a = andExpr(&g2);
  99                 if ( a == -1 ) return -1;   /* syntax error below */
 100                 else if ( !a ) return 1;    /* empty alternative */
 101                 g1 = BuildNFA_AorB(g1, g2);
 102             }
 103
 104             if ( token!='\0' ) return -1;
 105         *****
 106         ***** It appears to fail here because token is 125 - the closing '}'
 107         ***** If I change it to:
 108         *****    if ( token!='\0' && token!='}' && token!= ')' ) return -1;
 109         *****
 110         ***** It succeeds, but I'm not sure this is the correct approach.
 111         *****
 112             *g = g1;
 113             return 1;
 114         }
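
The trade-off in the proposed change can be explored with a toy recognizer.
This is a sketch, not the dlg source: ToyParser and toy_accepts() are
hypothetical, atoms are limited to lower-case letters, and it assumes that
nested {...} and (...) groups re-enter regExpr, as the report suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy recursive-descent recognizer (hypothetical; not rexpr.c).
struct ToyParser {
    std::string input;
    std::size_t pos;
    bool relaxed;  // true: a nested regExpr may stop at '}' or ')'

    ToyParser(const std::string& s, bool relaxed_check)
        : input(s), pos(0), relaxed(relaxed_check) {}

    char token() const { return pos < input.size() ? input[pos] : '\0'; }
    void next() { if (pos < input.size()) ++pos; }

    // atom ::= letter | '(' regExpr ')' | '{' regExpr '}'
    // Returns -1 on syntax error, 0 if none found, 1 if found.
    int atom() {
        char t = token();
        if (t == '(' || t == '{') {
            char close = (t == '(') ? ')' : '}';
            next();
            if (regExpr() != 1) return -1;
            if (token() != close) return -1;
            next();
            return 1;
        }
        if (t >= 'a' && t <= 'z') { next(); return 1; }
        return 0;
    }

    // andExpr ::= atom*   (returns 1 if at least one atom was found)
    int andExpr() {
        int found = 0;
        for (;;) {
            int a = atom();
            if (a == -1) return -1;
            if (a == 0) return found;
            found = 1;
        }
    }

    // regExpr ::= andExpr ( '|' andExpr )*
    int regExpr() {
        if (andExpr() == -1) return -1;
        while (token() == '|') {
            next();
            if (andExpr() == -1) return -1;
        }
        char t = token();
        bool closer = (t == '}' || t == ')');
        if (t != '\0' && !(relaxed && closer)) return -1;
        return 1;
    }
};

// Top-level entry: the whole input must be used up.
bool toy_accepts(const std::string& s, bool relaxed) {
    ToyParser p(s, relaxed);
    return p.regExpr() == 1 && p.token() == '\0';
}

// Usage: strict versus relaxed end-of-expression check.
void regexpr_demo() {
    assert(!toy_accepts("{a}b", false));  // strict check rejects a nested group
    assert(toy_accepts("{a}b", true));    // relaxed check accepts it
    assert(!toy_accepts("a}", true));     // caught only by the top-level test
}
```

In this model the relaxed test is safe only because the top-level caller
still insists on '\0'; without that extra test a stray '}' would be accepted,
which may be the reason for the doubt expressed in the report.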
 115
 116 #13. dlg reports an invalid range for: [\0x00-\0xff]
 117
 118     Diagnosed by Piotr Eljasiak (eljasiak@no-spam.zt.gdansk.tpsa.pl).
 119
 120     Fixed in MR16.
 121
 122 #12. Strings containing comment actions
 123
 124      Sequences that look like C-style comments and appear in string
 125      literals are improperly parsed by antlr/dlg.
 126
 127         << fprintf(out," /* obsolete */ ");
 128
 129      For this case use:
 130
 131         << fprintf(out," \/\* obsolete \*\/ ");
 132
 133      Reported by K.J. Cummings (cummings@peritus.com).
 134
 135 #11. User hook for deallocation of variables on guess fail
 136
 137      The mechanism outlined in Item #108 works only for
 138      heap allocated variables.
 139
 140 #10. Label re-initialization in ( X {y:Y} )*
 141
 142      If a label assignment is optional and appears in a
 143      (...)* or (...)+ block it will not be reset to NULL
 144      when it is skipped by a subsequent iteration.
 145
 146      Consider the example:
 147
 148             ( X { y:Y })* Z
 149
 150      with input:
 151
 152             X Y X Z
 153
 154      The first time through the block Y will be matched and
 155      y will be set to point to the token.  On the second
 156      iteration of the (...)* block there is no match for Y.
 157      But y will not be reset to NULL, as the user might
 158      expect, it will contain a reference to the Y that was
 159      matched in the first iteration.
 160
 161      The work-around is to manually reset y:
 162
 163             ( X << y = NULL; >> { y:Y } )* Z
 164
 165         or
 166
 167             ( X ( y:Y | << y = NULL; >> /* epsilon */ ) )* Z
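
The stale label and the effect of the manual reset can be modelled by hand.
This is an illustrative sketch, not antlr output: label_set_per_iteration()
and its parameters are hypothetical, and tokens are plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hand-written stand-in for the loop ( X { y:Y } )* ; not generated code.
// For each iteration, records whether label y is non-null after the
// optional { y:Y } element.
std::vector<bool> label_set_per_iteration(const std::vector<std::string>& input,
                                          bool reset_each_iteration) {
    std::vector<bool> result;
    const std::string* y = 0;   // label y, initialised once before the loop
    std::size_t pos = 0;

    while (pos < input.size() && input[pos] == "X") {
        ++pos;                              // match X
        if (reset_each_iteration) y = 0;    // models << y = NULL; >>
        if (pos < input.size() && input[pos] == "Y") {
            y = &input[pos];                // y:Y
            ++pos;
        }
        result.push_back(y != 0);
    }
    return result;
}

// Usage: the input X Y X Z from the example above.
void label_demo() {
    std::vector<std::string> in;
    in.push_back("X"); in.push_back("Y"); in.push_back("X"); in.push_back("Z");

    std::vector<bool> stale = label_set_per_iteration(in, false);
    std::vector<bool> reset = label_set_per_iteration(in, true);
    assert(stale[0] && stale[1]);    // y still refers to the first Y
    assert(reset[0] && !reset[1]);   // y is null on the second iteration
}
```

Without the reset, y is still set on the second iteration even though no Y
was matched there; with the reset it is null, as the user would expect.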
 168
 169      Reported by Jeff Vincent (JVincent@novell.com).
 170
 171 #9. PCCTSAST.h PCCTSAST::setType() is a no-op
 172
 173 #8. #tokdefs with ~Token and .
 174
 175     THM: I have been unable to reproduce this problem.
 176
 177     When antlr uses #tokdefs to define tokens the fields of
 178     #errclass and #tokclass do not get properly defined.
 179     When it subsequently attempts to take the complement of
 180     the set of tokens (using ~Token or .) it can refer to
 181     tokens which don't have names, generating a fatal error.
 182
 183 #7. DLG crashes on some invalid inputs
 184
 185     THM:  In MR20 I have fixed the most common cases.
 186
 187     The following token definition will cause DLG to crash.
 188
 189         #token "()"
 190
 191     Reported by Mengue Olivier (dolmen@bigfoot.com).
 192
 193 #6. On MS systems \n\r is treated as two new lines
 194
 195     Fixed.
 196
 197 #5. Token expressions in #tokclass
 198
 199     #errclass does not support TOK1..TOK2 or ~TOK syntax.
 200     #tokclass does not support ~TOKEN syntax
 201
 202     A workaround for #errclass TOK1..TOK2 is to use a
 203     #tokclass.
 204
 205     Reported by Dave Watola (dwatola@amtsun.jpl.nasa.gov)
 206
 207 #4. A #tokdefs must appear "early" in the grammar file.
 208
 209     The "early" section of the grammar file is the only
 210     place where the following directives may appear:
 211
 212         #header
 213         #first
 214         #tokdefs
 215         #parser
 216
 217     Any other kind of statement signifies the end of the
 218     "early" section.
 219
 220 #3. Use of PURIFY macro for C++ mode
 221
 222     Item #93 of CHANGES_FROM_1.33 describes the use of
 223     the PURIFY macro to zero arguments to be passed by
 224     upward inheritance.
 225
 226         #define PURIFY(r, s) memset((char *) &(r), '\0', (s));
 227
 228     This may not be the right thing to do for C++ objects that
 229     have constructors.  Reported by Bonny Rais (bonny@werple.net.au).
 230
 231     For those cases one should #define PURIFY to be an empty macro
 232     in the #header or #first actions.
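
Why zeroing can be the wrong thing for such objects is shown by the sketch
below. The names are hypothetical: Attr stands in for a user attribute type,
and PURIFY_ZERO / PURIFY_EMPTY are used only so that both variants can sit in
one file. In a grammar the empty form would be written as
"#define PURIFY(r, s)".

```cpp
#include <cassert>
#include <cstring>

// Hypothetical attribute type whose constructor establishes a non-zero state.
struct Attr {
    int kind;
    Attr() : kind(42) {}
};

// The definition quoted above, under a different name for this sketch.
#define PURIFY_ZERO(r, s) memset((char *) &(r), '\0', (s));

// The suggested alternative: an empty macro.
#define PURIFY_EMPTY(r, s)

// Zeroing runs after the constructor and wipes out what it set up.
int kind_after_zeroing() {
    Attr a;
    PURIFY_ZERO(a, sizeof(a))
    return a.kind;
}

// With the empty macro the constructed state survives.
int kind_after_empty() {
    Attr a;
    PURIFY_EMPTY(a, sizeof(a))
    return a.kind;
}

// Usage: compare the two variants.
void purify_demo() {
    assert(kind_after_zeroing() == 0);
    assert(kind_after_empty() == 42);
}
```

Attr here holds only an int, so overwriting it is merely surprising; for
classes with virtual functions or owning members the same memset could
corrupt the object, which is the stronger reason to empty the macro.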
 233
 234 #2. Fixed in 1.33MR10 - See CHANGES_FROM_1.33 Item #80.
 235
 236 #1. The quality of support for systems with 8.3 file names leaves
 237     much to be desired.  Since the kit is distributed using the
 238     long file names and the make file uses long file names it requires
 239     some effort to generate.  This will probably not be changed due
 240     to the large number of systems already written using the long
 241     file names.