+++ /dev/null
-======================================================================\r
-\r
- CHANGES_SUMMARY.TXT\r
-\r
- A QUICK overview of changes from 1.33 in reverse order\r
-\r
- A summary of additions rather than bug fixes and minor code changes.\r
-\r
- Numbers refer to items in CHANGES_FROM_133*.TXT\r
- which may contain additional information.\r
-\r
- DISCLAIMER\r
-\r
- The software and these notes are provided "as is". They may include\r
- typographical or technical errors and their authors disclaim all\r
- liability of any kind or nature for damages due to error, fault,\r
- defect, or deficiency regardless of cause. All warranties of any\r
- kind, either express or implied, including, but not limited to, the\r
- implied warranties of merchantability and fitness for a particular\r
- purpose are disclaimed.\r
-\r
-======================================================================\r
-\r
-#258. You can specify a user-defined base class for your parser\r
-\r
- The base class constructor must have a signature similar to\r
- that of ANTLRParser.\r
-\r
-#253. Generation of block preamble (-preamble and -preamble_first)\r
-\r
- The antlr option -preamble causes antlr to insert the code\r
- BLOCK_PREAMBLE at the start of each rule and block.\r
-\r
- The antlr option -preamble_first is similar, but inserts the\r
- code BLOCK_PREAMBLE_FIRST(PreambleFirst_123) where the symbol\r
- PreambleFirst_123 is equivalent to the first set defined by\r
- the #FirstSetSymbol described in Item #248.\r
-\r
-#248. Generate symbol for first set of an alternative\r
-\r
- rr : #FirstSetSymbol(rr_FirstSet) ( Foo | Bar ) ;\r
-\r
-#216. Defer token fetch for C++ mode\r
-\r
- When the ANTLRParser class is built with the pre-processor option \r
- ZZDEFER_FETCH defined, the fetch of new tokens by consume() is deferred\r
- until LA(i) or LT(i) is called. \r
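
The effect of deferring the fetch can be sketched with a small stand-alone class. This is not the ANTLRParser implementation: `DeferredBuffer`, `fetched()` and the callback token source are hypothetical names, and tokens are reduced to plain ints. The sketch only illustrates that consume() reads nothing from the input and that the read happens when LA(i) is next called.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a token-buffering parser base class.
// The real ANTLRParser and its token types are not modelled here.
class DeferredBuffer {
public:
    explicit DeferredBuffer(std::function<int()> source)
        : source_(std::move(source)) {}

    // consume() drops the current token but does not read a new one.
    void consume() {
        if (!buffer_.empty()) {
            buffer_.pop_front();
        } else {
            ++pendingSkips_;  // nothing buffered yet: skip one token later
        }
    }

    // LA(i) performs whatever deferred fetches are needed to see token i.
    int LA(std::size_t i) {
        assert(i >= 1);
        while (pendingSkips_ > 0) {
            source_();  // fetch and discard a token consumed earlier
            ++fetched_;
            --pendingSkips_;
        }
        while (buffer_.size() < i) {
            buffer_.push_back(source_());
            ++fetched_;
        }
        return buffer_[i - 1];
    }

    // Number of tokens actually read from the source so far.
    std::size_t fetched() const { return fetched_; }

private:
    std::function<int()> source_;
    std::deque<int> buffer_;
    std::size_t pendingSkips_ = 0;
    std::size_t fetched_ = 0;
};
```

With a source that yields 10, 11, 12, ..., calling LA(1), then consume(), leaves fetched() at 1. The second token is read only when LA(1) is called again. This kind of laziness matters for interactive input, where an eager fetch would block waiting for a token the parser may not need yet.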
-\r
-#215. Use reset() to reset DLGLexerBase\r
-#188. Added pccts/h/DLG_stream_input.h\r
-#180. Added ANTLRParser::getEofToken()\r
-#173. -glms for Microsoft style filenames with -gl\r
-#170. Suppression for predicates with lookahead depth >1\r
-\r
- Consider the following grammar with -ck 2 and the predicate in rule\r
- "a" with depth 2:\r
-\r
- r1 : (ab)* "@"\r
- ;\r
-\r
- ab : a\r
- | b\r
- ;\r
-\r
- a : (A B)? => <<p(LATEXT(2))>>? A B C\r
- ;\r
-\r
- b : A B C\r
- ;\r
-\r
- Normally, the predicate would be hoisted into rule r1 in order to\r
- determine whether to call rule "ab". However it should *not* be\r
- hoisted because, even if p is false, there is a valid alternative\r
- in rule b. With "-mrhoistk on" the predicate will be suppressed.\r
-\r
- If "-info p" command line option is present the following information\r
- will appear in the generated code:\r
-\r
- while ( (LA(1)==A)\r
- #if 0\r
-\r
- Part (or all) of predicate with depth > 1 suppressed by alternative\r
- without predicate\r
-\r
- pred << p(LATEXT(2))>>?\r
- depth=k=2 ("=>" guard) rule a line 8 t1.g\r
- tree context:\r
- (root = A\r
- B\r
- )\r
-\r
- The token sequence which is suppressed: ( A B )\r
- The sequence of references which generate that sequence of tokens:\r
-\r
- 1 to ab r1/1 line 1 t1.g\r
- 2 ab ab/1 line 4 t1.g\r
- 3 to b ab/2 line 5 t1.g\r
- 4 b b/1 line 11 t1.g\r
- 5 #token A b/1 line 11 t1.g\r
- 6 #token B b/1 line 11 t1.g\r
-\r
- #endif\r
-\r
- A slightly more complicated example:\r
-\r
- r1 : (ab)* "@"\r
- ;\r
-\r
- ab : a\r
- | b\r
- ;\r
-\r
- a : (A B)? => <<p(LATEXT(2))>>? (A B | D E)\r
- ;\r
-\r
- b : <<q(LATEXT(2))>>? D E\r
- ;\r
-\r
-\r
- In this case, the sequence (D E) in rule "a" which lies behind\r
- the guard is used to suppress the predicate with context (D E)\r
- in rule b.\r
-\r
- while ( (LA(1)==A || LA(1)==D)\r
- #if 0\r
-\r
- Part (or all) of predicate with depth > 1 suppressed by alternative\r
- without predicate\r
-\r
- pred << q(LATEXT(2))>>?\r
- depth=k=2 rule b line 11 t2.g\r
- tree context:\r
- (root = D\r
- E\r
- )\r
-\r
- The token sequence which is suppressed: ( D E )\r
- The sequence of references which generate that sequence of tokens:\r
-\r
- 1 to ab r1/1 line 1 t2.g\r
- 2 ab ab/1 line 4 t2.g\r
- 3 to a ab/1 line 4 t2.g\r
- 4 a a/1 line 8 t2.g\r
- 5 #token D a/1 line 8 t2.g\r
- 6 #token E a/1 line 8 t2.g\r
-\r
- #endif\r
- &&\r
- #if 0\r
-\r
- pred << p(LATEXT(2))>>?\r
- depth=k=2 ("=>" guard) rule a line 8 t2.g\r
- tree context:\r
- (root = A\r
- B\r
- )\r
-\r
- #endif\r
-\r
- (! ( LA(1)==A && LA(2)==B ) || p(LATEXT(2)) ) {\r
- ab();\r
- ...\r
-\r
-#165. (Changed in MR13) option -newAST\r
-\r
- To create ASTs from an ANTLRTokenPtr antlr usually calls\r
- "new AST(ANTLRTokenPtr)". This option generates a call\r
- to "newAST(ANTLRTokenPtr)" instead. This allows a user\r
- to define a parser member function to create an AST object.\r
-\r
-#161. (Changed in MR13) Switch -gxt inhibits generation of tokens.h\r
-\r
-#158. (Changed in MR13) #header causes problem for pre-processors\r
-\r
- A user who runs the C pre-processor on antlr source suggested\r
- that another syntax be allowed. With MR13, directives\r
- such as #header, #pragma, etc. may be written as "\#header",\r
- "\#pragma", etc. For escaping pre-processor directives inside\r
- a #header use something like the following:\r
-\r
- \#header\r
- <<\r
- \#include <stdio.h>\r
- >>\r
-\r
-#155. (Changed in MR13) Context behind predicates can suppress\r
-\r
- With -mrhoist enabled the context behind a guarded predicate can\r
- be used to suppress other predicates. Consider the following grammar:\r
-\r
- r0 : (r1)+;\r
-\r
- r1 : rp\r
- | rq\r
- ;\r
- rp : <<p(LATEXT(1))>>? B ;\r
- rq : (A)? => <<q(LATEXT(1))>>? (A|B);\r
-\r
- In earlier versions both predicates "p" and "q" would be hoisted into\r
- rule r0. With MR12c predicate p is suppressed because the context which\r
- follows predicate q includes "B" which can "cover" predicate "p". In\r
- other words, in trying to decide in r0 whether to call r1, it doesn't\r
- really matter whether p is false or true because, either way, there is\r
- a valid choice within r1.\r
-\r
-#154. (Changed in MR13) Making hoist suppression explicit using <<nohoist>>\r
-\r
- A common error, even among experienced pccts users, is to code\r
- an init-action to inhibit hoisting rather than a leading action.\r
- An init-action does not inhibit hoisting.\r
-\r
- This was coded:\r
-\r
- rule1 : <<;>> rule2\r
-\r
- This is what was meant:\r
-\r
- rule1 : <<;>> <<;>> rule2\r
-\r
- With MR13, the user can code:\r
-\r
- rule1 : <<;>> <<nohoist>> rule2\r
-\r
- The following will give an error message:\r
-\r
- rule1 : <<nohoist>> rule2\r
-\r
- If the <<nohoist>> appears as an init-action rather than a leading\r
- action an error message is issued. The meaning of an init-action\r
- containing "nohoist" is unclear: does it apply to just one\r
- alternative or to all alternatives ?\r
-\r
-#151a. Addition of ANTLRParser::getLexer(), ANTLRTokenStream::getLexer()\r
-\r
- You must manually cast the ANTLRTokenStream to your program's\r
- lexer class because the name of the lexer's class is not fixed.\r
- Thus it is impossible to incorporate it into the DLGLexerBase\r
- class.\r
-\r
-#151b.(Changed in MR12) ParserBlackBox member getLexer()\r
-\r
-#150. (Changed in MR12) syntaxErrCount and lexErrCount now public\r
-\r
-#149. (Changed in MR12) antlr option -info o (letter o for orphan)\r
-\r
- If there is more than one rule which is not referenced by any\r
- other rule then all such rules are listed. This is useful for\r
- alerting one to rules which are not used, but which can still\r
- contribute to ambiguity.\r
-\r
-#148. (Changed in MR11) #token names appearing in zztokens,token_tbl\r
-\r
- One can write:\r
-\r
- #token Plus ("+") "\+"\r
- #token LP ("(") "\("\r
- #token COM ("comment begin") "/\*"\r
-\r
- The string in parentheses will be used in syntax error messages.\r
-\r
-#146. (Changed in MR11) Option -treport for locating "difficult" alts\r
-\r
- It can be difficult to determine which alternatives are causing\r
- pccts to work hard to resolve an ambiguity. In some cases the\r
- ambiguity is successfully resolved after much CPU time so there\r
- is no message at all.\r
-\r
- A rough measure of the amount of work being performed which is\r
- independent of the CPU speed and system load is the number of\r
- tnodes created. Using "-info t" gives information about the\r
- total number of tnodes created and the peak number of tnodes.\r
-\r
- Tree Nodes: peak 1300k created 1416k lost 0\r
-\r
- It also puts in the generated C or C++ file the number of tnodes\r
- created for a rule (at the end of the rule). However this\r
- information is not sufficient to locate the alternatives within\r
- a rule which are causing the creation of tnodes.\r
-\r
- Using:\r
-\r
- antlr -treport 100000 ....\r
-\r
- causes antlr to list on stdout any alternatives which require the\r
- creation of more than 100,000 tnodes, along with the lookahead sets\r
- for those alternatives.\r
-\r
- The following is a trivial case from the ansi.g grammar which shows\r
- the format of the report. This report might be of more interest\r
- in cases where 1,000,000 tuples were created to resolve the ambiguity.\r
-\r
- -------------------------------------------------------------------------\r
- There were 0 tuples whose ambiguity could not be resolved\r
- by full lookahead\r
- There were 157 tnodes created to resolve ambiguity between:\r
-\r
- Choice 1: statement/2 line 475 file ansi.g\r
- Choice 2: statement/3 line 476 file ansi.g\r
-\r
- Intersection of lookahead[1] sets:\r
-\r
- IDENTIFIER\r
-\r
- Intersection of lookahead[2] sets:\r
-\r
- LPARENTHESIS COLON AMPERSAND MINUS\r
- STAR PLUSPLUS MINUSMINUS ONESCOMPLEMENT\r
- NOT SIZEOF OCTALINT DECIMALINT\r
- HEXADECIMALINT FLOATONE FLOATTWO IDENTIFIER\r
- STRING CHARACTER\r
- -------------------------------------------------------------------------\r
-\r
-#143. (Changed in MR11) Optional ";" at end of #token statement\r
-\r
- Fixes problem of:\r
-\r
- #token X "x"\r
-\r
- <<\r
- parser action\r
- >>\r
-\r
- Being confused with:\r
-\r
- #token X "x" <<lexical action>>\r
-\r
-#142. (Changed in MR11) class BufFileInput subclass of DLGInputStream\r
-\r
- Alexey Demakov (demakov@kazbek.ispras.ru) has supplied class\r
- BufFileInput derived from DLGInputStream which provides a\r
- function lookahead(char *string) to test characters in the\r
- input stream more than one character ahead.\r
- The class is located in pccts/h/BufFileInput.* of the kit.\r
-\r
-#140. #pred to define predicates\r
-\r
- +---------------------------------------------------+\r
- | Note: Assume "-prc on" for this entire discussion |\r
- +---------------------------------------------------+\r
-\r
- A problem with predicates is that each one is regarded as\r
- unique and capable of disambiguating cases where two\r
- alternatives have identical lookahead. For example:\r
-\r
- rule : <<pred(LATEXT(1))>>? A\r
- | <<pred(LATEXT(1))>>? A\r
- ;\r
-\r
- will not cause any error messages or warnings to be issued\r
- by earlier versions of pccts. To compare the text of the\r
- predicates is an incomplete solution.\r
-\r
- In 1.33MR11 I am introducing the #pred statement in order to\r
- solve some problems with predicates. The #pred statement allows\r
- one to give a symbolic name to a "predicate literal" or a\r
- "predicate expression" in order to refer to it in other predicate\r
- expressions or in the rules of the grammar.\r
-\r
- The predicate literal associated with a predicate symbol is C\r
- or C++ code which can be used to test the condition. A\r
- predicate expression defines a predicate symbol in terms of other\r
- predicate symbols using "!", "&&", and "||". A predicate symbol\r
- can be defined in terms of a predicate literal, a predicate\r
- expression, or *both*.\r
-\r
- When a predicate symbol is defined with both a predicate literal\r
- and a predicate expression, the predicate literal is used to generate\r
- code, but the predicate expression is used to check whether two\r
- alternatives have identical predicates.\r
-\r
- Here are some examples of #pred statements:\r
-\r
- #pred IsLabel <<isLabel(LATEXT(1))>>?\r
- #pred IsLocalVar <<isLocalVar(LATEXT(1))>>?\r
- #pred IsGlobalVar <<isGlobalVar(LATEXT(1))>>?\r
- #pred IsVar <<isVar(LATEXT(1))>>? IsLocalVar || IsGlobalVar\r
- #pred IsScoped <<isScoped(LATEXT(1))>>? IsLabel || IsLocalVar\r
-\r
- I hope that the use of EBNF notation to describe the syntax of the\r
- #pred statement will not cause problems for my readers (joke).\r
-\r
- predStatement : "#pred"\r
- CapitalizedName\r
- (\r
- "<<predicate_literal>>?"\r
- | "<<predicate_literal>>?" predOrExpr\r
- | predOrExpr\r
- )\r
- ;\r
-\r
- predOrExpr : predAndExpr ( "||" predAndExpr ) * ;\r
-\r
- predAndExpr : predPrimary ( "&&" predPrimary ) * ;\r
-\r
- predPrimary : CapitalizedName\r
- | "!" predPrimary\r
- | "(" predOrExpr ")"\r
- ;\r
-\r
- What is the purpose of this nonsense ?\r
-\r
- To understand how predicate symbols help, you need to realize that\r
- predicate symbols are used in two different ways with two different\r
- goals.\r
-\r
- a. Allow simplification of predicates which have been combined\r
- during predicate hoisting.\r
-\r
- b. Allow recognition of identical predicates which can't disambiguate\r
- alternatives with common lookahead.\r
-\r
- First we will discuss goal (a). Consider the following rule:\r
-\r
- rule0: rule1\r
- | ID\r
- | ...\r
- ;\r
-\r
- rule1: rule2\r
- | rule3\r
- ;\r
-\r
- rule2: <<isX(LATEXT(1))>>? ID ;\r
- rule3: <<!isX(LATEXT(1))>>? ID ;\r
-\r
- When the predicates in rule2 and rule3 are combined by hoisting\r
- to create a prediction expression for rule1 the result is:\r
-\r
- if ( LA(1)==ID\r
- && ( isX(LATEXT(1)) || !isX(LATEXT(1)) ) ) { rule1(); ...\r
-\r
- This is inefficient, but more importantly, can lead to false\r
- assumptions that the predicate expression distinguishes the rule1\r
- alternative from some other alternative with lookahead ID. In\r
- MR11 one can write:\r
-\r
- #pred IsX <<isX(LATEXT(1))>>?\r
-\r
- ...\r
-\r
- rule2: <<IsX>>? ID ;\r
- rule3: <<!IsX>>? ID ;\r
-\r
- During hoisting MR11 recognizes this as a special case and\r
- eliminates the predicates. The result is a prediction\r
- expression like the following:\r
-\r
- if ( LA(1)==ID ) { rule1(); ...\r
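
The special case recognized during hoisting can be sketched as a check for a predicate symbol that appears both plain and negated in the same OR. This is only an illustration under stated assumptions: `PredRef` and `orIsAlwaysTrue` are hypothetical names, not part of antlr, and the model covers a flat OR of predicate references only.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one hoisted predicate: a #pred symbol name plus
// a flag recording a leading "!" in the predicate action.
struct PredRef {
    std::string symbol;
    bool negated;
};

// Returns true when an OR of predicate references contains both P and !P.
// Such an OR is always true, so it can be dropped from the prediction
// expression.
bool orIsAlwaysTrue(const std::vector<PredRef>& alternatives) {
    std::set<std::pair<std::string, bool>> seen;
    for (const PredRef& p : alternatives) {
        assert(!p.symbol.empty());
        // Look for the same symbol with the opposite negation flag.
        if (seen.count(std::make_pair(p.symbol, !p.negated)) != 0) {
            return true;
        }
        seen.emplace(p.symbol, p.negated);
    }
    return false;
}
```

Because the check compares symbol names and the "!" flag only, a pair such as IsX and a separately defined NotX is not recognized as complementary.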
-\r
- Please note that the following cases which appear to be equivalent\r
- *cannot* be simplified by MR11 during hoisting because the hoisting\r
- logic only checks for a "!" in the predicate action, not in the\r
- predicate expression for a predicate symbol.\r
-\r
- *Not* equivalent and is not simplified during hoisting:\r
-\r
- #pred IsX <<isX(LATEXT(1))>>?\r
- #pred NotX <<!isX(LATEXT(1))>>?\r
- ...\r
- rule2: <<IsX>>? ID ;\r
- rule3: <<NotX>>? ID ;\r
-\r
- *Not* equivalent and is not simplified during hoisting:\r
-\r
- #pred IsX <<isX(LATEXT(1))>>?\r
- #pred NotX !IsX\r
- ...\r
- rule2: <<IsX>>? ID ;\r
- rule3: <<NotX>>? ID ;\r
-\r
- Now we will discuss goal (b).\r
-\r
- When antlr discovers that there is a lookahead ambiguity between\r
- two alternatives it attempts to resolve the ambiguity by searching\r
- for predicates in both alternatives. In the past any predicate\r
- would do, even if the same one appeared in both alternatives:\r
-\r
- rule: <<p(LATEXT(1))>>? X\r
- | <<p(LATEXT(1))>>? X\r
- ;\r
-\r
- The #pred statement is a start towards solving this problem.\r
- During ambiguity resolution (*not* predicate hoisting) the\r
- predicates for the two alternatives are expanded and compared.\r
- Consider the following example:\r
-\r
- #pred Upper <<isUpper(LATEXT(1))>>?\r
- #pred Lower <<isLower(LATEXT(1))>>?\r
- #pred Alpha <<isAlpha(LATEXT(1))>>? Upper || Lower\r
-\r
- rule0: rule1\r
- | <<Alpha>>? ID\r
- ;\r
-\r
- rule1:\r
- | rule2\r
- | rule3\r
- ...\r
- ;\r
-\r
- rule2: <<Upper>>? ID;\r
- rule3: <<Lower>>? ID;\r
-\r
- The definition of #pred Alpha expresses:\r
-\r
- a. to test the predicate use the C code "isAlpha(LATEXT(1))"\r
-\r
- b. to analyze the predicate use the information that\r
- Alpha is equivalent to the union of Upper and Lower.\r
-\r
- During ambiguity resolution the definition of Alpha is expanded\r
- into "Upper || Lower" and compared with the predicate in the other\r
- alternative, which is also "Upper || Lower". Because they are\r
- identical MR11 will report a problem.\r
-\r
- -------------------------------------------------------------------------\r
- t10.g, line 5: warning: the predicates used to disambiguate rule rule0\r
- (file t10.g alt 1 line 5 and alt 2 line 6)\r
- are identical when compared without context and may have no\r
- resolving power for some lookahead sequences.\r
- -------------------------------------------------------------------------\r
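
The expand-and-compare step can be sketched as follows. This is a simplified model, not antlr's routine: `PredTable`, `expand` and `expandOr` are hypothetical names, only "||" expressions are handled, and the definitions are assumed to be free of cycles.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical table: each #pred symbol maps to the symbols of its "||"
// predicate expression. A symbol with no entry is treated as a leaf.
using PredTable = std::map<std::string, std::vector<std::string>>;

// Expand one symbol into the set of leaf symbols it stands for.
// "&&" and "!" are left out of this sketch; definitions must be acyclic.
std::set<std::string> expand(const PredTable& table,
                             const std::string& symbol) {
    std::set<std::string> leaves;
    auto it = table.find(symbol);
    if (it == table.end() || it->second.empty()) {
        leaves.insert(symbol);
        return leaves;
    }
    for (const std::string& part : it->second) {
        std::set<std::string> sub = expand(table, part);
        leaves.insert(sub.begin(), sub.end());
    }
    return leaves;
}

// Expand an OR of symbols, such as the predicates hoisted for one
// alternative.
std::set<std::string> expandOr(const PredTable& table,
                               const std::vector<std::string>& symbols) {
    std::set<std::string> leaves;
    for (const std::string& s : symbols) {
        std::set<std::string> sub = expand(table, s);
        leaves.insert(sub.begin(), sub.end());
    }
    return leaves;
}
```

For the grammar above, expanding Alpha and expanding the pair Upper, Lower give the same set, which is the condition that triggers the warning. No lookahead context takes part in the comparison.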
-\r
- If you use the "-info p" option the output file will contain:\r
-\r
- +----------------------------------------------------------------------+\r
- |#if 0 |\r
- | |\r
- |The following predicates are identical when compared without |\r
- | lookahead context information. For some ambiguous lookahead |\r
- | sequences they may not have any power to resolve the ambiguity. |\r
- | |\r
- |Choice 1: rule0/1 alt 1 line 5 file t10.g |\r
- | |\r
- | The original predicate for choice 1 with available context |\r
- | information: |\r
- | |\r
- | OR expr |\r
- | |\r
- | pred << Upper>>? |\r
- | depth=k=1 rule rule2 line 14 t10.g |\r
- | set context: |\r
- | ID |\r
- | |\r
- | pred << Lower>>? |\r
- | depth=k=1 rule rule3 line 15 t10.g |\r
- | set context: |\r
- | ID |\r
- | |\r
- | The predicate for choice 1 after expansion (but without context |\r
- | information): |\r
- | |\r
- | OR expr |\r
- | |\r
- | pred << isUpper(LATEXT(1))>>? |\r
- | depth=k=1 rule line 1 t10.g |\r
- | |\r
- | pred << isLower(LATEXT(1))>>? |\r
- | depth=k=1 rule line 2 t10.g |\r
- | |\r
- | |\r
- |Choice 2: rule0/2 alt 2 line 6 file t10.g |\r
- | |\r
- | The original predicate for choice 2 with available context |\r
- | information: |\r
- | |\r
- | pred << Alpha>>? |\r
- | depth=k=1 rule rule0 line 6 t10.g |\r
- | set context: |\r
- | ID |\r
- | |\r
- | The predicate for choice 2 after expansion (but without context |\r
- | information): |\r
- | |\r
- | OR expr |\r
- | |\r
- | pred << isUpper(LATEXT(1))>>? |\r
- | depth=k=1 rule line 1 t10.g |\r
- | |\r
- | pred << isLower(LATEXT(1))>>? |\r
- | depth=k=1 rule line 2 t10.g |\r
- | |\r
- | |\r
- |#endif |\r
- +----------------------------------------------------------------------+\r
-\r
- The comparison of the predicates for the two alternatives takes\r
- place without context information, which means that in some cases\r
- the predicates will be considered identical even though they operate\r
- on disjoint lookahead sets. Consider:\r
-\r
- #pred Alpha\r
-\r
- rule1: <<Alpha>>? ID\r
- | <<Alpha>>? Label\r
- ;\r
-\r
- Because the comparison of predicates takes place without context\r
- these will be considered identical. The reason for comparing\r
- without context is that otherwise it would be necessary to re-evaluate\r
- the entire predicate expression for each possible lookahead sequence.\r
- This would require more code to be written and more CPU time during\r
- grammar analysis, and it is not yet clear whether anyone will even make\r
- use of the new #pred facility.\r
-\r
- A temporary workaround might be to use different #pred statements\r
- for predicates you know have different context. This would avoid\r
- extraneous warnings.\r
-\r
- The above example might be termed a "false positive". Comparison\r
- without context will also lead to "false negatives". Consider the\r
- following example:\r
-\r
- #pred Alpha\r
- #pred Beta\r
-\r
- rule1: <<Alpha>>? A\r
- | rule2\r
- ;\r
-\r
- rule2: <<Alpha>>? A\r
- | <<Beta>>? B\r
- ;\r
-\r
- The predicate used for alt 2 of rule1 is (Alpha || Beta). This\r
- appears to be different than the predicate Alpha used for alt1.\r
- However, the context of Beta is B. Thus when the lookahead is A\r
- Beta will have no resolving power and Alpha will be used for both\r
- alternatives. Using the same predicate for both alternatives isn't\r
- very helpful, but this will not be detected with 1.33MR11.\r
-\r
- To properly handle this the predicate expression would have to be\r
- evaluated for each distinct lookahead context.\r
-\r
- To determine whether two predicate expressions are identical is\r
- difficult. The routine may fail to identify identical predicates.\r
-\r
- The #pred feature also compares predicates to see whether a choice\r
- between alternatives is resolved by a predicate which makes the\r
- second choice unreachable. Consider the following example:\r
-\r
- #pred A <<A(LATEXT(1))>>?\r
- #pred B <<B(LATEXT(1))>>?\r
- #pred A_or_B A || B\r
-\r
- r : s\r
- | t\r
- ;\r
- s : <<A_or_B>>? ID\r
- ;\r
- t : <<A>>? ID\r
- ;\r
-\r
- ----------------------------------------------------------------------------\r
- t11.g, line 5: warning: the predicate used to disambiguate the\r
- first choice of rule r\r
- (file t11.g alt 1 line 5 and alt 2 line 6)\r
- appears to "cover" the second predicate when compared without context.\r
- The second predicate may have no resolving power for some lookahead\r
- sequences.\r
- ----------------------------------------------------------------------------\r
-\r
-#132. (Changed in 1.33MR11) Recognition of identical predicates in alts\r
-\r
- Prior to 1.33MR11, there would be no ambiguity warning when the\r
- very same predicate was used to disambiguate both alternatives:\r
-\r
- test: ref B\r
- | ref C\r
- ;\r
-\r
- ref : <<pred(LATEXT(1))>>? A ;\r
-\r
- In 1.33MR11 this will cause the warning:\r
-\r
- warning: the predicates used to disambiguate rule test\r
- (file v98.g alt 1 line 1 and alt 2 line 2)\r
- are identical and have no resolving power\r
-\r
- ----------------- Note -----------------\r
-\r
- This is different than the following case\r
-\r
- test: <<pred(LATEXT(1))>>? A B\r
- | <<pred(LATEXT(1))>>? A C\r
- ;\r
-\r
- In this case there are two distinct predicates\r
- which have exactly the same text. In the first\r
- example there are two references to the same\r
- predicate. The problem represented by this\r
- grammar will be addressed later.\r
-\r
-\r
-#127. (Changed in 1.33MR11)\r
-\r
-              Count Syntax Errors            Count DLG Errors\r
-              -------------------            ----------------\r
-\r
- C++ mode     ANTLRParser::syntaxErrCount    DLGLexerBase::lexErrCount\r
- C mode       zzSyntaxErrCount               zzLexErrCount\r
-\r
- The C mode variables are global and initialized to 0.\r
- They are *not* reset to 0 automatically when antlr is\r
- restarted.\r
-\r
- The C++ mode variables are public. They are initialized\r
- to 0 by the constructors. They are *not* reset to 0 by the\r
- ANTLRParser::init() method.\r
-\r
- Suggested by Reinier van den Born (reinier@vnet.ibm.com).\r
-\r
-#126. (Changed in 1.33MR11) Addition of #first <<...>>\r
-\r
- The #first <<...>> inserts the specified text in the output\r
- files before any other #include statements required by pccts.\r
- The only things before the #first text are comments and\r
- a #define ANTLR_VERSION.\r
-\r
- Requested by Esa Pulkkinen (esap@cs.tut.fi) and Alexin\r
- Zoltan (alexin@inf.u-szeged.hu).\r
-\r
-#124. A Note on the New "&&" Style Guarded Predicates\r
-\r
- I've been asked several times, "What is the difference between\r
- the old "=>" style guard predicates and the new style "&&" guard\r
- predicates, and how do you choose one over the other" ?\r
-\r
- The main difference is that the "=>" does not apply the\r
- predicate if the context guard doesn't match, whereas\r
- the && form always does. What is the significance ?\r
-\r
- If you have a predicate which is not on the "leading edge"\r
- it cannot be hoisted. Suppose you need a predicate that\r
- looks at LA(2). You must introduce it manually. The\r
- classic example is:\r
-\r
- castExpr :\r
- LP typeName RP\r
- | ....\r
- ;\r
-\r
- typeName : <<isTypeName(LATEXT(1))>>? ID\r
- | STRUCT ID\r
- ;\r
-\r
- The problem is that isTypeName() isn't on the leading edge\r
- of castExpr (LP precedes typeName), so it won't be hoisted\r
- into castExpr to help make a decision on which production\r
- to choose.\r
-\r
- The *first* attempt to fix it is this:\r
-\r
- castExpr :\r
- <<isTypeName(LATEXT(2))>>?\r
- LP typeName RP\r
- | ....\r
- ;\r
-\r
- Unfortunately, this won't work because it ignores\r
- the problem of STRUCT. The solution is to apply\r
- isTypeName() in castExpr if LA(2) is an ID and\r
- don't apply it when LA(2) is STRUCT:\r
-\r
- castExpr :\r
- (LP ID)? => <<isTypeName(LATEXT(2))>>?\r
- LP typeName RP\r
- | ....\r
- ;\r
-\r
- In conclusion, the "=>" style guarded predicate is\r
- useful when:\r
-\r
- a. the tokens required for the predicate\r
- are not on the leading edge\r
- b. there are alternatives in the expression\r
- selected by the predicate for which the\r
- predicate is inappropriate\r
-\r
- If (b) were false, then one could use a simple\r
- predicate (assuming "-prc on"):\r
-\r
- castExpr :\r
- <<isTypeName(LATEXT(2))>>?\r
- LP typeName RP\r
- | ....\r
- ;\r
-\r
- typeName : <<isTypeName(LATEXT(1))>>? ID\r
- ;\r
-\r
- So, when do you use the "&&" style guarded predicate ?\r
-\r
- The new-style "&&" predicate should always be used with\r
- predicate context. The context guard is in ADDITION to\r
- the automatically computed context. Thus it is useful for\r
- predicates which depend on the token type for reasons\r
- other than context.\r
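
One way to read the difference between the two styles is as two boolean combinations of "the context guard matches" and "the predicate is true". The following is a sketch of that reading only. `arrowGuard` and `andGuard` are hypothetical names, and the "&&" form is an interpretation of the description above rather than a copy of antlr's generated code.

```cpp
#include <cassert>

// "=>" style: the predicate is consulted only when the context guard
// matches. When the guard does not match, the predicate does not block
// the alternative.
bool arrowGuard(bool contextMatches, bool predicate) {
    return !contextMatches || predicate;
}

// "&&" style (as read here): the context guard is an additional
// requirement, so both the guard and the predicate must hold.
bool andGuard(bool contextMatches, bool predicate) {
    return contextMatches && predicate;
}
```

The two forms differ only when the guard fails to match: arrowGuard(false, p) is true for any p, while andGuard(false, p) is false. In the castExpr example the "=>" form is needed so that STRUCT ID is still accepted.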
-\r
- The following example is contributed by Reinier van den Born\r
- (reinier@vnet.ibm.com).\r
-\r
- +-------------------------------------------------------------------------+\r
- | This grammar has two ways to call functions: |\r
- | |\r
- | - a "standard" call syntax with parens and comma separated args |\r
- | - a shell command like syntax (no parens and spacing separated args) |\r
- | |\r
- | The former also allows a variable to hold the name of the function, |\r
- | the latter can also be used to call external commands. |\r
- | |\r
- | The grammar (simplified) looks like this: |\r
- | |\r
- | fun_call : ID "(" { expr ("," expr)* } ")" |\r
- | /* ID is function name */ |\r
- | | "@" ID "(" { expr ("," expr)* } ")" |\r
- | /* ID is var containing fun name */ |\r
- | ; |\r
- | |\r
- | command : ID expr* /* ID is function name */ |\r
- | | path expr* /* path is external command name */ |\r
- | ; |\r
- | |\r
- | path : ID /* left out slashes and such */ |\r
- | | "@" ID /* ID is environment var */ |\r
- | ; |\r
- | |\r
- | expr : .... |\r
- | | "(" expr ")"; |\r
- | |\r
- | call : fun_call |\r
- | | command |\r
- | ; |\r
- | |\r
- | Obviously the call is wildly ambiguous. This is more or less how this |\r
- | is to be resolved: |\r
- | |\r
- | A call begins with an ID or an @ followed by an ID. |\r
- | |\r
- | If it is an ID and if it is an ext. command name -> command |\r
- | if followed by a paren -> fun_call |\r
- | otherwise -> command |\r
- | |\r
- | If it is an @ and if the ID is a var name -> fun_call |\r
- | otherwise -> command |\r
- | |\r
- | One can implement these rules quite neatly using && predicates: |\r
- | |\r
- | call : ("@" ID)? && <<isVarName(LT(2))>>? fun_call |\r
- | | (ID)? && <<isExtCmdName>>? command |\r
- | | (ID "(")? fun_call |\r
- | | command |\r
- | ; |\r
- | |\r
- | This can be done better, so it is not an ideal example, but it |\r
- | conveys the principle. |\r
- +-------------------------------------------------------------------------+\r
-\r
-#122. (Changed in 1.33MR11) Member functions to reset DLG in C++ mode\r
-\r
- void DLGFileReset(FILE *f) { input = f; found_eof = 0; }\r
- void DLGStringReset(DLGChar *s) { input = s; p = &input[0]; }\r
-\r
- Supplied by R.A. Nelson (cowboy@VNET.IBM.COM)\r
-\r
-#119. (Changed in 1.33MR11) Ambiguity aid for grammars\r
-\r
- The user can ask for additional information on ambiguities reported\r
- by antlr to stdout. At the moment, only one ambiguity report can\r
- be created in an antlr run.\r
-\r
- This feature is enabled using the "-aa" (Ambiguity Aid) option.\r
-\r
- The following options control the reporting of ambiguities:\r
-\r
- -aa ruleName Selects reporting by name of rule\r
- -aa lineNumber Selects reporting by line number\r
- (file name not compared)\r
-\r
- -aam Selects "multiple" reporting for a token\r
- in the intersection set of the\r
- alternatives.\r
-\r
- For instance, the token ID may appear dozens\r
- of times in various paths as the program\r
- explores the rules which are reachable from\r
- the point of an ambiguity. With option -aam\r
- every possible path the search program\r
- encounters is reported.\r
-\r
- Without -aam only the first encounter is\r
- reported. This may result in incomplete\r
- information, but the information may be\r
- sufficient and much shorter.\r
-\r
- -aad depth Selects the depth of the search.\r
- The default value is 1.\r
-\r
- The number of paths to be searched, and the\r
- size of the report can grow geometrically\r
- with the -ck value if a full search for all\r
- contributions to the source of the ambiguity\r
- is made.\r
-\r
- The depth represents the number of tokens\r
- in the lookahead set which are matched against\r
- the set of ambiguous tokens. A depth of 1\r
- means that the search stops when a lookahead\r
- sequence of just one token is matched.\r
-\r
- A k=1 ck=6 grammar might generate 5,000 items\r
- in a report if a full depth 6 search is made\r
- with the Ambiguity Aid. The source of the\r
- problem may be in the first token and obscured\r
- by the volume of data - I hesitate to call\r
- it information.\r
-\r
- When the user selects a depth > 1, the search\r
- is first performed at depth=1 for both\r
- alternatives, then depth=2 for both alternatives,\r
- etc.\r
-\r
- Sample output for rule grammar in antlr.g itself:\r
-\r
- +---------------------------------------------------------------------+\r
- | Ambiguity Aid |\r
- | |\r
- | Choice 1: grammar/70 line 632 file a.g |\r
- | Choice 2: grammar/82 line 644 file a.g |\r
- | |\r
- | Intersection of lookahead[1] sets: |\r
- | |\r
- | "\}" "class" "#errclass" "#tokclass" |\r
- | |\r
- | Choice:1 Depth:1 Group:1 ("#errclass") |\r
- | 1 in (...)* block grammar/70 line 632 a.g |\r
- | 2 to error grammar/73 line 635 a.g |\r
- | 3 error error/1 line 894 a.g |\r
- | 4 #token "#errclass" error/2 line 895 a.g |\r
- | |\r
- | Choice:1 Depth:1 Group:2 ("#tokclass") |\r
- | 2 to tclass grammar/74 line 636 a.g |\r
- | 3 tclass tclass/1 line 937 a.g |\r
- | 4 #token "#tokclass" tclass/2 line 938 a.g |\r
- | |\r
- | Choice:1 Depth:1 Group:3 ("class") |\r
- | 2 to class_def grammar/75 line 637 a.g |\r
- | 3 class_def class_def/1 line 669 a.g |\r
- | 4 #token "class" class_def/3 line 671 a.g |\r
- | |\r
- | Choice:1 Depth:1 Group:4 ("\}") |\r
- | 2 #token "\}" grammar/76 line 638 a.g |\r
- | |\r
- | Choice:2 Depth:1 Group:5 ("#errclass") |\r
- | 1 in (...)* block grammar/83 line 645 a.g |\r
- | 2 to error grammar/93 line 655 a.g |\r
- | 3 error error/1 line 894 a.g |\r
- | 4 #token "#errclass" error/2 line 895 a.g |\r
- | |\r
- | Choice:2 Depth:1 Group:6 ("#tokclass") |\r
- | 2 to tclass grammar/94 line 656 a.g |\r
- | 3 tclass tclass/1 line 937 a.g |\r
- | 4 #token "#tokclass" tclass/2 line 938 a.g |\r
- | |\r
- | Choice:2 Depth:1 Group:7 ("class") |\r
- | 2 to class_def grammar/95 line 657 a.g |\r
- | 3 class_def class_def/1 line 669 a.g |\r
- | 4 #token "class" class_def/3 line 671 a.g |\r
- | |\r
- | Choice:2 Depth:1 Group:8 ("\}") |\r
- | 2 #token "\}" grammar/96 line 658 a.g |\r
- +---------------------------------------------------------------------+\r
-\r
- For a linear lookahead set ambiguity (where k=1, or k>1 and\r
- all lookahead sets [i] with i<k have degree one) the\r
- reports appear in the following order:\r
-\r
- for (depth=1 ; depth <= "-aad depth" ; depth++) {\r
- for (alternative=1; alternative <=2 ; alternative++) {\r
- while (matches-are-found) {\r
- group++;\r
- print-report\r
- };\r
- };\r
- };\r
-\r
- For reporting a k-tuple ambiguity, the reports appear in the\r
- following order:\r
-\r
- for (depth=1 ; depth <= "-aad depth" ; depth++) {\r
- while (matches-are-found) {\r
- for (alternative=1; alternative <=2 ; alternative++) {\r
- group++;\r
- print-report\r
- };\r
- };\r
- };\r
-\r
- This is because matches are generated in different ways for\r
- linear lookahead and k-tuples.\r
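-\r
- The two orderings can be sketched in C++ as follows. This is only an\r
- illustrative model, not antlr's code: the names Report, linearOrder\r
- and tupleOrder, and the idea of passing match counts in as vectors,\r
- are assumptions made for the example.\r
-\r
```cpp
#include <cstddef>
#include <vector>

// One printed report: its depth, alternative ("Choice") and group number.
struct Report { int depth; int alternative; int group; };

// Linear lookahead: matches[d][a] is the (assumed) number of matches at
// depth d+1 for alternative a+1. All groups for alternative 1 are
// reported before any group for alternative 2.
std::vector<Report> linearOrder(const std::vector<std::vector<int>>& matches) {
    std::vector<Report> out;
    int group = 0;
    for (std::size_t d = 0; d < matches.size(); ++d)
        for (int alt = 0; alt < 2; ++alt)
            for (int m = 0; m < matches[d][alt]; ++m)
                out.push_back({static_cast<int>(d) + 1, alt + 1, ++group});
    return out;
}

// k-tuple: tuples[d] is the (assumed) number of matching tuples at depth
// d+1. Each tuple produces one report per alternative, so the reports
// for the two alternatives interleave.
std::vector<Report> tupleOrder(const std::vector<int>& tuples) {
    std::vector<Report> out;
    int group = 0;
    for (std::size_t d = 0; d < tuples.size(); ++d)
        for (int m = 0; m < tuples[d]; ++m)
            for (int alt = 0; alt < 2; ++alt)
                out.push_back({static_cast<int>(d) + 1, alt + 1, ++group});
    return out;
}
```
-\r
- With four matches per alternative at depth 1, linearOrder() yields\r
- groups 1-4 for choice 1 followed by groups 5-8 for choice 2.\r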
-\r
-#117. (Changed in 1.33MR10) new EXPERIMENTAL predicate hoisting code\r
-\r
- The hoisting of predicates into rules to create prediction\r
- expressions is a problem in antlr. Consider the following\r
- example (k=1 with -prc on):\r
-\r
- start : (a)* "@" ;\r
- a : b | c ;\r
- b : <<isUpper(LATEXT(1))>>? A ;\r
- c : A ;\r
-\r
- Prior to 1.33MR10 the code generated for "start" would resemble:\r
-\r
- while {\r
- if (LA(1)==A &&\r
- (!(LA(1)==A) || isUpper())) {\r
- a();\r
- }\r
- };\r
-\r
- This code is wrong because it makes rule "c" unreachable from\r
- "start". The essence of the problem is that antlr fails to\r
- recognize that there can be a valid alternative within "a" even\r
- when the predicate <<isUpper(LATEXT(1))>>? is false.\r
-\r
- In 1.33MR10 with -mrhoist the hoisting of the predicate into\r
- "start" is suppressed because it recognizes that "c" can\r
- cover all the cases where the predicate is false:\r
-\r
- while {\r
- if (LA(1)==A) {\r
- a();\r
- }\r
- };\r
-\r
- With the antlr "-info p" switch the user will receive information\r
- about the predicate suppression in the generated file:\r
-\r
- --------------------------------------------------------------\r
- #if 0\r
-\r
- Hoisting of predicate suppressed by alternative without predicate.\r
- The alt without the predicate includes all cases where\r
- the predicate is false.\r
-\r
- WITH predicate: line 7 v1.g\r
- WITHOUT predicate: line 7 v1.g\r
-\r
- The context set for the predicate:\r
-\r
- A\r
-\r
- The lookahead set for the alt WITHOUT the semantic predicate:\r
-\r
- A\r
-\r
- The predicate:\r
-\r
- pred << isUpper(LATEXT(1))>>?\r
- depth=k=1 rule b line 9 v1.g\r
- set context:\r
- A\r
- tree context: null\r
-\r
- Chain of referenced rules:\r
-\r
- #0 in rule start (line 5 v1.g) to rule a\r
- #1 in rule a (line 7 v1.g)\r
-\r
- #endif\r
- --------------------------------------------------------------\r
-\r
- A predicate can be suppressed by a combination of alternatives\r
- which, taken together, cover a predicate:\r
-\r
- start : (a)* "@" ;\r
-\r
- a : b | ca | cb | cc ;\r
-\r
- b : <<isUpper(LATEXT(1))>>? ( A | B | C ) ;\r
-\r
- ca : A ;\r
- cb : B ;\r
- cc : C ;\r
-\r
- Consider a more complex example in which "c" covers only part of\r
- a predicate:\r
-\r
- start : (a)* "@" ;\r
-\r
- a : b\r
- | c\r
- ;\r
-\r
- b : <<isUpper(LATEXT(1))>>?\r
- ( A\r
- | X\r
- );\r
-\r
- c : A\r
- ;\r
-\r
- Prior to 1.33MR10 the code generated for "start" would resemble:\r
-\r
- while {\r
- if ( (LA(1)==A || LA(1)==X) &&\r
- (! (LA(1)==A || LA(1)==X) || isUpper())) {\r
- a();\r
- }\r
- };\r
-\r
- With 1.33MR10 and -mrhoist the predicate context is restricted to\r
- the non-covered lookahead. The code resembles:\r
-\r
- while {\r
- if ( (LA(1)==A || LA(1)==X) &&\r
- (! (LA(1)==X) || isUpper())) {\r
- a();\r
- }\r
- };\r
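-\r
- The effect of restricting the context can be checked with a small\r
- C++ model of the two gates. The enum Tok and the function names are\r
- invented for this sketch; the isUpper argument stands for the result\r
- of the predicate.\r
-\r
```cpp
// Possible values of LA(1) in this model.
enum Tok { TOK_A, TOK_X, TOK_OTHER };

// Gate before 1.33MR10: the predicate context is {A, X}.
bool gateBefore(Tok la1, bool isUpper) {
    bool inLookahead = (la1 == TOK_A || la1 == TOK_X);
    return inLookahead && (!(la1 == TOK_A || la1 == TOK_X) || isUpper);
}

// Gate with -mrhoist: the predicate context is restricted to {X},
// the part of the lookahead not covered by rule "c".
bool gateMrhoist(Tok la1, bool isUpper) {
    bool inLookahead = (la1 == TOK_A || la1 == TOK_X);
    return inLookahead && (!(la1 == TOK_X) || isUpper);
}
```
-\r
- For LA(1)==A with the predicate false, gateBefore() rejects rule "a"\r
- (so "c" cannot be reached) while gateMrhoist() accepts it. For\r
- LA(1)==X both gates still require the predicate to be true.\r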
-\r
- With the antlr "-info p" switch the user will receive information\r
- about the predicate restriction in the generated file:\r
-\r
- --------------------------------------------------------------\r
- #if 0\r
-\r
- Restricting the context of a predicate because of overlap\r
- in the lookahead set between the alternative with the\r
- semantic predicate and one without\r
- Without this restriction the alternative without the predicate\r
- could not be reached when input matched the context of the\r
- predicate and the predicate was false.\r
-\r
- WITH predicate: line 11 v4.g\r
- WITHOUT predicate: line 12 v4.g\r
-\r
- The original context set for the predicate:\r
-\r
- A X\r
-\r
- The lookahead set for the alt WITHOUT the semantic predicate:\r
-\r
- A\r
-\r
- The intersection of the two sets\r
-\r
- A\r
-\r
- The original predicate:\r
-\r
- pred << isUpper(LATEXT(1))>>?\r
- depth=k=1 rule b line 15 v4.g\r
- set context:\r
- A X\r
- tree context: null\r
-\r
- The new (modified) form of the predicate:\r
-\r
- pred << isUpper(LATEXT(1))>>?\r
- depth=k=1 rule b line 15 v4.g\r
- set context:\r
- X\r
- tree context: null\r
-\r
- #endif\r
- --------------------------------------------------------------\r
-\r
- The bad news about -mrhoist:\r
-\r
- (a) -mrhoist does not analyze predicates with lookahead\r
- depth > 1.\r
-\r
- (b) -mrhoist does not look past a guarded predicate to\r
- find context which might cover other predicates.\r
-\r
- For these cases you might want to use syntactic predicates.\r
- When a semantic predicate fails during guess mode the guess\r
- fails and the next alternative is tried.\r
-\r
- Limitation (a) is illustrated by the following example:\r
-\r
- start : (stmt)* EOF ;\r
-\r
- stmt : cast\r
- | expr\r
- ;\r
- cast : <<isTypename(LATEXT(2))>>? LP ID RP ;\r
-\r
- expr : LP ID RP ;\r
-\r
- This is not much different from the first example, except that\r
- it requires two tokens of lookahead context to determine what\r
- to do. This predicate is NOT suppressed because the current version\r
- is unable to handle predicates with depth > 1.\r
-\r
- A predicate can be combined with other predicates during hoisting.\r
- In those cases the depth=1 predicates are still handled. Thus,\r
- in the following example the isUpper() predicate will be suppressed\r
- by line #4 when hoisted from "bizarre" into "start", but will still\r
- be present in "bizarre" in order to predict "stmt".\r
-\r
- start : (bizarre)* EOF ; // #1\r
- // #2\r
- bizarre : stmt // #3\r
- | A // #4\r
- ;\r
-\r
- stmt : cast\r
- | expr\r
- ;\r
-\r
- cast : <<isTypename(LATEXT(2))>>? LP ID RP ;\r
-\r
- expr : LP ID RP\r
- | <<isUpper(LATEXT(1))>>? A\r
- ;\r
-\r
- Limitation (b) is illustrated by the following example of a\r
- context guarded predicate:\r
-\r
- rule : (A)? => <<p>>? // #1\r
- (A // #2\r
- |B // #3\r
- ) // #4\r
- | <<q>>? B // #5\r
- ;\r
-\r
- Recall that this means that when the lookahead is NOT A the\r
- predicate "p" is ignored and the parser attempts to match "A|B".\r
- Ideally, the "B" at line #3 should suppress predicate "q".\r
- However, the current version does not attempt to look past\r
- the guarded predicate to find context which might suppress other\r
- predicates.\r
-\r
- In some cases -mrhoist will lead to the reporting of ambiguities\r
- which were not visible before:\r
-\r
- start : (a)* "@";\r
- a : bc | d;\r
- bc : b | c ;\r
-\r
- b : <<isUpper(LATEXT(1))>>? A;\r
- c : A ;\r
-\r
- d : A ;\r
-\r
- In this case there is a true ambiguity in "a" between "bc" and "d"\r
- which can both match "A". Without -mrhoist the predicate in "b"\r
- is hoisted into "a" and there is no ambiguity reported. However,\r
- with -mrhoist, the predicate in "b" is suppressed by "c" (as it\r
- should be) making the ambiguity in "a" apparent.\r
-\r
- The motivations for these changes were hoisting problems reported\r
- by Reinier van den Born (reinier@vnet.ibm.com) and several others.\r
-\r
-#113. (Changed in 1.33MR10) new context guarded pred: (g)? && <<p>>? expr\r
-\r
- The existing context guarded predicate:\r
-\r
- rule : (guard)? => <<p>>? expr\r
- | next_alternative\r
- ;\r
-\r
- generates code which resembles:\r
-\r
- if (lookahead(expr) && (!guard || pred)) {\r
- expr();\r
- } else ...\r
-\r
- This is not suitable for some applications because it allows\r
- expr() to be invoked when the predicate is false: if the guard\r
- does not match, the predicate is never tested. This is\r
- intentional because it is meant to mimic automatically computed\r
- predicate context.\r
-\r
- The new context guarded predicate uses the guard information\r
- differently because it has a different goal. Consider:\r
-\r
- rule : (guard)? && <<p>>? expr\r
- | next_alternative\r
- ;\r
-\r
- The new style of context guarded predicate is equivalent to:\r
-\r
- rule : <<guard==true && pred>>? expr\r
- | next_alternative\r
- ;\r
-\r
- It generates code which resembles:\r
-\r
- if (lookahead(expr) && guard && pred) {\r
- expr();\r
- } else ...\r
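-\r
- The difference between the two forms shows up only when the guard\r
- does not match. A minimal C++ sketch of the two tests, with all three\r
- inputs reduced to booleans (the function names are made up for\r
- illustration):\r
-\r
```cpp
// (guard)? => <<p>>? expr : the predicate is ignored when the guard
// does not match, so expr may be entered with the predicate false.
bool arrowGuardGate(bool lookaheadExpr, bool guard, bool pred) {
    return lookaheadExpr && (!guard || pred);
}

// (guard)? && <<p>>? expr : both the guard and the predicate must hold.
bool andGuardGate(bool lookaheadExpr, bool guard, bool pred) {
    return lookaheadExpr && guard && pred;
}
```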
-\r
- Both forms of guarded predicates severely restrict the form of\r
- the context guard: it can contain no rule references, no\r
- (...)*, no (...)+, and no {...}. It may contain token and\r
- token class references, and alternation ("|").\r
-\r
- Addition for 1.33MR11: in the token expression all tokens must\r
- be at the same height of the token tree:\r
-\r
- (A ( B | C))? && ... is ok (all height 2)\r
- (A ( B | ))? && ... is not ok (some 1, some 2)\r
- (A B C D | E F G H)? && ... is ok (all height 4)\r
- (A B C D | E )? && ... is not ok (some 4, some 1)\r
-\r
- This restriction is required in order to properly compute the lookahead\r
- set for expressions like:\r
-\r
- rule1 : (A B C)? && <<pred>>? rule2 ;\r
- rule2 : (A|X) (B|Y) (C|Z);\r
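-\r
- One way to picture the height rule is to compute the set of possible\r
- token-sequence lengths of a guard. The helpers below are a\r
- hypothetical model, not antlr's implementation: token() is a single\r
- token reference, nothing() is an empty alternative, seq() is\r
- concatenation and alt() is "|". A guard satisfies the rule when the\r
- set holds exactly one length.\r
-\r
```cpp
#include <set>

// The set of lengths of the token sequences a guard fragment can match.
using Heights = std::set<int>;

Heights token()   { return {1}; }   // a token or token class reference
Heights nothing() { return {0}; }   // an empty alternative

// Concatenation: every length of a followed by every length of b.
Heights seq(const Heights& a, const Heights& b) {
    Heights out;
    for (int x : a)
        for (int y : b)
            out.insert(x + y);
    return out;
}

// Alternation: either fragment may be taken.
Heights alt(const Heights& a, const Heights& b) {
    Heights out = a;
    out.insert(b.begin(), b.end());
    return out;
}

// True when all tokens sit at the same height of the token tree.
bool sameHeight(const Heights& h) { return h.size() == 1; }
```
-\r
- In this model (A ( B | C )) is seq(token(), alt(token(), token()))\r
- and passes, while (A ( B | )) is seq(token(), alt(token(), nothing()))\r
- and fails.\r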
-\r
- This addition was suggested by Reinier van den Born (reinier@vnet.ibm.com).\r
-\r
-#109. (Changed in 1.33MR10) improved trace information\r
-\r
- The quality of the trace information provided by the "-gd"\r
- switch has been improved significantly. Here is an example\r
- of the output from a test program. It shows the rule name,\r
- the first token of lookahead, the call depth, and the guess\r
- status:\r
-\r
- exit rule gusxx {"?"} depth 2\r
- enter rule gusxx {"?"} depth 2\r
- enter rule gus1 {"o"} depth 3 guessing\r
- guess done - returning to rule gus1 {"o"} at depth 3\r
- (guess mode continues - an enclosing guess is still active)\r
- guess done - returning to rule gus1 {"Z"} at depth 3\r
- (guess mode continues - an enclosing guess is still active)\r
- exit rule gus1 {"Z"} depth 3 guessing\r
- guess done - returning to rule gusxx {"o"} at depth 2 (guess mode ends)\r
- enter rule gus1 {"o"} depth 3\r
- guess done - returning to rule gus1 {"o"} at depth 3 (guess mode ends)\r
- guess done - returning to rule gus1 {"Z"} at depth 3 (guess mode ends)\r
- exit rule gus1 {"Z"} depth 3\r
- line 1: syntax error at "Z" missing SC\r
- ...\r
-\r
- Rule trace reporting is controlled by the value of the integer\r
- [zz]traceOptionValue: when it is positive tracing is enabled,\r
- otherwise it is disabled. Tracing during guess mode is controlled\r
- by the value of the integer [zz]traceGuessOptionValue. When\r
- it is positive AND [zz]traceOptionValue is positive rule trace\r
- is reported in guess mode.\r
-\r
- The values of [zz]traceOptionValue and [zz]traceGuessOptionValue\r
- can be adjusted by subroutine calls listed below.\r
-\r
- Depending on the presence or absence of the antlr -gd switch\r
- the variable [zz]traceOptionValueDefault is set to 0 or 1. When\r
- the parser is initialized or [zz]traceReset() is called the\r
- value of [zz]traceOptionValueDefault is copied to [zz]traceOptionValue.\r
- The value of [zz]traceGuessOptionValue is always initialized to 1,\r
- but, as noted earlier, nothing will be reported unless\r
- [zz]traceOptionValue is also positive.\r
-\r
- When the parser state is saved/restored the values of the trace\r
- variables are also saved/restored. If a restore changes the\r
- reporting behavior from on to off or vice versa, the change is reported.\r
-\r
- When the -gd option is selected, the macro "#define zzTRACE_RULES"\r
- is added to appropriate output files.\r
-\r
- C++ mode\r
- --------\r
- int traceOption(int delta)\r
- int traceGuessOption(int delta)\r
- void traceReset()\r
- int traceOptionValueDefault\r
-\r
- C mode\r
- --------\r
- int zzTraceOption(int delta)\r
- int zzTraceGuessOption(int delta)\r
- void zzTraceReset()\r
- int zzTraceOptionValueDefault\r
-\r
- The argument "delta" is added to the traceOptionValue. To\r
- turn on tracing inside a particular rule, write:\r
-\r
- rule : <<traceOption(+1);>>\r
- (\r
- rest-of-rule\r
- )\r
- <<traceOption(-1);>>\r
- ; /* fail clause */ <<traceOption(-1);>>\r
-\r
- One can use the same idea to turn *off* tracing within a\r
- rule by using a delta of (-1).\r
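-\r
- The counter semantics described above can be modelled as follows.\r
- TraceState and reports() are invented names, and the return value of\r
- traceOption() is assumed here to be the previous value, which the\r
- text above does not state.\r
-\r
```cpp
// A stand-alone model of the trace variables (not the PCCTS classes).
struct TraceState {
    int traceOptionValueDefault;   // 1 when built with -gd, otherwise 0
    int traceOptionValue;          // tracing is on when positive
    int traceGuessOptionValue;     // guess-mode tracing is on when positive

    explicit TraceState(bool gdSwitch)
        : traceOptionValueDefault(gdSwitch ? 1 : 0),
          traceOptionValue(traceOptionValueDefault),
          traceGuessOptionValue(1) {}

    // Add delta to the counter; returning the old value is an assumption.
    int traceOption(int delta) {
        int previous = traceOptionValue;
        traceOptionValue += delta;
        return previous;
    }

    int traceGuessOption(int delta) {
        int previous = traceGuessOptionValue;
        traceGuessOptionValue += delta;
        return previous;
    }

    // Copy the default back into the working counter.
    void traceReset() { traceOptionValue = traceOptionValueDefault; }

    // Would a rule entry/exit be reported in the given mode?
    bool reports(bool guessing) const {
        if (traceOptionValue <= 0) return false;
        return !guessing || traceGuessOptionValue > 0;
    }
};
```
-\r
- Because the values are counters rather than flags, paired calls with\r
- +1 and -1 nest correctly when rules call each other.\r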
-\r
- An improvement in the rule trace was suggested by Sramji\r
- Ramanathan (ps@kumaran.com).\r
-\r
-#108. A Note on Deallocation of Variables Allocated in Guess Mode\r
-\r
- NOTE\r
- ------------------------------------------------------\r
- This mechanism only works for heap allocated variables\r
- ------------------------------------------------------\r
-\r
- The rewrite of the trace provides the machinery necessary\r
- to properly free variables or undo actions following a\r
- failed guess.\r
-\r
- The macro zzUSER_GUESS_HOOK(guessSeq,zzrv) is expanded\r
- as part of the zzGUESS macro. When a guess is opened\r
- the value of zzrv is 0. When a longjmp() is executed to\r
- undo the guess, the value of zzrv will be 1.\r
-\r
- The macro zzUSER_GUESS_DONE_HOOK(guessSeq) is expanded\r
- as part of the zzGUESS_DONE macro. This is executed\r
- whether the guess succeeds or fails as part of closing\r
- the guess.\r
-\r
- The guessSeq is a sequence number which is assigned to each\r
- guess and is incremented by 1 for each guess which becomes\r
- active. It is needed by the user to associate the start of\r
- a guess with the failure and/or completion (closing) of a\r
- guess.\r
-\r
- Guesses are nested. They must be closed in the reverse\r
- of the order in which they were opened.\r
-\r
- In order to free memory used by a variable during a guess\r
- a user must write a routine which can be called to\r
- register the variable along with the current guess sequence\r
- number provided by the zzUSER_GUESS_HOOK macro. If the guess\r
- fails, all variables tagged with the corresponding guess\r
- sequence number should be released. This is ugly, but\r
- it would require a major rewrite of antlr 1.33 to use\r
- some mechanism other than setjmp()/longjmp().\r
-\r
- The order of calls for a *successful* guess would be:\r
-\r
- zzUSER_GUESS_HOOK(guessSeq,0);\r
- zzUSER_GUESS_DONE_HOOK(guessSeq);\r
-\r
- The order of calls for a *failed* guess would be:\r
-\r
- zzUSER_GUESS_HOOK(guessSeq,0);\r
- zzUSER_GUESS_HOOK(guessSeq,1);\r
- zzUSER_GUESS_DONE_HOOK(guessSeq);\r
-\r
- The default definitions of these macros are empty strings.\r
-\r
- Here is an example in C++ mode. The zzUSER_GUESS_HOOK and\r
- zzUSER_GUESS_DONE_HOOK macros and myGuessHook() routine\r
- can be used without change in both C and C++ versions.\r
-\r
- ----------------------------------------------------------------------\r
- <<\r
-\r
- #include <stdio.h>\r
- #include <stdlib.h>\r
- #include <string.h>\r
- #include "AToken.h"\r
-\r
- typedef ANTLRCommonToken ANTLRToken;\r
-\r
- #include "DLGLexer.h"\r
-\r
- int main() {\r
-\r
- {\r
- DLGFileInput in(stdin);\r
- DLGLexer lexer(&in,2000);\r
- ANTLRTokenBuffer pipe(&lexer,1);\r
- ANTLRCommonToken aToken;\r
- P parser(&pipe);\r
-\r
- lexer.setToken(&aToken);\r
- parser.init();\r
- parser.start();\r
- };\r
-\r
- fclose(stdin);\r
- fclose(stdout);\r
- return 0;\r
- }\r
-\r
- >>\r
-\r
- <<\r
- char *s=NULL;\r
-\r
- #undef zzUSER_GUESS_HOOK\r
- #define zzUSER_GUESS_HOOK(guessSeq,zzrv) myGuessHook(guessSeq,zzrv);\r
- #undef zzUSER_GUESS_DONE_HOOK\r
- #define zzUSER_GUESS_DONE_HOOK(guessSeq) myGuessHook(guessSeq,2);\r
-\r
- void myGuessHook(int guessSeq,int zzrv) {\r
- if (zzrv == 0) {\r
- fprintf(stderr,"User hook: starting guess #%d\n",guessSeq);\r
- } else if (zzrv == 1) {\r
- free (s);\r
- s=NULL;\r
- fprintf(stderr,"User hook: failed guess #%d\n",guessSeq);\r
- } else if (zzrv == 2) {\r
- free (s);\r
- s=NULL;\r
- fprintf(stderr,"User hook: ending guess #%d\n",guessSeq);\r
- };\r
- }\r
-\r
- >>\r
-\r
- #token A "a"\r
- #token "[\t \ \n]" <<skip();>>\r
-\r
- class P {\r
-\r
- start : (top)+\r
- ;\r
-\r
- top : (which) ? <<fprintf(stderr,"%s is a which\n",s); free(s); s=NULL; >>\r
- | other <<fprintf(stderr,"%s is an other\n",s); free(s); s=NULL; >>\r
- ; <<if (s != NULL) free(s); s=NULL; >>\r
-\r
- which : which2\r
- ;\r
-\r
- which2 : which3\r
- ;\r
- which3\r
- : (label)? <<fprintf(stderr,"%s is a label\n",s);>>\r
- | (global)? <<fprintf(stderr,"%s is a global\n",s);>>\r
- | (exclamation)? <<fprintf(stderr,"%s is an exclamation\n",s);>>\r
- ;\r
-\r
- label : <<s=strdup(LT(1)->getText());>> A ":" ;\r
-\r
- global : <<s=strdup(LT(1)->getText());>> A "::" ;\r
-\r
- exclamation : <<s=strdup(LT(1)->getText());>> A "!" ;\r
-\r
- other : <<s=strdup(LT(1)->getText());>> "other" ;\r
-\r
- }\r
- ----------------------------------------------------------------------\r
-\r
- This is a silly example, but it illustrates the idea. For the input\r
- "a ::" with tracing enabled the output begins:\r
-\r
- ----------------------------------------------------------------------\r
- enter rule "start" depth 1\r
- enter rule "top" depth 2\r
- User hook: starting guess #1\r
- enter rule "which" depth 3 guessing\r
- enter rule "which2" depth 4 guessing\r
- enter rule "which3" depth 5 guessing\r
- User hook: starting guess #2\r
- enter rule "label" depth 6 guessing\r
- guess failed\r
- User hook: failed guess #2\r
- guess done - returning to rule "which3" at depth 5 (guess mode continues\r
- - an enclosing guess is still active)\r
- User hook: ending guess #2\r
- User hook: starting guess #3\r
- enter rule "global" depth 6 guessing\r
- exit rule "global" depth 6 guessing\r
- guess done - returning to rule "which3" at depth 5 (guess mode continues\r
- - an enclosing guess is still active)\r
- User hook: ending guess #3\r
- enter rule "global" depth 6 guessing\r
- exit rule "global" depth 6 guessing\r
- exit rule "which3" depth 5 guessing\r
- exit rule "which2" depth 4 guessing\r
- exit rule "which" depth 3 guessing\r
- guess done - returning to rule "top" at depth 2 (guess mode ends)\r
- User hook: ending guess #1\r
- enter rule "which" depth 3\r
- .....\r
- ----------------------------------------------------------------------\r
-\r
- Remember:\r
-\r
- (a) Only init-actions are executed during guess mode.\r
- (b) A rule can be invoked multiple times during guess mode.\r
- (c) If the guess succeeds the rule will be called once more\r
- without guess mode so that normal actions will be executed.\r
- This means that the init-action might need to distinguish\r
- between guess mode and non-guess mode using the variable\r
- [zz]guessing.\r
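-\r
- The registration routine described earlier in this item might be\r
- sketched like this. Everything in namespace guessmem is hypothetical;\r
- guessHook() plays the role of myGuessHook() above and would be wired\r
- up through zzUSER_GUESS_HOOK and zzUSER_GUESS_DONE_HOOK in the same\r
- way. As in the example above, blocks are released both on failure and\r
- on close, because a successful guess is followed by a second,\r
- non-guess invocation of the rule.\r
-\r
```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <utility>
#include <vector>

namespace guessmem {   // hypothetical helper, not part of PCCTS

std::vector<int> activeGuesses;                // open guesses, innermost last
std::vector<std::pair<int, void*> > tagged;    // (guessSeq, heap block)

// Duplicate text on the heap and tag it with the innermost open guess.
// Outside guess mode nothing is tagged and the caller owns the block.
char* dupTagged(const char* text) {
    std::size_t n = std::strlen(text) + 1;
    char* p = static_cast<char*>(std::malloc(n));
    if (p == nullptr) return nullptr;
    std::memcpy(p, text, n);
    if (!activeGuesses.empty())
        tagged.push_back(std::make_pair(activeGuesses.back(), static_cast<void*>(p)));
    return p;
}

// Free every block tagged with guessSeq; returns how many were freed.
int release(int guessSeq) {
    int freed = 0;
    for (std::size_t i = 0; i < tagged.size(); ) {
        if (tagged[i].first == guessSeq) {
            std::free(tagged[i].second);
            tagged[i] = tagged.back();   // swap-remove; order is irrelevant
            tagged.pop_back();
            ++freed;
        } else {
            ++i;
        }
    }
    return freed;
}

// zzrv: 0 = guess opened, 1 = guess failed, 2 = guess closed.
int guessHook(int guessSeq, int zzrv) {
    if (zzrv == 0) {
        activeGuesses.push_back(guessSeq);
        return 0;
    }
    int freed = release(guessSeq);
    if (zzrv == 2 && !activeGuesses.empty())
        activeGuesses.pop_back();        // guesses close in reverse order
    return freed;
}

}  // namespace guessmem
```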
-\r
-#101. (Changed in 1.33MR10) antlr -info command line switch\r
-\r
- -info\r
-\r
- p - extra predicate information in generated file\r
-\r
- t - information about tnode use:\r
- at the end of each rule in generated file\r
- summary on stderr at end of program\r
-\r
- m - monitor progress\r
- prints name of each rule as it is started\r
- flushes output at start of each rule\r
-\r
- f - first/follow set information to stdout\r
-\r
- 0 - no operation (added in 1.33MR11)\r
-\r
- The options may be combined and may appear in any order.\r
- For example:\r
-\r
- antlr -info ptm -CC -gt -mrhoist on mygrammar.g\r
-\r
-#100a. (Changed in 1.33MR10) Predicate tree simplification\r
-\r
- When the same predicates can be referenced in more than one\r
- alternative of a block, large predicate trees can be formed.\r
-\r
- The difference that these optimizations make is so dramatic\r
- that I have decided to use them even when -mrhoist is not selected.\r
-\r
- Consider the following grammar:\r
-\r
- start : ( all )* ;\r
-\r
- all : a\r
- | d\r
- | e\r
- | f\r
- ;\r
-\r
- a : c A B\r
- | c A C\r
- ;\r
-\r
- c : <<AAA(LATEXT(2))>>?\r
- ;\r
-\r
- d : <<BBB(LATEXT(2))>>? B C\r
- ;\r
-\r
- e : <<CCC(LATEXT(2))>>? B C\r
- ;\r
-\r
- f : e X Y\r
- ;\r
-\r
- In rule "a" there is a reference to rule "c" in both alternatives.\r
- The length of the predicate AAA is k=2 and it can be followed in\r
- alternative 1 only by (A B) while in alternative 2 it can be\r
- followed only by (A C). Thus they do not have identical context.\r
-\r
- In rule "all" the alternatives which refer to rules "e" and "f" allow\r
- elimination of the duplicate reference to predicate CCC.\r
-\r
- The table below summarizes the kind of simplification performed by\r
- 1.33MR10. In the table, X, Y, and Z stand for single predicates\r
- (not trees).\r
-\r
- (OR X (OR Y (OR Z))) => (OR X Y Z)\r
- (AND X (AND Y (AND Z))) => (AND X Y Z)\r
-\r
- (OR X (... (OR X Y) ... )) => (OR X (... Y ... ))\r
- (AND X (... (AND X Y) ... )) => (AND X (... Y ... ))\r
- (OR X (... (AND X Y) ... )) => (OR X (... ... ))\r
- (AND X (... (OR X Y) ... )) => (AND X (... ... ))\r
-\r
- (AND X) => X\r
- (OR X) => X\r
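-\r
- The flattening and single-child rules (the first and last pairs in\r
- the table) can be sketched in C++ as below. The Pred type and the\r
- helper names are assumptions made for the example, and the middle\r
- four rules, which remove a predicate already present in an enclosing\r
- node, are not implemented.\r
-\r
```cpp
#include <string>
#include <utility>
#include <vector>

// A toy predicate tree: a leaf names a predicate, AND/OR hold children.
struct Pred {
    enum Kind { Leaf, And, Or };
    Kind kind;
    std::string name;          // used by leaves only
    std::vector<Pred> kids;    // used by AND/OR nodes only
};

Pred leaf(const std::string& n) { return Pred{Pred::Leaf, n, {}}; }
Pred node(Pred::Kind k, std::vector<Pred> kids) {
    return Pred{k, std::string(), std::move(kids)};
}

Pred simplify(const Pred& p) {
    if (p.kind == Pred::Leaf) return p;
    Pred out{p.kind, std::string(), {}};
    for (const Pred& kid : p.kids) {
        Pred s = simplify(kid);
        if (s.kind == p.kind) {
            // (OR X (OR Y Z)) => (OR X Y Z), and likewise for AND
            for (Pred& g : s.kids) out.kids.push_back(std::move(g));
        } else {
            out.kids.push_back(std::move(s));
        }
    }
    if (out.kids.size() == 1) {
        // (AND X) => X and (OR X) => X
        Pred only = std::move(out.kids.front());
        return only;
    }
    return out;
}

// Render a tree in the notation used by the table.
std::string show(const Pred& p) {
    if (p.kind == Pred::Leaf) return p.name;
    std::string s = (p.kind == Pred::And) ? "(AND" : "(OR";
    for (const Pred& k : p.kids) s += " " + show(k);
    return s + ")";
}
```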
-\r
- In a test with a complex grammar for a real application, a predicate\r
- tree with six OR nodes and 12 leaves was reduced to "(OR X Y Z)".\r
-\r
- In 1.33MR10 there is a greater effort to release memory used\r
- by predicates once they are no longer in use.\r
-\r
-#100b. (Changed in 1.33MR10) Suppression of extra predicate tests\r
-\r
- The following optimizations require that -mrhoist be selected.\r
-\r
- It is relatively easy to optimize the code generated for predicate\r
- gates when they are of the form:\r
-\r
- (AND X Y Z ...)\r
- or (OR X Y Z ...)\r
-\r
- where X, Y, Z, and "..." represent individual predicates (leaves) not\r
- predicate trees.\r
-\r
- If the predicate is an AND the contexts of the X, Y, Z, etc. are\r
- ANDed together to create a single Tree context for the group and\r
- context tests for the individual predicates are suppressed:\r
-\r
- --------------------------------------------------\r
- Note: This was incorrect. The contexts should be\r
- ORed together. This has been fixed. A more \r
- complete description is available in item #152.\r
- ---------------------------------------------------\r
-\r
- Optimization 1: (AND X Y Z ...)\r
-\r
- Suppose the context for Xtest is LA(1)==LP and the context for\r
- Ytest is LA(1)==LP && LA(2)==ID.\r
-\r
- Without the optimization the code would resemble:\r
-\r
- if (lookaheadContext &&\r
- (!(LA(1)==LP && LA(1)==LP && LA(2)==ID) ||\r
- ( (!(LA(1)==LP) || Xtest) &&\r
- (!(LA(1)==LP && LA(2)==ID) || Ytest)\r
- ))) {...\r
-\r
- With the -mrhoist optimization the code would resemble:\r
-\r
- if (lookaheadContext &&\r
- (! (LA(1)==LP && LA(2)==ID) || (Xtest && Ytest))) {...\r
-\r
- Optimization 2: (OR X Y Z ...) with identical contexts\r
-\r
- Suppose the context for Xtest is LA(1)==ID and for Ytest\r
- the context is also LA(1)==ID.\r
-\r
- Without the optimization the code would resemble:\r
-\r
- if (lookaheadContext &&\r
- (! (LA(1)==ID || LA(1)==ID) ||\r
- (LA(1)==ID && Xtest) ||\r
- (LA(1)==ID && Ytest))) {...\r
-\r
- With the -mrhoist optimization the code would resemble:\r
-\r
- if (lookaheadContext &&\r
- (!(LA(1)==ID) || (Xtest || Ytest))) {...\r
-\r
- Optimization 3: (OR X Y Z ...) with distinct contexts\r
-\r
- Suppose the context for Xtest is LA(1)==ID and for Ytest\r
- the context is LA(1)==LP.\r
-\r
- Without the optimization the code would resemble:\r
-\r
- if (lookaheadContext &&\r
- (! (LA(1)==ID || LA(1)==LP) ||\r
- (LA(1)==ID && Xtest) ||\r
- (LA(1)==LP && Ytest))) {...\r
-\r
- With the -mrhoist optimization the code would resemble:\r
-\r
- if (lookaheadContext &&\r
- (zzpf=0,\r
- (LA(1)==ID && (zzpf=1) && Xtest) ||\r
- (LA(1)==LP && (zzpf=1) && Ytest) ||\r
- !zzpf)) {\r
-\r
- These may appear to be of similar complexity at first,\r
- but the non-optimized version contains two tests of each\r
- context while the optimized version contains only one\r
- such test, as well as eliminating some of the inverted\r
- logic (" !(...) || ").\r
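-\r
- A quick way to convince oneself that the two forms of Optimization 3\r
- agree is to code both and compare them over all inputs. In this\r
- sketch the lookaheadContext term is left out and the enum and\r
- function names are made up; zzpf is the flag from the code above.\r
-\r
```cpp
// Possible values of LA(1) in this model.
enum Lookahead { LA_ID, LA_LP, LA_OTHER };

// Gate without the optimization: each context is tested twice.
bool gateUnoptimized(Lookahead la1, bool xtest, bool ytest) {
    return !(la1 == LA_ID || la1 == LA_LP) ||
           (la1 == LA_ID && xtest) ||
           (la1 == LA_LP && ytest);
}

// Gate with -mrhoist: zzpf records whether any context matched, so
// each context is tested once and the inverted test disappears.
bool gateOptimized(Lookahead la1, bool xtest, bool ytest) {
    int zzpf = 0;
    return (zzpf = 0,
            (la1 == LA_ID && (zzpf = 1) && xtest) ||
            (la1 == LA_LP && (zzpf = 1) && ytest) ||
            !zzpf);
}
```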
-\r
- Optimization 4: Computation of predicate gate trees\r
-\r
- When generating code for the gates of predicate expressions\r
- antlr 1.33 vanilla uses a recursive procedure to generate\r
- "&&" and "||" expressions for testing the lookahead. As each\r
- layer of the predicate tree is exposed a new set of "&&" and\r
- "||" expressions on the lookahead are generated. In many\r
- cases the lookahead being tested has already been tested.\r
-\r
- With -mrhoist a lookahead tree is computed for the entire\r
- lookahead expression. This means that predicates with identical\r
- context or context which is a subset of another predicate's\r
- context disappear.\r
-\r
- This is especially important for predicates formed by rules\r
- like the following:\r
-\r
- upperCaseVowel : <<isUpperCase(LATEXT(1))>>? vowel;\r
- vowel : <<isVowel(LATEXT(1))>>? LETTERS;\r
-\r
- These predicates are combined using AND since both must be\r
- satisfied for rule upperCaseVowel. They have identical\r
- context which makes this optimization very effective.\r
-\r
- The effect of Items #100a and #100b together can be dramatic. In\r
- a very large (but real world) grammar one particular predicate\r
- expression was reduced from an (unreadable) 50 predicate leaves,\r
- 195 LA(1) terms, and 5500 characters to an (easily comprehensible)\r
- 3 predicate leaves (all different) and a *single* LA(1) term.\r
-\r
-#98. (Changed in 1.33MR10) Option "-info p"\r
-\r
- When the user selects option "-info p" the program will generate\r
- detailed information about predicates. If the user selects\r
- "-mrhoist on" additional detail will be provided explaining\r
- the promotion and suppression of predicates. The output is part\r
- of the generated file and sandwiched between #if 0/#endif statements.\r
-\r
- Consider the following k=1 grammar:\r
-\r
- start : ( all ) * ;\r
-\r
- all : ( a\r
- | b\r
- )\r
- ;\r
-\r
- a : c B\r
- ;\r
-\r
- c : <<LATEXT(1)>>?\r
- | B\r
- ;\r
-\r
- b : <<LATEXT(1)>>? X\r
- ;\r
-\r
- Below is an excerpt of the output for rule "start" for the three\r
- predicate options (off, on, and maintenance release style hoisting).\r
-\r
- For those who do not wish to use the "-mrhoist on" option for code\r
- generation the option can be used in a "diagnostic" mode to provide\r
- valuable information:\r
-\r
- a. where one should insert null actions to inhibit hoisting\r
- b. a chain of rule references which shows where predicates are\r
- being hoisted\r
-\r
- ======================================================================\r
- Example of "-info p" with "-mrhoist on"\r
- ======================================================================\r
- #if 0\r
-\r
- Hoisting of predicate suppressed by alternative without predicate.\r
- The alt without the predicate includes all cases where the\r
- predicate is false.\r
-\r
- WITH predicate: line 11 v36.g\r
- WITHOUT predicate: line 12 v36.g\r
-\r
- The context set for the predicate:\r
-\r
- B\r
-\r
- The lookahead set for alt WITHOUT the semantic predicate:\r
-\r
- B\r
-\r
- The predicate:\r
-\r
- pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g\r
-\r
- set context:\r
- B\r
- tree context: null\r
-\r
- Chain of referenced rules:\r
-\r
- #0 in rule start (line 1 v36.g) to rule all\r
- #1 in rule all (line 3 v36.g) to rule a\r
- #2 in rule a (line 8 v36.g) to rule c\r
- #3 in rule c (line 11 v36.g)\r
-\r
- #endif\r
- &&\r
- #if 0\r
-\r
- pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g\r
-\r
- set context:\r
- X\r
- tree context: null\r
-\r
- #endif\r
- ======================================================================\r
- Example of "-info p" with the default -prc setting ( "-prc off")\r
- ======================================================================\r
- #if 0\r
-\r
- OR\r
- pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g\r
-\r
- set context:\r
- nil\r
- tree context: null\r
-\r
- pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g\r
-\r
- set context:\r
- nil\r
- tree context: null\r
-\r
- #endif\r
- ======================================================================\r
- Example of "-info p" with "-prc on" and "-mrhoist off"\r
- ======================================================================\r
- #if 0\r
-\r
- OR\r
- pred << LATEXT(1)>>? depth=k=1 rule c line 11 v36.g\r
-\r
- set context:\r
- B\r
- tree context: null\r
-\r
- pred << LATEXT(1)>>? depth=k=1 rule b line 15 v36.g\r
-\r
- set context:\r
- X\r
- tree context: null\r
-\r
- #endif\r
- ======================================================================\r
-\r
-#60. (Changed in 1.33MR7) Major changes to exception handling\r
-\r
- There were significant problems in the handling of exceptions\r
- in 1.33 vanilla. The general problem is that it can only\r
- process one level of exception handler. For example, a named\r
- exception handler, an exception handler for an alternative, or\r
- an exception handler for a subrule always went to the rule's exception\r
- handler if there was no "catch" which matched the exception.\r
-\r
- In 1.33MR7 the exception handlers properly "nest". If an\r
- exception handler does not have a matching "catch" then the\r
- nextmost outer exception handler is checked for an appropriate\r
- "catch" clause, and so on until an exception handler with an\r
- appropriate "catch" is found.\r
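-\r
- The change in lookup can be modelled as a search over a list of\r
- handlers, innermost first. This is a sketch under stated assumptions:\r
- a handler is reduced to the set of signal values it has a "catch"\r
- for, and -1 is an invented return value meaning that no handler\r
- matched.\r
-\r
```cpp
#include <cstddef>
#include <set>
#include <vector>

// The signal values a handler has a "catch" clause for.
using Handler = std::set<int>;

// 1.33 vanilla: if the innermost handler has no matching catch, control
// goes straight to the rule's handler (the last entry), whether or not
// an intermediate handler could have caught the signal.
int handlerVanilla(const std::vector<Handler>& handlers, int signal) {
    if (handlers.empty()) return -1;
    if (handlers.front().count(signal) != 0) return 0;
    return static_cast<int>(handlers.size()) - 1;
}

// 1.33MR7: handlers nest, so each enclosing handler is checked in turn.
int handlerMR7(const std::vector<Handler>& handlers, int signal) {
    for (std::size_t i = 0; i < handlers.size(); ++i)
        if (handlers[i].count(signal) != 0) return static_cast<int>(i);
    return -1;   // assumption: nothing catches the signal
}
```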
-\r
- There are still undesirable features in the way exception\r
- handlers are implemented, but I do not have time to fix them\r
- at the moment:\r
-\r
- The exception handlers for alternatives are outside the\r
- block containing the alternative. This makes it impossible\r
- to access variables declared in a block or to resume the\r
- parse by "falling through". The parse can still be easily\r
- resumed in other ways, but not in the most natural fashion.\r
-\r
- This results in an inconsistency between named exception\r
- handlers and exception handlers for alternatives. When\r
- an exception handler for an alternative "falls through"\r
- it goes to the nextmost outer handler - not the "normal\r
- action".\r
-\r
- A major difference between 1.33MR7 and 1.33 vanilla is\r
- the default action after an exception is caught:\r
-\r
- 1.33 Vanilla\r
- ------------\r
- In 1.33 vanilla the signal value is set to zero ("NoSignal")\r
- and the code drops through to the code following the exception.\r
- For named exception handlers this is the "normal action".\r
- For alternative exception handlers this is the rule's handler.\r
-\r
- 1.33MR7\r
- -------\r
- In 1.33MR7 the signal value is NOT automatically set to zero.\r
-\r
- There are two cases:\r
-\r
- For named exception handlers: if the signal value has been\r
- set to zero the code drops through to the "normal action".\r
-\r
- For all other cases the code branches to the nextmost outer\r
- exception handler until it reaches the handler for the rule.\r
-\r
- The following macros have been defined for convenience:\r
-\r
- C/C++ Mode Name\r
- --------------------\r
- (zz)suppressSignal\r
- set signal & return signal arg to 0 ("NoSignal")\r
- (zz)setSignal(intValue)\r
- set signal & return signal arg to some value\r
- (zz)exportSignal\r
- copy the signal value to the return signal arg\r
-\r
- I'm not sure why PCCTS makes a distinction between the local\r
- signal value and the return signal argument, but I'm loath\r
- to change the code. The burden of copying the local signal\r
- value to the return signal argument can be given to the\r
- default signal handler, I suppose.\r
-\r
-#53. (Explanation for 1.33MR6) What happens after an exception is caught?\r
-\r
- The Book is silent about what happens after an exception\r
- is caught.\r
-\r
- The following code fragment prints "Error Action" followed\r
- by "Normal Action".\r
-\r
- test : Word ex:Number <<printf("Normal Action\n");>>\r
- exception[ex]\r
- catch NoViableAlt:\r
- <<printf("Error Action\n");>>\r
- ;\r
-\r
- The reason for "Normal Action" is that the normal flow of the\r
- program after a user-written exception handler is to "drop through".\r
- In the case of an exception handler for a rule this results in\r
- the execution of a "return" statement. In the case of an\r
- exception handler attached to an alternative, rule, or token\r
- this is the code that would have executed had there been no\r
- exception.\r
-\r
- The user can achieve the desired result by using a "return"\r
- statement.\r
-\r
- test : Word ex:Number <<printf("Normal Action\n");>>\r
- exception[ex]\r
- catch NoViableAlt:\r
- <<printf("Error Action\n"); return;>>\r
- ;\r
-\r
- The most powerful mechanism for recovery from parse errors\r
- in pccts is syntactic predicates because they provide\r
- backtracking. Exceptions allow "return", "break",\r
- "consumeUntil(...)", "goto _handler", "goto _fail", and\r
- changing the _signal value.\r
-\r
-#41. (Added in 1.33MR6) antlr -stdout\r
-\r
- Using "antlr -stdout ..." forces the text that would\r
- normally go to the grammar.c or grammar.cpp file to\r
- stdout.\r
-\r
-#40. (Added in 1.33MR6) antlr -tab to change tab stops\r
-\r
- Using "antlr -tab number ..." changes the tab stops\r
- for the grammar.c or grammar.cpp file. The number\r
- must be between 0 and 8. Using 0 gives tab characters,\r
- values between 1 and 8 give the appropriate number of\r
- space characters.\r
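-\r
- The mapping from the -tab value to the indentation text can be\r
- sketched as below. indentUnit is a hypothetical helper, not part of\r
- antlr, and rejecting out-of-range values with an exception is an\r
- assumption.\r
-\r
```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Text emitted for one level of indentation for a given -tab value.
std::string indentUnit(int tab) {
    if (tab < 0 || tab > 8)
        throw std::invalid_argument("-tab value must be between 0 and 8");
    if (tab == 0)
        return "\t";                                          // 0: a tab character
    return std::string(static_cast<std::size_t>(tab), ' ');   // 1..8: spaces
}
```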
-\r
-#34. (Added to 1.33MR1) Add public DLGLexerBase::set_line(int newValue)\r
-\r
- Previously there was no public function for changing the line\r
- number maintained by the lexer.\r
-\r
-#28. (Added to 1.33MR1) More control over DLG header\r
-\r
- Version 1.33MR1 adds the following directives to PCCTS\r
- for C++ mode:\r
-\r
- #lexprefix <<source code>>\r
-\r
- Adds source code to the DLGLexer.h file\r
- after the #include "DLexerBase.h" but\r
- before the start of the class definition.\r
-\r
- #lexmember <<source code>>\r
-\r
- Adds source code to the DLGLexer.h file\r
- as part of the DLGLexer class body. It\r
- appears immediately after the start of\r
- the class and a "public:" statement.\r
-\r