ANTLR(1)                  PCCTS Manual Pages                  ANTLR(1)

NAME
     antlr - ANother Tool for Language Recognition

SYNTAX
     antlr [options] grammar_files

DESCRIPTION
     Antlr converts an extended form of context-free grammar into a
     set of C functions which directly implement an efficient form
     of deterministic recursive-descent LL(k) parser. Context-free
     grammars may be augmented with predicates to allow semantics
     to influence parsing; this allows a form of context-sensitive
     parsing. Selective backtracking is also available to handle
     non-LL(k) and even non-LALR(k) constructs. Antlr also produces
     a definition of a lexer which can be automatically converted
     into C code for a DFA-based lexer by dlg. Hence, antlr serves
     a function much like that of yacc; however, it is notably more
     flexible and is more integrated with a lexer generator (antlr
     directly generates dlg code, whereas yacc and lex are given
     independent descriptions). Unlike yacc, which accepts LALR(1)
     grammars, antlr accepts LL(k) grammars in an extended BNF
     notation, which eliminates the need for precedence rules.

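     The remark about precedence rules can be illustrated with a
     small hand-written sketch. It is not antlr output; the grammar
     in the comment, the single-digit tokens, and all function
     names are assumptions made for illustration. Each rule becomes
     one C function, each ( ... )* loop becomes a while loop driven
     by one symbol of lookahead, and operator precedence follows
     from which rule calls which.

```c
#include <ctype.h>

/* Hand-written sketch, NOT antlr output. Hypothetical grammar:
 *   expr : term ( "+" term )* ;
 *   term : atom ( "*" atom )* ;
 *   atom : DIGIT ;
 */
static const char *cursor;   /* next unread input character */

static int atom(void)
{
    /* atom : DIGIT ;  a single decimal digit, for brevity */
    if (isdigit((unsigned char)*cursor))
        return *cursor++ - '0';
    return 0;                /* error handling omitted in this sketch */
}

static int term(void)
{
    int value = atom();
    while (*cursor == '*') { /* ( "*" atom )* : one symbol of lookahead */
        cursor++;
        value *= atom();
    }
    return value;
}

static int expr(void)
{
    int value = term();
    while (*cursor == '+') { /* ( "+" term )* */
        cursor++;
        value += term();
    }
    return value;
}

/* Parse and evaluate a string such as "2+3*4". */
int parse_expr(const char *input)
{
    cursor = input;
    return expr();
}
```

     Because expr calls term, multiplication binds tighter than
     addition without any separate precedence declaration.
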
     Like yacc grammars, antlr grammars can use automatically-
     maintained symbol attribute values referenced as dollar
     variables. Further, because antlr generates top-down parsers,
     arbitrary values may be inherited from parent rules (passed
     like function parameters). Antlr also has a mechanism for
     creating and manipulating abstract-syntax-trees.

     There are various other niceties in antlr, including the
     ability to spread one grammar over multiple files or even
     multiple grammars in a single file, the ability to generate a
     version of the grammar with actions stripped out (for
     documentation purposes), and lots more.

OPTIONS
     -ck n
          Use up to n symbols of lookahead when using compressed
          (linear approximation) lookahead. This type of lookahead
          is very cheap to compute and is attempted before full
          LL(k) lookahead, which is of exponential complexity in
          the worst case. In general, the compressed lookahead can
          be much deeper (e.g., -ck 10) than the full lookahead
          (which usually must be less than 4).

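          The difference between the two kinds of lookahead can be
          sketched as follows, assuming that "linear approximation"
          means keeping one token set per lookahead depth instead
          of a set of k-token sequences. The token numbers and
          function names are hypothetical and are not taken from
          antlr's output.

```c
/* Hypothetical token numbers, for illustration only. */
enum { TOK_A = 1, TOK_B, TOK_C, TOK_D };

/* Full LL(2): the alternative is predicted only by the exact
 * two-token sequences (A,B) and (C,D). */
int predicts_full(int la1, int la2)
{
    return (la1 == TOK_A && la2 == TOK_B) ||
           (la1 == TOK_C && la2 == TOK_D);
}

/* Linear approximation: one token set per lookahead depth,
 * {A,C} at depth 1 and {B,D} at depth 2. */
int predicts_compressed(int la1, int la2)
{
    int depth1 = (la1 == TOK_A || la1 == TOK_C);
    int depth2 = (la2 == TOK_B || la2 == TOK_D);
    return depth1 && depth2;
}
```

          The compressed test is cheaper to build and to evaluate,
          but it also accepts (A,D) and (C,B), which is why it is
          only an approximation of the full test.
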
     -CC  Generate C++ output from both ANTLR and DLG.

     -cr  Generate a cross-reference for all rules. For each rule,
          print a list of all other rules that reference it.

     -e1  Ambiguities/errors shown in low detail (default).

     -e2  Ambiguities/errors shown in more detail.

     -e3  Ambiguities/errors shown in excruciating detail.

     -fe file
          Rename err.c to file.

     -fh file
          Rename stdpccts.h header (turns on -gh) to file.

     -fl file
          Rename lexical output, parser.dlg, to file.

     -fm file
          Rename file with lexical mode definitions, mode.h, to
          file.

     -fr file
          Rename file which remaps globally visible symbols,
          remap.h, to file.

     -ft file
          Rename tokens.h to file.

     -ga  Generate ANSI-compatible code (default case). This has
          not been rigorously tested to be ANSI X3J11 C compliant,
          but it is close. The normal output of antlr is currently
          compilable under K&R C, ANSI C, and C++; this option does
          nothing because antlr generates a bunch of #ifdef's to do
          the right thing depending on the language.

     -gc  Indicates that antlr should generate no C code, i.e.,
          only perform analysis on the grammar.

     -gd  C code is inserted in each of the antlr generated parsing
          functions to provide for user-defined handling of a
          detailed parse trace. The inserted code consists of calls
          to the user-supplied macros or functions called zzTRACEIN
          and zzTRACEOUT. The only argument is a char * pointing to
          a C-style string which is the grammar rule recognized by
          the current parsing function. If no definition is given
          for the trace functions, upon rule entry and exit, a
          message will be printed indicating that a particular rule
          has been entered or exited.

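          A user-supplied pair of trace hooks might look like the
          following sketch. Only the names zzTRACEIN and zzTRACEOUT
          and their single char * argument come from the
          description above; the helper functions, the depth
          counters, and the two stand-in rule functions are
          assumptions made so the example compiles on its own.

```c
#include <stdio.h>

int trace_depth = 0;      /* hypothetical: current rule nesting depth */
int trace_max_depth = 0;  /* hypothetical: deepest nesting seen so far */

/* User-supplied trace macros; each receives the rule name as a char *. */
#define zzTRACEIN(rule)  trace_enter(rule)
#define zzTRACEOUT(rule) trace_exit(rule)

void trace_enter(char *rule)
{
    /* Indent by two spaces per nesting level. */
    fprintf(stderr, "%*senter %s\n", 2 * trace_depth, "", rule);
    trace_depth++;
    if (trace_depth > trace_max_depth)
        trace_max_depth = trace_depth;
}

void trace_exit(char *rule)
{
    trace_depth--;
    fprintf(stderr, "%*sexit  %s\n", 2 * trace_depth, "", rule);
}

/* Stand-ins for two generated rule functions (hypothetical rules). */
void atom(void)
{
    zzTRACEIN("atom");
    zzTRACEOUT("atom");
}

void expr(void)
{
    zzTRACEIN("expr");
    atom();
    zzTRACEOUT("expr");
}
```

          Calling expr() writes an indented enter/exit trace to
          stderr and leaves trace_depth back at zero.
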
     -ge  Generate an error class for each non-terminal.

     -gh  Generate stdpccts.h for non-ANTLR-generated files to
          include. This file contains all defines needed to
          describe the type of parser generated by antlr (e.g., how
          much lookahead is used and whether or not trees are
          constructed) and contains the header action specified by
          the user.

     -gk  Generate parsers that delay lookahead fetches until
          needed. Without this option, antlr generates parsers
          which always have k tokens of lookahead available.

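          The two fetching strategies can be contrasted with a
          small sketch. It is not the code antlr generates; the
          token stream, the buffer, and every name below are
          hypothetical, and token consumption is left out to keep
          the example short.

```c
#define K 2                       /* lookahead depth, as set by -k */

/* Hypothetical token stream standing in for the lexer. */
static const int stream[] = { 10, 11, 12, 13, 0 };
static int stream_pos = 0;
int tokens_fetched = 0;           /* tokens requested from the lexer so far */

static int next_token(void)
{
    tokens_fetched++;
    return stream[stream_pos++];
}

static int buffer[K];
static int buffered = 0;          /* number of valid entries in buffer */

/* Eager style: fill all K lookahead slots up front. */
void prime_eager(void)
{
    while (buffered < K)
        buffer[buffered++] = next_token();
}

/* Demand style: fetch only as far as the i-th lookahead token
 * (1-based, with i <= K). */
int la_on_demand(int i)
{
    while (buffered < i)
        buffer[buffered++] = next_token();
    return buffer[i - 1];
}
```

          With the demand style, a decision that only inspects the
          first lookahead token never asks the lexer for the
          second one.
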
     -gl  Generate line information about grammar actions in the C
          parser, in the form # line "file", so that error messages
          from the C/C++ compiler point into the grammar file
          rather than the resulting C file. Debugging is easier as
          well, because you step through the grammar rather than
          the C file.

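          The effect of such a directive can be seen with the
          standard C #line directive on its own. The file name
          expr.g and the line number 42 below are made up for
          illustration; they stand for the place in a grammar file
          that an action was copied from.

```c
/* Hypothetical: pretend the next line came from line 42 of expr.g. */
#line 42 "expr.g"
int action_line(void) { return __LINE__; }          /* reported as line 42 */
const char *action_file(void) { return __FILE__; }  /* reported as expr.g */
```

          Compiler diagnostics and debugger line tables for the
          code after the directive refer to expr.g instead of the
          C source file.
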
     -gs  Do not generate sets for token expression lists; instead
          generate a ||-separated sequence of LA(1)==token_number.
          The default is to generate sets.

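          The two styles of membership test can be compared with a
          sketch like the one below. The token numbers, the bit-set
          layout, and the function names are assumptions for
          illustration and do not reproduce antlr's generated
          code.

```c
/* Hypothetical token numbers, for illustration only. */
enum { TOK_PLUS = 3, TOK_MINUS = 4, TOK_STAR = 9 };

/* Set form: one bit per token number, tested with a shift and mask.
 * Valid for token numbers 0..15 in this sketch. */
static const unsigned char op_set[2] = {
    (1u << TOK_PLUS) | (1u << TOK_MINUS),   /* tokens 0..7  */
    (1u << (TOK_STAR - 8))                  /* tokens 8..15 */
};

int in_set(int tok)
{
    return (op_set[tok >> 3] >> (tok & 7)) & 1;
}

/* -gs form: a ||-separated sequence of comparisons. */
int in_sequence(int tok)
{
    return tok == TOK_PLUS || tok == TOK_MINUS || tok == TOK_STAR;
}
```

          Both functions accept the same tokens; the set form
          costs the same regardless of how many tokens are listed,
          while the comparison form is easier to read in the
          generated source.
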
     -gt  Generate code for Abstract-Syntax Trees.

     -gx  Do not create the lexical analyzer files (dlg-related).
          This option should be given when the user wishes to
          provide a customized lexical analyzer. It may also be
          used in make scripts to cause only the parser to be
          rebuilt when a change not affecting the lexical structure
          is made to the input grammars.

     -k n Set k of LL(k) to n; i.e., set tokens of look-ahead
          (default==1).

     -o dir
          Directory where output files should go (default=".").
          This is very nice for keeping the source directory clear
          of ANTLR and DLG spawn.

     -p   The complete grammar, collected from all input grammar
          files and stripped of all comments and embedded actions,
          is listed to stdout. This is intended to aid in viewing
          the entire grammar as a whole and to eliminate the need
          to keep actions concisely stated so that the grammar is
          easier to read. Hence, it is preferable to embed even
          complex actions directly in the grammar, rather than to
          call them as subroutines, since the subroutine call
          overhead will be saved.

     -pa  This option is the same as -p except that the output is
          annotated with the first sets determined from grammar
          analysis.

     -prc on
          Turn on the computation and hoisting of predicate
          context.

     -prc off
          Turn off the computation and hoisting of predicate
          context. This option makes 1.10 behave like the 1.06
          release with option -pr on. Context computation is off by
          default.

     -rl n
          Limit the maximum number of tree nodes used by grammar
          analysis to n. Occasionally, antlr is unable to analyze a
          grammar submitted by the user. This rare situation can
          only occur when the grammar is large and the amount of
          lookahead is greater than one. A non-linear analysis
          algorithm is used by PCCTS to handle the general case of
          LL(k) parsing. The average complexity of analysis,
          however, is near linear due to some fancy footwork in the
          implementation which reduces the number of calls to the
          full LL(k) algorithm. If this limit is reached, an error
          message is displayed indicating the grammar construct
          being analyzed when antlr hit a non-linearity. Use this
          option if antlr seems to go out to lunch and your disk
          starts thrashing; try n=10000 to start. Once the
          offending construct has been identified, try to remove
          the ambiguity that antlr was trying to overcome with
          large lookahead analysis. The introduction of (...)?
          backtracking blocks eliminates some of these problems:
          antlr does not analyze alternatives that begin with
          (...)? (it simply backtracks, if necessary, at run time).

     -w1  Set low warning level. Do not warn if semantic predicates
          and/or (...)? blocks are assumed to cover ambiguous
          alternatives.

     -w2  Ambiguous parsing decisions yield warnings even if
          semantic predicates or (...)? blocks are used. Warn if
          predicate context computed and semantic predicates
          incompletely disambiguate alternative productions.

     -    Read grammar from standard input and generate stdin.c as
          the parser file.

SPECIAL CONSIDERATIONS
     Antlr works... we think. There is no implicit guarantee of
     anything. We reserve no legal rights to the software known as
     the Purdue Compiler Construction Tool Set (PCCTS); PCCTS is in
     the public domain. An individual or company may do whatever
     they wish with source code distributed with PCCTS or the code
     generated by PCCTS, including the incorporation of PCCTS, or
     its output, into commercial software. We encourage users to
     develop software with PCCTS. However, we do ask that credit be
     given to us for developing PCCTS. By "credit", we mean that if
     you incorporate our source code into one of your programs
     (commercial product, research project, or otherwise), you
     acknowledge this fact somewhere in the documentation, research
     report, etc. If you like PCCTS and have developed a nice tool
     with the output, please mention that you developed it using
     PCCTS. As long as these guidelines are followed, we expect to
     continue enhancing this system and expect to make other tools
     available as they are completed.

FILES
     *.c  output C parser.

     *.cpp
          output C++ parser when C++ mode is used.

     parser.dlg
          output dlg lexical analyzer.

     err.c
          token string array, error sets and error support
          routines. Not used in C++ mode.

     remap.h
          file that redefines all globally visible parser symbols.
          The use of the #parser directive creates this file. Not
          used in C++ mode.

     stdpccts.h
          list of definitions needed by C files, not generated by
          PCCTS, that reference PCCTS objects. This is not
          generated by default. Not used in C++ mode.

     tokens.h
          output #defines for tokens used and function prototypes
          for functions generated for rules.

SEE ALSO
     dlg(1), pccts(1)