.TH ANTLR 1 "September 1995" "ANTLR" "PCCTS Manual Pages"
.SH NAME
antlr \- ANother Tool for Language Recognition
.SH SYNTAX
.LP
\fBantlr\fR [\fIoptions\fR] \fIgrammar_files\fR
.SH DESCRIPTION
.PP
\fIAntlr\fP converts an extended form of context-free grammar into a
set of C functions which directly implement an efficient form of
deterministic recursive-descent LL(k) parser. Context-free grammars
may be augmented with predicates to allow semantics to influence
parsing; this allows a form of context-sensitive parsing. Selective
backtracking is also available to handle non-LL(k) and even
non-LALR(k) constructs. \fIAntlr\fP also produces a definition of a
lexer which can be automatically converted into C code for a DFA-based
lexer by \fIdlg\fR. Hence, \fIantlr\fR serves a function much like
that of \fIyacc\fR; however, it is notably more flexible and is more
integrated with a lexer generator (\fIantlr\fR directly generates
\fIdlg\fR code, whereas \fIyacc\fR and \fIlex\fR are given independent
descriptions). Unlike \fIyacc\fR, which accepts LALR(1) grammars,
\fIantlr\fR accepts LL(k) grammars in an extended BNF notation \(em
which eliminates the need for precedence rules.
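.PP
As an illustration of the kind of code produced, the following
hand-written C sketch shows the general shape of a recursive-descent
routine for a rule such as \f(CWexpr : NUM ( "+" NUM )* ;\fP.  It is
not actual \fIantlr\fP output; the input string, helper functions, and
names are invented for the example.
.PP
.nf
.ft CW
#include <stdio.h>
#include <ctype.h>

/* Hand-written sketch (not antlr output) of a recursive-descent
 * routine for:  expr : NUM ( "+" NUM )* ;                        */

static const char *input = "1+2+3";   /* hypothetical token stream */
static int pos;

static int la(void) { return input[pos]; }  /* 1 symbol of lookahead */

static int num(void)                        /* NUM : [0-9]+          */
{
    int v = 0;
    while (isdigit(la())) { v = v * 10 + (input[pos++] - '0'); }
    return v;
}

static int expr(void)                       /* expr : NUM ("+" NUM)* */
{
    int v = num();
    while (la() == '+') {                   /* the LL(1) decision    */
        pos++;                              /* match "+"             */
        v += num();
    }
    return v;
}

int main(void)
{
    printf("%s = %d\en", input, expr());
    return 0;
}
.ft R
.fi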
.PP
Like \fIyacc\fR grammars, \fIantlr\fR grammars can use
automatically-maintained symbol attribute values referenced as dollar
variables. Further, because \fIantlr\fR generates top-down parsers,
arbitrary values may be inherited from parent rules (passed like
function parameters). \fIAntlr\fP also has a mechanism for creating
and manipulating abstract syntax trees.
.PP
There are various other niceties in \fIantlr\fR, including the ability to
spread one grammar over multiple files or even to place multiple grammars
in a single file, the ability to generate a version of the grammar with
actions stripped out (for documentation purposes), and lots more.
.SH OPTIONS
.IP "\fB-ck \fIn\fR"
Use up to \fIn\fR symbols of lookahead when using compressed (linear
approximation) lookahead. This type of lookahead is very cheap to
compute and is attempted before full LL(k) lookahead, which is of
exponential complexity in the worst case. In general, the compressed
lookahead can be much deeper (e.g., \f(CW-ck 10\fP) than the full
lookahead (which usually must be less than 4).
.IP \fB-CC\fP
Generate C++ output from both ANTLR and DLG.
.IP \fB-cr\fP
Generate a cross-reference for all rules. For each rule, print a list
of all other rules that reference it.
.IP \fB-e1\fP
Ambiguities/errors shown in low detail (default).
.IP \fB-e2\fP
Ambiguities/errors shown in more detail.
.IP \fB-e3\fP
Ambiguities/errors shown in excruciating detail.
.IP "\fB-fe\fP file"
Rename \fBerr.c\fP to file.
.IP "\fB-fh\fP file"
Rename \fBstdpccts.h\fP header (turns on \fB-gh\fP) to file.
.IP "\fB-fl\fP file"
Rename lexical output, \fBparser.dlg\fP, to file.
.IP "\fB-fm\fP file"
Rename file with lexical mode definitions, \fBmode.h\fP, to file.
.IP "\fB-fr\fP file"
Rename file which remaps globally visible symbols, \fBremap.h\fP, to file.
.IP "\fB-ft\fP file"
Rename \fBtokens.h\fP to file.
.IP \fB-ga\fP
Generate ANSI-compatible code (default case). This has not been
rigorously tested to be ANSI X3J11 C compliant, but it is close. The
normal output of \fIantlr\fP is currently compilable under K&R C,
ANSI C, and C++\(emthis option does nothing because \fIantlr\fP
generates a bunch of #ifdef's to do the right thing depending on the
language.
.IP \fB-gc\fP
Indicates that \fIantlr\fP should generate no C code, i.e., only
perform analysis on the grammar.
.IP \fB-gd\fP
C code is inserted in each of the \fIantlr\fR generated parsing functions to
provide for user-defined handling of a detailed parse trace. The inserted
code consists of calls to the user-supplied macros or functions called
\fBzzTRACEIN\fR and \fBzzTRACEOUT\fP. The only argument is a
\fIchar *\fR pointing to a C-style string which is the grammar rule
recognized by the current parsing function. If no definition is given
for the trace functions, upon rule entry and exit, a message will be
printed indicating that a particular rule has been entered or exited.
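.sp
For example, the trace hooks might be supplied as simple macros that
print the rule name; the sketch below is one possible definition (the
rule function \f(CWexpr\fP and the tiny driver are invented purely to
show where the generated calls appear).
.sp
.nf
.ft CW
#include <stdio.h>

/* Possible user-supplied trace hooks: the single argument is a
 * char * naming the rule being entered or exited.               */
#define zzTRACEIN(r)  fprintf(stderr, "enter rule %s\en", (r))
#define zzTRACEOUT(r) fprintf(stderr, "exit rule  %s\en", (r))

/* Shape of the calls -gd inserts into each generated rule function
 * (hand-written illustration, not actual antlr output).          */
static void expr(void)
{
    zzTRACEIN("expr");
    /* ... body of the rule ... */
    zzTRACEOUT("expr");
}

int main(void) { expr(); return 0; }
.ft R
.fi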
.IP \fB-ge\fP
Generate an error class for each non-terminal.
.IP \fB-gh\fP
Generate \fBstdpccts.h\fP for non-ANTLR-generated files to include.
This file contains all defines needed to describe the type of parser
generated by \fIantlr\fP (e.g., how much lookahead is used and whether
or not trees are constructed) and contains the \fBheader\fP action
specified by the user.
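.sp
A hand-written translation unit that needs to see the parser
definitions would simply include the generated header; the sketch
below assumes C mode with a start rule named \f(CWstart\fP (both the
rule name and the exact invocation are illustrative and depend on the
grammar and on how the parser is normally started in your setup).
.sp
.nf
.ft CW
/* Illustrative only: a non-ANTLR-generated C file that uses the
 * parser.  "start" is a hypothetical start rule; ANTLR() is the
 * usual PCCTS C-mode invocation macro.                           */
#include "stdpccts.h"

int main(void)
{
    ANTLR(start(), stdin);   /* run rule "start" on standard input */
    return 0;
}
.ft R
.fi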
.IP \fB-gk\fP
Generate parsers that delay lookahead fetches until needed. Without
this option, \fIantlr\fP generates parsers which always have \fIk\fP
tokens of lookahead available.
.IP \fB-gl\fP
Generate line info about grammar actions in the C parser of the form
\fB#\ \fIline\fP\ "\fIfile\fP"\fR, which makes error messages from
the C/C++ compiler make more sense, as they will \*Qpoint\*U into the
grammar file, not the resulting C file. Debugging is easier as well,
because you will step through the grammar, not the C file.
.IP \fB-gs\fR
Do not generate sets for token expression lists; instead generate a
\fB||\fP-separated sequence of \fBLA(1)==\fItoken_number\fR comparisons.
The default is to generate sets.
.IP \fB-gt\fP
Generate code for Abstract-Syntax Trees.
.IP \fB-gx\fP
Do not create the lexical analyzer files (dlg-related). This option
should be given when the user wishes to provide a customized lexical
analyzer. It may also be used in \fImake\fR scripts to cause only the
parser to be rebuilt when a change not affecting the lexical structure
is made to the input grammars.
.IP "\fB-k \fIn\fR"
Set k of LL(k) to \fIn\fR; i.e., set the number of tokens of look-ahead
(default==1).
.IP "\fB-o\fP dir"
Directory where output files should go (default="."). This is very
nice for keeping the source directory clear of ANTLR and DLG spawn.
.IP \fB-p\fP
The complete grammar, collected from all input grammar files and
stripped of all comments and embedded actions, is listed to
\fBstdout\fP. This is intended to aid in viewing the entire grammar
as a whole and to eliminate the need to keep actions concisely stated
so that the grammar is easier to read. Hence, it is preferable to
embed even complex actions directly in the grammar, rather than to
call them as subroutines, since the subroutine call overhead will be
saved.
.IP \fB-pa\fP
This option is the same as \fB-p\fP except that the output is
annotated with the first sets determined from grammar analysis.
.IP "\fB-prc on\fR"
Turn on the computation and hoisting of predicate context.
.IP "\fB-prc off\fR"
Turn off the computation and hoisting of predicate context. This
option makes 1.10 behave like the 1.06 release with option \fB-pr\fR
on. Context computation is off by default.
.IP "\fB-rl \fIn\fR"
Limit the maximum number of tree nodes used by grammar analysis to
\fIn\fP. Occasionally, \fIantlr\fP is unable to analyze a grammar
submitted by the user. This rare situation can only occur when the
grammar is large and the amount of lookahead is greater than one. A
nonlinear analysis algorithm is used by PCCTS to handle the general
case of LL(k) parsing. The average complexity of analysis, however, is
near linear due to some fancy footwork in the implementation which
reduces the number of calls to the full LL(k) algorithm. If this limit
is reached, an error message will be displayed indicating the grammar
construct being analyzed when \fIantlr\fP hit the
non-linearity. Use this option if \fIantlr\fP seems to go out to
lunch and your disk starts thrashing; try \fIn\fP=10000 to start. Once
the offending construct has been identified, try to remove the
ambiguity that \fIantlr\fP was trying to overcome with large lookahead
analysis. The introduction of (...)? backtracking blocks eliminates
some of these problems\ \(em \fIantlr\fP does not analyze alternatives
that begin with (...)? (it simply backtracks, if necessary, at run
time).
.IP \fB-w1\fR
Set low warning level. Do not warn if semantic predicates and/or
(...)? blocks are assumed to cover ambiguous alternatives.
.IP \fB-w2\fR
Ambiguous parsing decisions yield warnings even if semantic predicates
or (...)? blocks are used. Warn if the predicate context is computed
and the semantic predicates still incompletely disambiguate the
alternative productions.
.IP \fB-\fR
Read grammar from standard input and generate \fBstdin.c\fP as the
parser file.
.SH "SPECIAL CONSIDERATIONS"
.PP
\fIAntlr\fP works... we think. There is no implicit guarantee of
anything. We reserve no \fBlegal\fP rights to the software known as
the Purdue Compiler Construction Tool Set (PCCTS) \(em PCCTS is in the
public domain. An individual or company may do whatever they wish
with source code distributed with PCCTS or the code generated by
PCCTS, including the incorporation of PCCTS, or its output, into
commercial software. We encourage users to develop software with
PCCTS. However, we do ask that credit be given to us for developing
PCCTS. By "credit", we mean that if you incorporate our source code
into one of your programs (commercial product, research project, or
otherwise), you acknowledge this fact somewhere in the
documentation, research report, etc... If you like PCCTS and have
developed a nice tool with the output, please mention that you
developed it using PCCTS. As long as these guidelines are followed,
we expect to continue enhancing this system and expect to make other
tools available as they are completed.
.SH FILES
.IP *.c
output C parser.
.IP *.cpp
output C++ parser when C++ mode is used.
.IP \fBparser.dlg\fP
output \fIdlg\fR lexical analyzer.
.IP \fBerr.c\fP
token string array, error sets and error support routines. Not used in
C++ mode.
.IP \fBremap.h\fP
file that redefines all globally visible parser symbols. The use of
the #parser directive creates this file. Not used in
C++ mode.
.IP \fBstdpccts.h\fP
list of definitions needed by C files, not generated by PCCTS, that
reference PCCTS objects. This is not generated by default. Not used in
C++ mode.
.IP \fBtokens.h\fP
output \fI#defines\fR for tokens used and function prototypes for
functions generated for rules.
.SH "SEE ALSO"
.LP
dlg(1), pccts(1)