]>
Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | [/============================================================================== |
2 | Copyright (C) 2001-2011 Joel de Guzman | |
3 | Copyright (C) 2001-2011 Hartmut Kaiser | |
4 | ||
5 | Distributed under the Boost Software License, Version 1.0. (See accompanying | |
6 | file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) | |
7 | ===============================================================================/] | |
8 | [section Syntax Diagram] | |
9 | ||
10 | In the next section, we will deal with Parsing Expression Grammars | |
11 | (PEG) [footnote Bryan Ford: Parsing Expression Grammars: A Recognition-Based | |
12 | Syntactic Foundation, [@http://pdos.csail.mit.edu/~baford/packrat/popl04/]], | |
13 | a variant of Extended Backus-Naur Form (EBNF) [footnote Richard E. Pattis: EBNF: | |
14 | A Notation to Describe Syntax, [@http://www.cs.cmu.edu/~pattis/misc/ebnf.pdf]] | |
15 | with a different interpretation. It is easier to understand PEG using Syntax | |
16 | Diagrams. Syntax diagrams represent a grammar graphically. It was used extensively | |
17 | by Niklaus Wirth [footnote Niklaus Wirth: The Programming Language | |
18 | Pascal. (July 1973)] in the "Pascal User Manual". Syntax Diagrams are | |
19 | easily understandable by programmers due to their similarity to flow | |
20 | charts. The isomorphism of the diagrams and functions make them ideal for | |
21 | representing __rd__ parsers which are essentially mutually recursive | |
22 | functions. | |
23 | ||
24 | Historically, Parsing Expression Grammars have been used for describing grammars | |
25 | for parsers only (hence the name). In __spirit__ we use a very similar notation | |
26 | for output generation as well. Almost all the concepts described here are | |
27 | equally applicable both to __qi__ parsers and to __karma__ generators. | |
28 | ||
29 | [heading Elements] | |
30 | ||
31 | All diagrams have one entry and one exit point. Arrows connect all possible | |
32 | paths through the grammar from the entry point to the exit point. | |
33 | ||
34 | [:__sd_start_stop__] | |
35 | ||
36 | Terminals are represented by round boxes. Terminals are atomic and | |
37 | usually represent plain characters, strings or tokens. | |
38 | ||
39 | [:__sd_terminals__] | |
40 | ||
41 | Non-terminals are represented by boxes. Diagrams are modularized using | |
42 | named non-terminals. A complex diagram can be broken down into a set of | |
43 | non-terminals. Non-terminals also allow recursion (i.e. a non-terminal | |
44 | can call itself). | |
45 | ||
46 | [:__sd_non_terminals__] | |
47 | ||
48 | [heading Constructs] | |
49 | ||
50 | The most basic composition is the Sequence. B follows A: | |
51 | ||
52 | [:__sd_sequence__] | |
53 | ||
54 | The ordered choice henceforth we will call /alternatives/. In PEG, | |
55 | ordered choice and alternatives are not quite the same. PEG allows | |
56 | ambiguity of choice where one or more branches can succeed. In PEG, in | |
57 | case of ambiguity, the first one always wins. | |
58 | ||
59 | [:__sd_choice__] | |
60 | ||
61 | The optional (zero-or-one): | |
62 | ||
63 | [:__sd_optional__] | |
64 | ||
65 | Now, the loops. We have the zero-or-more and one-or-more: | |
66 | ||
67 | [:__sd_kleene__] | |
68 | [:__sd_plus__] | |
69 | ||
70 | Take note that, as in PEG, these loops behave greedily. If there is | |
71 | another 'A' just before the end-point, it will always fail because the | |
72 | preceding loop has already exhausted all 'A's and there is nothing more | |
73 | left. This is a crucial difference between PEG and general Context Free | |
74 | Grammars (CFGs). This behavior is quite obvious with syntax diagrams as | |
75 | they resemble flow-charts. | |
76 | ||
77 | [heading Predicates] | |
78 | ||
79 | Now, the following are Syntax Diagram versions of PEG predicates. These | |
80 | are not traditionally found in Syntax Diagrams. These are special | |
81 | extensions we invented to closely follow PEGs. | |
82 | ||
83 | First, we introduce a new element, the Predicate: | |
84 | ||
85 | [:__sd_predicate__] | |
86 | ||
87 | This is similar to the conditionals in flow charts where the 'No' branch | |
88 | is absent and always signals a failed parse. | |
89 | ||
90 | We have two versions of the predicate, the /And-Predicate/ and the | |
91 | /Not-Predicate/: | |
92 | ||
93 | [:__sd_and_predicate__] | |
94 | [:__sd_not_predicate__] | |
95 | ||
96 | The /And-Predicate/ tries the predicate, P, and succeeds if P succeeds, | |
97 | or otherwise fail. The opposite is true with the /Not-Predicate/. It | |
98 | tries the predicate, P, and fails if P succeeds, or otherwise succeeds. | |
99 | Both versions do a look-ahead but do not consume any input regardless if | |
100 | P succeeds or not. | |
101 | ||
102 | [endsect] | |
103 | ||
104 |