]> git.proxmox.com Git - ceph.git/blob - ceph/src/boost/libs/spirit/doc/abstracts/peg.qbk
add subtree-ish sources for 12.0.3
[ceph.git] / ceph / src / boost / libs / spirit / doc / abstracts / peg.qbk
1 [/==============================================================================
2 Copyright (C) 2001-2011 Joel de Guzman
3 Copyright (C) 2001-2011 Hartmut Kaiser
4
5 Distributed under the Boost Software License, Version 1.0. (See accompanying
6 file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
7 ===============================================================================/]
8 [section Parsing Expression Grammar]
9
10 Parsing Expression Grammars (PEG) [footnote Bryan Ford: Parsing Expression
11 Grammars: A Recognition-Based Syntactic Foundation,
12 [@http://pdos.csail.mit.edu/~baford/packrat/popl04/]] are a derivative of
13 Extended Backus-Naur Form (EBNF) [footnote Richard E. Pattis: EBNF: A Notation
14 to Describe Syntax, [@http://www.cs.cmu.edu/~pattis/misc/ebnf.pdf]]
15 with a different interpretation, designed to represent a recursive descent
16 parser. A PEG can be directly represented as a recursive-descent parser.
17
18 Like EBNF, PEG is a formal grammar for describing a formal language in
19 terms of a set of rules used to recognize strings of this language.
20 Unlike EBNF, PEGs have an exact interpretation. There is only one valid
21 parse tree (see __ast__) for each PEG grammar.
22
23 [heading Sequences]
24
25 Sequences are represented by juxtaposition like in EBNF:
26
27 a b
28
29 The PEG expression above states that, in order for this to succeed,
30 `b` must follow `a`. Here's the syntax diagram:
31
32 [:__sd_sequence__]
33
34 Here's a trivial example:
35
36 'x' digit
37
38 which means the character `x` must be followed by a digit.
39
40 [note In __qi__, we use the `>>` for sequences since C++ does not
41 allow juxtaposition, while in __karma__ we use the `<<` instead.]
42
43 [heading Alternatives]
44
45 Alternatives are represented in PEG using the slash:
46
47 a / b
48
49 [note In __qi__ and __karma__, we use the `|` for alternatives just as in EBNF.]
50
51 Alternatives allow for choices. The expression above reads: try to match
52 `a`. If `a` succeeds, success, if not try to match `b`. This is a bit of
53 a deviation from the usual EBNF interpretation where you simply match
54 `a` *or* `b`. Here's the syntax diagram:
55
56 [:__sd_choice__]
57
58 PEGs allow for ambiguity in the alternatives. In the expression above,
59 both `a` or `b` can both match an input string. However, only the first
60 matching alternative is valid. As noted, there can only be one valid
61 parse tree. [/FIXME: $$$ explain more about this $$$]
62
63 [heading Loops]
64
65 Again, like EBNF, PEG uses the regular-expression Kleene star and the
66 plus loops:
67
68 a*
69 a+
70
71 [note __qi__ and __karma__ use the prefix star and plus since there is no
72 postfix star or plus in C++.]
73
74 Here are the syntax diagrams:
75
76 [:__sd_kleene__]
77 [:__sd_plus__]
78
79 The first, called the Kleene star, matches zero or more of its subject
80 `a`. The second, plus, matches one ore more of its subject `a`.
81
82 Unlike EBNF, PEGs have greedy loops. It will match as much as it can
83 until its subject fails to match without regard to what follows. The
84 following is a classic example of a fairly common EBNF/regex expression
85 failing to match in PEG:
86
87 alnum* digit
88
89 In PEG, alnum will eat as much alpha-numeric characters as it can
90 leaving nothing more left behind. Thus, the trailing digit will get
91 nothing. Loops are simply implemented in recursive descent code as
92 for/while loops making them extremely efficient. That is a definite
93 advantage. On the other hand, those who are familiar with EBNF and regex
94 behavior might find the behavior a major gotcha. PEG provides a couple
95 of other mechanisms to circumvent this. We will see more of these other
96 mechanisms shortly.
97
98 [heading Difference]
99
100 In some cases, you may want to restrict a certain expression. You can
101 think of a PEG expression as a match for a potentially infinite set of
102 strings. The difference operator allows you to restrict this set:
103
104 a - b
105
106 The expression reads: match `a` but not `b`.
107
108 [note There is no difference operator in __karma__, as the concept does not
109 make sense in the context of output generation.]
110
111 [endsect]
112
113