]> git.proxmox.com Git - ceph.git/blob - ceph/src/boost/libs/spirit/doc/rationale.qbk
bump version to 12.2.2-pve1
[ceph.git] / ceph / src / boost / libs / spirit / doc / rationale.qbk
1 [/==============================================================================
2 Copyright (C) 2001-2011 Joel de Guzman
3 Copyright (C) 2001-2011 Hartmut Kaiser
4
5 Distributed under the Boost Software License, Version 1.0. (See accompanying
6 file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
7 ===============================================================================/]
8
9 [section Rationale]
10
11 [heading Naming]
12
13 Why use the name "Spirit", "Qi" and "Karma"? Because xpressive names
14 have a better spirit, brings qi to your software and will enhance your
15 karma so they can heal your (con)fusion and make you wave like a phoenix
16 from the ashes. (Joachim Faulhaber)
17
18 [heading Type Erasure: From static to dynamic C++]
19
20 Rules straddle the border between static and dynamic C++. In effect, a
21 rule transforms compile-time polymorphism (using templates) into
22 run-time polymorphism (using virtual functions). This is necessary due
23 to C++'s inability to automatically declare a variable of a type deduced
24 from an arbitrarily complex expression in the right-hand side (rhs) of
25 an assignment. Basically, we want to do something like:
26
27 T rule = an_arbitrarily_complex_expression;
28
29 without having to know or care about the resulting type of the
30 right-hand side (rhs) of the assignment expression. This can be done
31 with modern C++ 0x compilers using `auto`:
32
33 auto rule = an_arbitrarily_complex_expression;
34
35 Apart from this, we also need a facility to forward declare an unknown
36 type:
37
38 T rule;
39 ...
40 rule = a | b;
41
42 These limitations lead us to this implementation of rules. This comes at
43 the expense of the overhead of a type-erased call which is an indirect
44 function call that connot be inlined, once through each invocation of a
45 rule.
46
47 [heading Multiple declaration]
48
49 Some BNF variants allow multiple declarations of a rule. The
50 declarations are taken as alternatives. Example:
51
52 r = a;
53 r = b;
54
55 is equivalent to:
56
57 r = a | b;
58
59 Spirit v1.3 allowed this behavior. However, the current version of
60 Spirit no longer allows this because experience shows that this behavior
61 leads to unwanted gotchas (for instance, it does not allow rules to be
62 held in containers). In the current release of Spirit, a second
63 assignment to a rule will simply redefine it. The old definition is
64 destructed. This follows more closely C++ semantics and is more in line
65 with what the user expects the rule to behave.
66
67 [heading Sequencing Syntax]
68
69 The comma operator as in `a, b` seems to be a better candidate,
70 syntax-wise. But then the problem is with its precedence. It has the
71 lowest precedence in C/C++, which makes it virtually useless.
72
73 Bjarne Stroustrup, in his article "Generalizing Overloading for C++2000"
74 talks about overloading whitespace. Such a feature would allow
75 juxtapositioning of parser objects exactly as we do in (E)BNF (e.g. a b
76 | c instead of a >> b | c). Unfortunately, the article was dated April
77 1, 1998. Oh well.
78
79 [heading Forward iterators]
80
81 In general, the expected iterator is at least a standard conforming
82 forward iterator. Forward iterators are needed for backtracking where
83 the iterator needs to be saved and restored later. Generally speaking,
84 Spirit is a backtracking parser. The implication of this is that at some
85 point, the iterator position will have to be saved to allow the parser
86 to backtrack to a previous point. Thus, for backtracking to work, the
87 framework requires at least a forward iterator.
88
89 There are remedies of course. In cases where we need to use input
90 iterators, you can use the __multi_pass__ iterator to make the forward
91 iterators.
92
93 Some parsers might require more specialized iterators (bi-directional or
94 even random access). Perhaps in the future, deterministic parsers when
95 added to the framework, will perform no backtracking and will need just
96 a single token lookahead, hence will require input iterators only.
97
98 [heading Exhaustive backtracking and greedy RD]
99
100 Spirit doesn't do exhaustive backtracking like regular expressions are
101 expected to. For example:
102
103 *char_('a') >> char_('a');
104
105 will always fail to match because Spirit's Kleene star does not back off
106 when the rest of the rule fails to match.
107
108 Actually, there's a solution to this greedy RD problem. Such a scheme is
109 discussed in section 6.6.2 of Parsing Techniques: A Practical Guide. The
110 trick involves passing a tail parser (in addition to the scanner) to
111 each parser. The start parser will then simply be:
112
113 start >> end;
114
115 (end is the start's tail).
116
117 Spirit is greedy --using straight forward, naive RD. It is certainly
118 possible to implement the fully backtracking scheme presented above, but
119 there will be also certainly be a performance hit. The scheme will
120 always try to match all possible parser paths (full parser hierarchy
121 traversal) until it reaches a point of certainty, that the whole thing
122 matches or fails to match.
123
124 [:Backtracking and Greedy RD
125
126 Spirit is quite consistent and intuitive about when it backtracks and to
127 where, although it may not be obvious to those coming from different
128 backgrounds. In general, any (sub)parser will, given the same input,
129 always match the same portion of the input (or fail to match the input
130 at all). This means that Spirit is inherently greedy. Spirit will only
131 backtrack when a (sub)parser fails to match the input, and it will
132 always backtrack to the next choice point upward (not backward) in the
133 parser structure. In other words abb|ab will match `"ab"`, as will
134 `a(bb|b)`, but `(ab|a)b` won't because the `(ab|a)` subparser will
135 always match the `'b'` after the `'a'` if it is available.
136
137 --Rainer Deyke]
138
139 This is the very nature of __peg__.
140
141 There's a strong preference on "simplicity with all the knobs when you
142 need them" approach, right now. On the other hand, the flexibility of
143 Spirit makes it possible to have different optional schemes available.
144 It might be possible to implement an exhaustive backtracking RD scheme
145 as an optional feature in the future.
146
147 [endsect]