]>
Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | [/============================================================================== |
2 | Copyright (C) 2001-2011 Joel de Guzman | |
3 | Copyright (C) 2001-2011 Hartmut Kaiser | |
4 | ||
5 | Distributed under the Boost Software License, Version 1.0. (See accompanying | |
6 | file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) | |
7 | ===============================================================================/] | |
8 | ||
9 | [section Rationale] | |
10 | ||
11 | [heading Naming] | |
12 | ||
13 | Why use the name "Spirit", "Qi" and "Karma"? Because xpressive names | |
14 | have a better spirit, brings qi to your software and will enhance your | |
15 | karma so they can heal your (con)fusion and make you wave like a phoenix | |
16 | from the ashes. (Joachim Faulhaber) | |
17 | ||
18 | [heading Type Erasure: From static to dynamic C++] | |
19 | ||
20 | Rules straddle the border between static and dynamic C++. In effect, a | |
21 | rule transforms compile-time polymorphism (using templates) into | |
22 | run-time polymorphism (using virtual functions). This is necessary due | |
23 | to C++'s inability to automatically declare a variable of a type deduced | |
24 | from an arbitrarily complex expression in the right-hand side (rhs) of | |
25 | an assignment. Basically, we want to do something like: | |
26 | ||
27 | T rule = an_arbitrarily_complex_expression; | |
28 | ||
29 | without having to know or care about the resulting type of the | |
30 | right-hand side (rhs) of the assignment expression. This can be done | |
31 | with modern C++ 0x compilers using `auto`: | |
32 | ||
33 | auto rule = an_arbitrarily_complex_expression; | |
34 | ||
35 | Apart from this, we also need a facility to forward declare an unknown | |
36 | type: | |
37 | ||
38 | T rule; | |
39 | ... | |
40 | rule = a | b; | |
41 | ||
42 | These limitations lead us to this implementation of rules. This comes at | |
43 | the expense of the overhead of a type-erased call which is an indirect | |
44 | function call that connot be inlined, once through each invocation of a | |
45 | rule. | |
46 | ||
47 | [heading Multiple declaration] | |
48 | ||
49 | Some BNF variants allow multiple declarations of a rule. The | |
50 | declarations are taken as alternatives. Example: | |
51 | ||
52 | r = a; | |
53 | r = b; | |
54 | ||
55 | is equivalent to: | |
56 | ||
57 | r = a | b; | |
58 | ||
59 | Spirit v1.3 allowed this behavior. However, the current version of | |
60 | Spirit no longer allows this because experience shows that this behavior | |
61 | leads to unwanted gotchas (for instance, it does not allow rules to be | |
62 | held in containers). In the current release of Spirit, a second | |
63 | assignment to a rule will simply redefine it. The old definition is | |
64 | destructed. This follows more closely C++ semantics and is more in line | |
65 | with what the user expects the rule to behave. | |
66 | ||
67 | [heading Sequencing Syntax] | |
68 | ||
69 | The comma operator as in `a, b` seems to be a better candidate, | |
70 | syntax-wise. But then the problem is with its precedence. It has the | |
71 | lowest precedence in C/C++, which makes it virtually useless. | |
72 | ||
73 | Bjarne Stroustrup, in his article "Generalizing Overloading for C++2000" | |
74 | talks about overloading whitespace. Such a feature would allow | |
75 | juxtapositioning of parser objects exactly as we do in (E)BNF (e.g. a b | |
76 | | c instead of a >> b | c). Unfortunately, the article was dated April | |
77 | 1, 1998. Oh well. | |
78 | ||
79 | [heading Forward iterators] | |
80 | ||
81 | In general, the expected iterator is at least a standard conforming | |
82 | forward iterator. Forward iterators are needed for backtracking where | |
83 | the iterator needs to be saved and restored later. Generally speaking, | |
84 | Spirit is a backtracking parser. The implication of this is that at some | |
85 | point, the iterator position will have to be saved to allow the parser | |
86 | to backtrack to a previous point. Thus, for backtracking to work, the | |
87 | framework requires at least a forward iterator. | |
88 | ||
89 | There are remedies of course. In cases where we need to use input | |
90 | iterators, you can use the __multi_pass__ iterator to make the forward | |
91 | iterators. | |
92 | ||
93 | Some parsers might require more specialized iterators (bi-directional or | |
94 | even random access). Perhaps in the future, deterministic parsers when | |
95 | added to the framework, will perform no backtracking and will need just | |
96 | a single token lookahead, hence will require input iterators only. | |
97 | ||
98 | [heading Exhaustive backtracking and greedy RD] | |
99 | ||
100 | Spirit doesn't do exhaustive backtracking like regular expressions are | |
101 | expected to. For example: | |
102 | ||
103 | *char_('a') >> char_('a'); | |
104 | ||
105 | will always fail to match because Spirit's Kleene star does not back off | |
106 | when the rest of the rule fails to match. | |
107 | ||
108 | Actually, there's a solution to this greedy RD problem. Such a scheme is | |
109 | discussed in section 6.6.2 of Parsing Techniques: A Practical Guide. The | |
110 | trick involves passing a tail parser (in addition to the scanner) to | |
111 | each parser. The start parser will then simply be: | |
112 | ||
113 | start >> end; | |
114 | ||
115 | (end is the start's tail). | |
116 | ||
117 | Spirit is greedy --using straight forward, naive RD. It is certainly | |
118 | possible to implement the fully backtracking scheme presented above, but | |
119 | there will be also certainly be a performance hit. The scheme will | |
120 | always try to match all possible parser paths (full parser hierarchy | |
121 | traversal) until it reaches a point of certainty, that the whole thing | |
122 | matches or fails to match. | |
123 | ||
124 | [:Backtracking and Greedy RD | |
125 | ||
126 | Spirit is quite consistent and intuitive about when it backtracks and to | |
127 | where, although it may not be obvious to those coming from different | |
128 | backgrounds. In general, any (sub)parser will, given the same input, | |
129 | always match the same portion of the input (or fail to match the input | |
130 | at all). This means that Spirit is inherently greedy. Spirit will only | |
131 | backtrack when a (sub)parser fails to match the input, and it will | |
132 | always backtrack to the next choice point upward (not backward) in the | |
133 | parser structure. In other words abb|ab will match `"ab"`, as will | |
134 | `a(bb|b)`, but `(ab|a)b` won't because the `(ab|a)` subparser will | |
135 | always match the `'b'` after the `'a'` if it is available. | |
136 | ||
137 | --Rainer Deyke] | |
138 | ||
139 | This is the very nature of __peg__. | |
140 | ||
141 | There's a strong preference on "simplicity with all the knobs when you | |
142 | need them" approach, right now. On the other hand, the flexibility of | |
143 | Spirit makes it possible to have different optional schemes available. | |
144 | It might be possible to implement an exhaustive backtracking RD scheme | |
145 | as an optional feature in the future. | |
146 | ||
147 | [endsect] |