[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer_semantic_actions Lexer Semantic Actions]

The main task of a lexer is normally to recognize tokens in the input.
Traditionally, this has been complemented with the ability to execute
arbitrary code whenever a certain token has been detected. __lex__ has been
designed to support this mode of operation as well. We borrow from the concept
of semantic actions for parsers (__qi__) and generators (__karma__). Lexer
semantic actions may be attached to any token definition. These are C++
functions or function objects that are called whenever a token definition
successfully recognizes a portion of the input. Given a token definition
`D` and a C++ function `f`, you can make the lexer call `f` whenever it
matches an input by attaching `f`:

    D[f]

The expression above links `f` to the token definition, `D`. The required
prototype of `f` is:

    void f (Iterator& start, Iterator& end, pass_flag& matched, Idtype& id, Context& ctx);

[variablelist where:
    [[`Iterator& start`]    [This is the iterator pointing to the beginning of
                            the matched range in the underlying input sequence.
                            The type of this iterator is the same as specified
                            while defining the type of the
                            `lexertl::actor_lexer<...>` (its first template
                            parameter). The semantic action is allowed to
                            change the value of this iterator, influencing the
                            matched input sequence.]]
    [[`Iterator& end`]      [This is the iterator pointing to the end of the
                            matched range in the underlying input sequence.
                            The type of this iterator is the same as specified
                            while defining the type of the
                            `lexertl::actor_lexer<...>` (its first template
                            parameter). The semantic action is allowed to
                            change the value of this iterator, influencing the
                            matched input sequence.]]
    [[`pass_flag& matched`] [This value is pre-initialized to `pass_normal`.
                            If the semantic action sets it to `pass_fail`, the
                            lexer behaves as if the token had not been matched
                            in the first place. If the semantic action sets it
                            to `pass_ignore`, the lexer ignores the current
                            token and tries to match the next token in the
                            input.]]
    [[`Idtype& id`]         [This is the token id of type Idtype (most of
                            the time this will be a `std::size_t`) for the
                            matched token. The semantic action is allowed to
                            change the value of this token id, influencing the
                            id of the created token.]]
    [[`Context& ctx`]       [This is a reference to a lexer specific,
                            unspecified type, providing the context for the
                            current lexer state. It can be used to access
                            different internal data items and is needed for
                            lexer state control from inside a semantic
                            action.]]
]

When using a plain C++ function as the semantic action, the following
prototypes are allowed as well:

    void f (Iterator& start, Iterator& end, pass_flag& matched, Idtype& id);
    void f (Iterator& start, Iterator& end, pass_flag& matched);
    void f (Iterator& start, Iterator& end);
    void f ();

[important In order to use lexer semantic actions, you need to use the type
           `lexertl::actor_lexer<>` as your lexer class (instead of the type
           `lexertl::lexer<>` as described in earlier examples).]

[heading The context of a lexer semantic action]

The last parameter passed to any lexer semantic action is a reference to an
unspecified type (see the `Context` type in the table above). This type is
unspecified because it depends on the token type returned by the lexer. It is
implemented in the internals of the iterator type exposed by the lexer.
Nevertheless, any context type is expected to expose a couple of functions
allowing the semantic action to influence the behavior of the lexer. The
following table gives an overview and a short description of the available
functionality.

[table Functions exposed by any context passed to a lexer semantic action
    [[Name]                                 [Description]]
    [[`Iterator const& get_eoi() const`]
     [The function `get_eoi()` may be used to access the end iterator of
      the input stream the lexer has been initialized with.]]
    [[`void more()`]
     [The function `more()` tells the lexer that the next time it matches a
      rule, the corresponding token should be appended onto the current token
      value rather than replacing it.]]
    [[`Iterator const& less(Iterator const& it, int n)`]
     [The function `less()` returns an iterator positioned to the nth input
      character beyond the current token start iterator (i.e. by passing the
      return value to the parameter `end` it is possible to return all but the
      first n characters of the current token back to the input stream).]]
    [[`bool lookahead(std::size_t id)`]
     [The function `lookahead()` can be used to implement lookahead for lexer
      engines not supporting constructs like flex' `a/b`
      (match `a`, but only when followed by `b`). It invokes the lexer on the
      input following the current token without actually moving forward in the
      input stream. The function returns whether the lexer was able to match a
      token with the given token id `id`.]]
    [[`std::size_t get_state() const` and `void set_state(std::size_t state)`]
     [The functions `get_state()` and `set_state()` may be used to introspect
      and change the current lexer state.]]
    [[`token_value_type get_value() const` and `void set_value(Value const&)`]
     [The functions `get_value()` and `set_value()` may be used to introspect
      and change the current token value.]]
]

[heading Lexer Semantic Actions Using Phoenix]

Even though it is possible to write your own function object implementations
(e.g. using Boost.Lambda or Boost.Bind), the preferred way of defining lexer
semantic actions is to use __boost_phoenix__. In this case you can access the
parameters described above by using the predefined __spirit__ placeholders:

[table Predefined Phoenix placeholders for lexer semantic actions
    [[Placeholder]      [Description]]
    [[`_start`]
     [Refers to the iterator pointing to the beginning of the matched input
      sequence. Any modifications to this iterator value will be reflected in
      the generated token.]]
    [[`_end`]
     [Refers to the iterator pointing past the end of the matched input
      sequence. Any modifications to this iterator value will be reflected in
      the generated token.]]
    [[`_pass`]
     [References the value signaling the outcome of the semantic action. This
      is pre-initialized to `lex::pass_flags::pass_normal`. If this is set to
      `lex::pass_flags::pass_fail`, the lexer will behave as if no token had
      been matched; if it is set to `lex::pass_flags::pass_ignore`, the lexer
      will ignore the current match and proceed trying to match tokens from
      the input.]]
    [[`_tokenid`]
     [Refers to the token id of the token to be generated. Any modifications
      to this value will be reflected in the generated token.]]
    [[`_val`]
     [Refers to the value the next token will be initialized from. Any
      modifications to this value will be reflected in the generated token.]]
    [[`_state`]
     [Refers to the lexer state the input has been matched in. Any
      modifications to this value will be reflected in the lexer itself (the
      next match will start in the new state). The currently generated token
      is not affected by changes to this variable.]]
    [[`_eoi`]
     [References the end iterator of the overall lexer input. This value
      cannot be changed.]]
]

The context object passed as the last parameter to any lexer semantic action
is not directly accessible while using __boost_phoenix__ expressions. Instead,
we provide predefined Phoenix functions allowing the invocation of the
different support functions mentioned above. The following table lists the
available support functions and describes their functionality:

[table Support functions usable from Phoenix expressions inside lexer semantic actions
    [[Plain function]   [Phoenix function]  [Description]]
    [[`ctx.more()`]
     [`more()`]
     [The function `more()` tells the lexer that the next time it matches a
      rule, the corresponding token should be appended onto the current token
      value rather than replacing it.]]
    [[`ctx.less()`]
     [`less(n)`]
     [The function `less()` takes a single integer parameter `n` and returns
      an iterator positioned to the nth input character beyond the current
      token start iterator (i.e. by assigning the return value to the
      placeholder `_end` it is possible to return all but the first `n`
      characters of the current token back to the input stream).]]
    [[`ctx.lookahead()`]
     [`lookahead(std::size_t)` or `lookahead(token_def)`]
     [The function `lookahead()` takes a single parameter specifying the token
      to match in the input. The function can be used, for instance, to
      implement lookahead for lexer engines not supporting constructs like
      flex' `a/b` (match `a`, but only when followed by `b`). It invokes the
      lexer on the input following the current token without actually moving
      forward in the input stream. The function returns whether the lexer was
      able to match the specified token.]]
]

[endsect]