]>
Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | [#manual] |
2 | [section User manual] | |
3 | ||
4 | [section What is a parser] | |
5 | ||
6 | See the [link parser parser] section of the [link reference reference] for the | |
7 | explanation of what a parser is. | |
8 | ||
9 | [section The input of the parsers] | |
10 | ||
11 | Parsers take a [link string `string`] as input, which represents a string | |
12 | for template metaprograms. For example the string `"Hello World!"` can be | |
13 | defined the following way: | |
14 | ||
15 | string<'H','e','l','l','o',' ','W','o','r','l','d','!'> | |
16 | ||
17 | This syntax makes the input of the parsers difficult to read. Metaparse works | |
18 | with compilers using C++98, but the input of the parsers has to be defined the | |
19 | way it is described above. | |
20 | ||
21 | Based on `constexpr`, a feature provided by C++11, Metaparse provides a macro, | |
22 | [link BOOST_METAPARSE_STRING `BOOST_METAPARSE_STRING`] for defining strings: | |
23 | ||
24 | BOOST_METAPARSE_STRING("Hello World!") | |
25 | ||
26 | This defines a [link string `string`] as well, however, it is easier to | |
27 | read. The maximum length of the string that can be defined this way is limited, | |
28 | however, this limit is configurable. It is specified by the | |
29 | `BOOST_METAPARSE_LIMIT_STRING_SIZE` macro. | |
30 | ||
31 | [endsect] | |
32 | ||
33 | [section Source positions] | |
34 | ||
35 | A source position is described using a compile-time data structure. The | |
36 | following functions can be used to query it: | |
37 | ||
38 | * [link get_col `get_col`] | |
39 | * [link get_line `get_line`] | |
40 | ||
41 | The beginning of the input is [link start `start`] which requires | |
42 | `<boost/metaparse/start.hpp>` to be included. | |
43 | ||
44 | [endsect] | |
45 | ||
46 | [section Error handling] | |
47 | ||
48 | An error is described using a compile-time data structure. It contains | |
49 | information about the source position where the error was detected and some | |
50 | [link parsing_error_message description] about the error. | |
51 | [link debug_parsing_error `debug_parsing_error`] can be used to display the | |
52 | error message. Metaparse provides the | |
53 | [link BOOST_METAPARSE_DEFINE_ERROR `BOOST_METAPARSE_DEFINE_ERROR`] macro for | |
54 | defining simple [link parsing_error_message parsing error message]s. | |
55 | ||
56 | [endsect] | |
57 | ||
58 | [section Some examples of simple parsers] | |
59 | ||
60 | * A parser that parses nothing and always succeeds is | |
61 | [link return_ `return_`]. | |
62 | * A parser that always fails is [link fail `fail`]. | |
63 | * A parser that parses one character and returns the parsed character as the | |
64 | result is [link one_char `one_char`]. | |
65 | ||
66 | [endsect] | |
67 | ||
68 | [section Combining parsers] | |
69 | ||
70 | Complex parsers can be built by combining simple parsers. The parser library | |
71 | contains a number of parser combinators that build new parsers from already | |
72 | existing ones. | |
73 | ||
74 | For example | |
75 | [link accept_when `accept_when`]`<Parser, Predicate, RejectErrorMsg>` is a | |
76 | parser. It uses `Parser` to parse the input. When `Parser` rejects the input, | |
77 | the combinator returns the error `Parser` failed with. When `Parser` is | |
78 | successful, the combinator validates the result using `Predicate`. If the | |
79 | predicate returns true, the combinator accepts the input, otherwise it generates | |
80 | an error with the message `RejectErrorMsg`. | |
81 | ||
82 | Having [link accept_when `accept_when`], [link one_char `one_char`] can be | |
83 | used to build parsers that accept only digit characters, only whitespaces, etc. | |
84 | For example [link digit `digit`] accepts only digit characters: | |
85 | ||
86 | typedef | |
87 | boost::metaparse::accept_when< | |
88 | boost::metaparse::one_char, | |
89 | boost::metaparse::util::is_digit, | |
90 | boost::metaparse::errors::digit_expected | |
91 | > | |
92 | digit; | |
93 | ||
94 | [endsect] | |
95 | ||
96 | [section Sequence] | |
97 | ||
98 | The result of a successful parsing is some value and the remaining string that | |
99 | was not parsed. The remaining string can be processed by another parser. The | |
100 | parser library provides a parser combinator, [link sequence `sequence`], | |
101 | that takes a number of parsers as arguments and builds a new parser from them | |
102 | that: | |
103 | ||
104 | * Parses the input using the first parser | |
105 | * If parsing succeeds, it parses the remaining string with the second parser | |
106 | * It continues applying the parsers in order as long as they succeed | |
107 | * If all of them succeed, it returns the list of results | |
108 | * If any of the parsers fails, the combinator fails as well and returns the | |
109 | error the first failing parser returned with | |
110 | ||
111 | [endsect] | |
112 | ||
113 | [#repetition] | |
114 | [section Repetition] | |
115 | ||
116 | It is a common thing to parse a list of things of unknown length. As an example | |
117 | let's start with something simple: the text is a list of numbers. For example: | |
118 | ||
119 | 11 13 3 21 | |
120 | ||
121 | We want the result of parsing to be the sum of these values. Metaparse provides | |
122 | the [link int_ `int_`] parser we can use to parse one of these numbers. | |
123 | Metaparse provides the [link token `token`] combinator to consume the | |
124 | whitespaces after the number. So the following parser parses one number and the | |
125 | whitespaces after it: | |
126 | ||
127 | using int_token = token<int_>; | |
128 | ||
129 | The result of parsing is a boxed integer value: the value of the parsed number. | |
130 | For example parsing | |
131 | [link BOOST_METAPARSE_STRING `BOOST_METAPARSE_STRING`]`("13 ")` gives | |
132 | `boost::mpl::int_<13>` as the result. | |
133 | ||
134 | Our example input is a list of numbers. Each number can be parsed by | |
135 | `int_token`: | |
136 | ||
137 | [$images/metaparse/repeated_diag0.png [width 70%]] | |
138 | ||
139 | This diagram shows how the repeated application of `int_token` can parse the | |
140 | example input. Metaparse provides the [link repeated `repeated`] parser to | |
141 | easily implement this. The result of parsing is a typelist: the list of the | |
142 | individual numbers. | |
143 | ||
144 | [$images/metaparse/repeated_diag1.png [width 70%]] | |
145 | ||
146 | This diagram shows how [link repeated `repeated`]`<int_token>` works. It uses | |
147 | the `int_token` parser repeatedly and builds a `boost::mpl::vector` from the | |
148 | results it provides. | |
149 | ||
150 | But we need the sum of these, so we need to summarise the result. We can do this | |
151 | by wrapping our parser, [link repeated `repeated`]`<int_token>` with | |
152 | [link transform `transform`]. That gives us the opportunity to specify a | |
153 | function transforming this typelist to some other value - the sum of the | |
154 | elements in our case. Initially let's ignore how to summarise the elements in | |
155 | the vector. Let's assume that it can be implemented by a lambda expression and | |
156 | use `boost::mpl::lambda<...>::type` representing that lambda expression. Here is | |
157 | an example using [link transform `transform`] and this lambda expression: | |
158 | ||
159 | using sum_parser = | |
160 | transform< | |
161 | repeated<int_token>, | |
162 | boost::mpl::lambda<...>::type | |
163 | >; | |
164 | ||
165 | The [link transform `transform`]`<>` parser combinator wraps the | |
166 | [link repeated `repeated`]`<int_token>` to build the parser we need. Here is a | |
167 | diagram showing how it works: | |
168 | ||
169 | [$images/metaparse/repeated_diag2.png [width 70%]] | |
170 | ||
171 | As the diagram shows, the | |
172 | [link transform `transform`]`<`[link repeated `repeated`]`<int_token>, ...>` | |
173 | parser parses the input using [link repeated `repeated`]`<int_token>` and then | |
174 | does some processing on the result of parsing. | |
175 | ||
176 | Let's implement the missing lambda expression that tells | |
177 | [link transform `transform`] how to change the result coming from | |
178 | [link repeated `repeated`]`<int_token>`. We can summarise the numbers in a | |
179 | typelist by using Boost.MPL's `fold` or `accumulate`. Here is an example doing | |
180 | that: | |
181 | ||
182 | using sum_op = mpl::lambda<mpl::plus<mpl::_1, mpl::_2>>::type; | |
183 | ||
184 | using sum_parser = | |
185 | transform< | |
186 | repeated<int_token>, | |
187 | mpl::lambda< | |
188 | mpl::fold<mpl::_1, mpl::int_<0>, sum_op> | |
189 | >::type | |
190 | >; | |
191 | ||
192 | Here is an extended version of the above diagram showing what happens here: | |
193 | ||
194 | [$images/metaparse/repeated_diag3.png [width 70%]] | |
195 | ||
196 | This example parses the input, builds the list of numbers and then loops over it | |
197 | and summarises the values. It starts with the second argument of `fold`, | |
198 | `int_<0>` and adds every item of the list of numbers (which is the result of | |
199 | the parser [link repeated `repeated`]`<int_token>`) one by one. | |
200 | ||
201 | [note | |
202 | Note that [link transform `transform`] wraps another parser, | |
203 | [link repeated `repeated`]`<int_token>` here. It parses the input with that | |
204 | parser, gets the result of that parsing and changes that result. | |
205 | [link transform `transform`] itself will be a parser returning that updated | |
206 | result. | |
207 | ] | |
208 | ||
209 | [#introducing-foldl] | |
210 | [section Introducing foldl] | |
211 | ||
212 | It works, however, this is rather inefficient: it has a loop parsing the | |
213 | integers one by one, building a typelist and then it loops over this typelist to | |
214 | summarise the result. Using template metaprograms in your applications can have | |
215 | a serious impact on the compiler's memory usage and the speed of the | |
216 | compilation, therefore I recommend being careful with these things. | |
217 | ||
218 | Metaparse offers more efficient ways of achieving the same result. You don't | |
219 | need two loops: you can merge them together and add every number to your summary | |
220 | right after parsing it. Metaparse offers the [link foldl `foldl`] for this. | |
221 | ||
222 | With [link foldl `foldl`] you specify: | |
223 | ||
224 | * the parser to parse the individual elements of the list | |
225 | (which is `int_token` in our example) | |
226 | * the initial value used for folding (which is `int_<0>` in our example) | |
227 | * the forward operation merging the sub-result we have so far and the value | |
228 | coming from the last application of the parser (this was `sum_op` in our | |
229 | example) | |
230 | ||
231 | Our parser can be implemented this way: | |
232 | ||
233 | using better_sum_parser = foldl<int_token, mpl::int_<0>, sum_op>; | |
234 | ||
235 | As you can see the implementation of the parser is more compact. | |
236 | Here is a diagram showing what happens when you use this parser to parse some | |
237 | input: | |
238 | ||
239 | [$images/metaparse/foldl_diag1.png [width 70%]] | |
240 | ||
241 | As you can see, not only the implementation of the parser is more compact, but | |
242 | it achieves the same result by doing less as well. It parses the input by | |
243 | applying `int_token` repeatedly, just like the previous solution. But it | |
244 | produces the final result without building a typelist as an internal step. Here | |
245 | is how it works internally: | |
246 | ||
247 | [$images/metaparse/foldl_diag2.png [width 70%]] | |
248 | ||
249 | It summarises the results of the repeated `int_token` application using | |
250 | `sum_op`. This implementation is more efficient. It accepts an empty string as a | |
251 | valid input: the sum of it is `0`. It may be good for you, in which case you are | |
252 | done. If you don't wan to accept it, you can use [link foldl1 `foldl1`] instead | |
253 | of [link foldl `foldl`]. This is the same, but it rejects empty input. | |
254 | (Metaparse offers [link repeated1 `repeated1`] as well if you choose the first | |
255 | approach and would like to reject empty string) | |
256 | ||
257 | [endsect] | |
258 | ||
259 | [#introducing-foldr] | |
260 | [section Introducing foldr] | |
261 | ||
262 | [note | |
263 | Note that if you are reading this manual for the first time, you probably want | |
264 | to skip this section and proceed with | |
265 | [link introducing-foldl_start_with_parser Introducing foldl_start_with_parser] | |
266 | ] | |
267 | ||
268 | You might have noticed that Metaparse offers [link foldr `foldr`] as well. The | |
269 | difference between [link foldl `foldl`] and [link foldr `foldr`] is the | |
270 | direction in which the results are summarised. (`l` stands for ['from the Left] | |
271 | and `r` stands for ['from the Right]) Here is a diagram showing how | |
272 | `better_sum_parser` works if it is implemented using [link foldr `foldr`]: | |
273 | ||
274 | [$images/metaparse/foldr_diag1.png [width 70%]] | |
275 | ||
276 | As you can see this is very similar to using [link foldl `foldl`], but the | |
277 | results coming out of the individual applications of `int_token` are summarised | |
278 | in a right-to-left order. As `sum_op` is addition, it does not affect the end | |
279 | result, but in other cases it might. | |
280 | ||
281 | [note | |
282 | Note that the implementation of [link foldl `foldl`] is more efficient than | |
283 | [link foldr `foldr`]. Prefer [link foldl `foldl`] whenever possible. | |
284 | ] | |
285 | ||
286 | As you might expect it, Metaparse offers [link foldr1 `foldr1`] as well, which | |
287 | folds from the right and rejects empty input. | |
288 | ||
289 | [endsect] | |
290 | ||
291 | [#introducing-foldl_start_with_parser] | |
292 | [section Introducing foldl_start_with_parser] | |
293 | ||
294 | Let's change the grammar of our little language. Instead of a list of numbers, | |
295 | let's expect numbers separated by a `+` symbol. Our example input becomes the | |
296 | following: | |
297 | ||
298 | BOOST_METAPARSE_STRING("11 + 13 + 3 + 21") | |
299 | ||
300 | Parsing it with [link foldl `foldl`] or [link repeated `repeated`] is difficult: | |
301 | there has to be a `+` symbol before every element ['except] the first one. None | |
302 | of the already introduced repetition constructs offer a way of treating the | |
303 | first element in a different way. | |
304 | ||
305 | If we forget about the first number for a moment, the rest of the input is | |
306 | `"+ 13 + 3 + 21"`. This can easily be parsed by [link foldl `foldl`] (or | |
307 | [link repeated `repeated`]): | |
308 | ||
309 | using plus_token = token<lit_c<'+'>>; | |
310 | using plus_int = last_of<plus_token, int_token>; | |
311 | ||
312 | using sum_parser2 = foldl<plus_int, int_<0>, sum_op>; | |
313 | ||
314 | It uses `plus_int`, that is [link last_of `last_of`]`<plus_token, int_token>` | |
315 | as the parser that is used repeatedly to get the numbers. It does the following: | |
316 | ||
317 | * Uses `plus_token` to parse the `+` symbol and any whitespace that might follow | |
318 | it. | |
319 | * Uses then `int_token` to parse the number | |
320 | * Combines the above two with [link last_of `last_of`] to use both parsers in | |
321 | order and keep only the result of using the second one (the result of parsing | |
322 | the `+` symbol is thrown away - we don't care about it). | |
323 | ||
324 | This way [link last_of `last_of`]`<plus_token, int_token>` returns the value of | |
325 | the number as the result of parsing, just like our previous parser, `int_token` | |
326 | did. Because of this, it can be used as a drop-in replacement of `int_token` in | |
327 | the previous example and we get a parser for our updated language. Or at least | |
328 | for all number except the first one. | |
329 | ||
330 | This [link foldl `foldl`] can not parse the first element, because it expects a | |
331 | `+` symbol before every number. You might think of making the `+` symbol | |
332 | optional in the above approach - don't do that. It makes the parser accept | |
333 | `"11 + 13 3 21"` as well as the `+` symbol is now optional ['everywhere]. | |
334 | ||
335 | What you could do is parsing the first element with `int_token`, the rest of | |
336 | the elements with the above [link foldl `foldl`]-based solution and add the | |
337 | result of the two. This is left as an exercise to the reader. | |
338 | ||
339 | Metaparse offers [link foldl_start_with_parser `foldl_start_with_parser`] to | |
340 | implement this. [link foldl_start_with_parser `foldl_start_with_parser`] is the | |
341 | same as [link foldl `foldl`]. The difference is that instead of an initial value | |
342 | to combine the list elements with it takes an ['initial parser]: | |
343 | ||
344 | using plus_token = token<lit_c<'+'>>; | |
345 | using plus_int = last_of<plus_token, int_token>; | |
346 | ||
347 | using sum_parser3 = foldl_start_with_parser<plus_int, int_token, sum_op>; | |
348 | ||
349 | [link foldl_start_with_parser `foldl_start_with_parser`] starts with applying | |
350 | that initial parser and uses the result it returns as the initial value for | |
351 | folding. It does the same as [link foldl `foldl`] after that. The following | |
352 | diagram shows how it can be used to parse a list of numbers separated by `+` | |
353 | symbols: | |
354 | ||
355 | [$images/metaparse/foldl_start_with_parser_diag1.png [width 70%]] | |
356 | ||
357 | As the diagram shows, it start parsing the list of numbers with `int_token`, | |
358 | uses its value as the starting value for folding (earlier approaches were using | |
359 | the value `int_<0>` as this starting value). Then it parses all elements of the | |
360 | list by using `plus_int` multiple times. | |
361 | ||
362 | [endsect] | |
363 | ||
364 | [#introducing-foldr_start_with_parser] | |
365 | [section Introducing foldr_start_with_parser] | |
366 | ||
367 | [note | |
368 | Note that if you are reading this manual for the first time, you probably want | |
369 | to skip this section and try creating some parsers using | |
370 | [link foldl_start_with_parser `foldl_start_with_parser`] instead. | |
371 | ] | |
372 | ||
373 | [@foldl_start_with_parser.hpp `foldl_start_with_parser`] has its | |
374 | ['from the right] pair, | |
375 | [link foldr_start_with_parser `foldr_start_with_parser`]. It uses the same | |
376 | elements as [link foldl_start_with_parser `foldl_start_with_parser`] but in a | |
377 | different order. Here is a parser for our example language implemented with | |
378 | [link foldr_start_with_parser `foldr_start_with_parser`]: | |
379 | ||
380 | using plus_token = token<lit_c<'+'>>; | |
381 | using int_plus = first_of<int_token, plus_token>; | |
382 | ||
383 | using sum_parser4 = foldr_start_with_parser<int_plus, int_token, sum_op>; | |
384 | ||
385 | Note that it uses `int_plus` instead of `plus_int`. This is because the parser | |
386 | the initial value for folding comes from is used after `int_plus` has parsed the | |
387 | input as many times as it could. It might sound strange for the first time, but | |
388 | the following diagram should help you understand how it works: | |
389 | ||
390 | [$images/metaparse/foldr_start_with_parser_diag1.png [width 70%]] | |
391 | ||
392 | As you can see, it starts with the parser that is applied repeatedly on the | |
393 | input, thus instead of parsing `plus_token int_token` repeatedly, we need to | |
394 | parse `int_token plus_token` repeatedly. The last number is not followed by `+`, | |
395 | thus `int_plus` fails to parse it and it stops the iteration. | |
396 | [link foldr_start_with_parser `foldr_start_with_parser`] then uses the other | |
397 | parser, `int_token` to parse the input. It succeeds and the result it returns is | |
398 | used as the starting value for folding from the right. | |
399 | ||
400 | [note | |
401 | Note that as the above description also suggests, the implementation of | |
402 | [link foldl_start_with_parser `foldl_start_with_parser`] is more efficient | |
403 | than [link foldr_start_with_parser `foldr_start_with_parser`]. Prefer | |
404 | [link foldl_start_with_parser `foldl_start_with_parser`] whenever possible. | |
405 | ] | |
406 | ||
407 | [endsect] | |
408 | ||
409 | [#introducing-foldl_reject_incomplete_start_with_parser] | |
410 | [section Introducing foldl_reject_incomplete_start_with_parser] | |
411 | ||
412 | Using a parser built with | |
413 | [link foldl_start_with_parser `foldl_start_with_parser`] we can parse the input | |
414 | when the input is correct. However, it is not always the case. Consider the | |
415 | following input for example: | |
416 | ||
417 | BOOST_METAPARSE_STRING("11 + 13 + 3 + 21 +") | |
418 | ||
419 | This is an invalid expression. However, if we parse it using the | |
420 | [link foldl_start_with_parser `foldl_start_with_parser`]-based parser presented | |
421 | earlier (`sum_parser3`), it accepts the input and the result is `48`. This is | |
422 | because [link foldl_start_with_parser `foldl_start_with_parser`] parses the | |
423 | input ['as long as it can]. It parses the first`int_token` (`11`) and then it | |
424 | starts parsing the `plus_int` elements (`+ 13`, `+ 3`, `+ 21`). After parsing | |
425 | all of these, it tries to parse the remaining `" +"` input using `plus_int` | |
426 | which fails and therefore | |
427 | [link foldl_start_with_parser `foldl_start_with_parser`] stops after `+ 21`. | |
428 | ||
429 | The problem is that the parser parses the longest sub-expression starting from | |
430 | the beginning, that represents a valid expression. The rest is ignored. The | |
431 | parser can be wrapped by [link entire_input `entire_input`] to make sure to | |
432 | reject expressions with invalid extra characters at the end, however, that | |
433 | won't make the error message useful. ([link entire_input `entire_input`] can | |
434 | only tell the author of the invalid expression that after `+ 21` is something | |
435 | wrong). | |
436 | ||
437 | Metaparse provides | |
438 | [link foldl_reject_incomplete_start_with_parser `foldl_reject_incomplete_start_with_parser`], | |
439 | which does the same as [link foldl_start_with_parser `foldl_start_with_parser`], | |
440 | except that once no further repetitions are found, it checks ['where] the | |
441 | repeated parser (in our example `plus_int`) fails. When it can make any progress | |
442 | (eg. it finds a `+` symbol), then | |
443 | [link foldl_reject_incomplete_start_with_parser `foldl_reject_incomplete_start_with_parser`] | |
444 | assumes, that the expression's author intended to make the repetition longer, | |
445 | but made a mistake and propagates the error message coming from that last broken | |
446 | expression. | |
447 | ||
448 | [$images/metaparse/foldl_reject_incomplete_start_with_parser_diag1.png [width 70%]] | |
449 | ||
450 | The above diagram shows how | |
451 | [link foldl_reject_incomplete_start_with_parser `foldl_reject_incomplete_start_with_parser`] | |
452 | parses the example invalid input and how it fails. This can be used for better | |
453 | error reporting from the parsers. | |
454 | ||
455 | Other folding parsers also have their `f` version. (eg. | |
456 | [link foldr_reject_incomplete `foldr_reject_incomplete`], | |
457 | [link foldl_reject_incomplete1 `foldl_reject_incomplete1`], etc). | |
458 | ||
459 | [endsect] | |
460 | ||
461 | [#finding-the-right-folding-parser-combinator] | |
462 | [section Finding the right folding parser combinator] | |
463 | ||
464 | As you might have noticed, there are a lot of different folding parser | |
465 | combinators. To help you find the right one, the following naming convention is | |
466 | used: | |
467 | ||
468 | [$images/metaparse/folds.png [width 70%]] | |
469 | ||
470 | [note | |
471 | Note that there is no `foldr_reject_incomplete_start_with_parser`. The `p` | |
472 | version of the right-folding parsers applies the special parser, whose result | |
473 | is the initial value, after the repeated elements. Therefore, when the parser | |
474 | parsing one repeated element fails, `foldr_start_with_parser` would apply that | |
475 | special final parser instead of checking how the repeated element's parser | |
476 | failed. | |
477 | ] | |
478 | ||
479 | [endsect] | |
480 | ||
481 | [endsect] | |
482 | ||
483 | [#result_types] | |
484 | [section What can be built from a compile-time string?] | |
485 | ||
486 | Parsers built using Metaparse are template metaprograms parsing text (or code) | |
487 | at compile-time. Here is a list of things that can be the "result" of parsing: | |
488 | ||
489 | * A ['type]. An example for this is a parser parsing a `printf` format string | |
490 | and returning the typelist (eg. `boost::mpl::vector`) of the expected | |
491 | arguments. | |
492 | * A ['constant value]. An example for this is the result of a calculator | |
493 | language. See the [link getting_started Getting Started] section for further | |
494 | details. | |
495 | * A ['runtime object]. A static runtime object can be generated that might be | |
496 | used at runtime. An example for this is parsing regular expressions at | |
497 | compile-time and building `boost::xpressive::sregex` objects. See the | |
498 | `regex` example of Metaparse for an example. | |
499 | * A C++ ['function], which might be called at runtime. A C++ function can be | |
500 | generated that can be called at runtime. It is good for generating native | |
501 | (and optimised) code from EDSLs. See the `compile_to_native_code` example of | |
502 | Metaparse as an example for this. | |
503 | * A [link metafunction_class ['template metafunction class]]. The result of | |
504 | parsing might be a type, which is a | |
505 | [link metafunction_class template metafunction class]. This is good for | |
506 | building an EDSL for template metaprogramming. See the `meta_hs` example of | |
507 | Metaparse as an example for this. | |
508 | ||
509 | [endsect] | |
510 | ||
511 | [section Grammars] | |
512 | ||
513 | Metaparse provides a way to define grammars in a syntax that resembles EBNF. The | |
514 | [link grammar `grammar`] template can be used to define a grammar. It can be | |
515 | used the following way: | |
516 | ||
517 | grammar<BOOST_METAPARSE_STRING("plus_exp")> | |
518 | ::import<BOOST_METAPARSE_STRING("int_token"), token<int_>>::type | |
519 | ||
520 | ::rule<BOOST_METAPARSE_STRING("ws ::= (' ' | '\n' | '\r' | '\t')*")>::type | |
521 | ::rule<BOOST_METAPARSE_STRING("plus_token ::= '+' ws"), front<_1>>::type | |
522 | ::rule<BOOST_METAPARSE_STRING("plus_exp ::= int_token (plus_token int_token)*"), plus_action>::type | |
523 | ||
524 | The code above defines a parser from a grammar definition. The start symbol of | |
525 | the grammar is `plus_exp`. The lines beginning with `::rule` define rules. | |
526 | Rules optionally have a semantic action, which is a metafunction class that | |
527 | transforms the result of parsing after the rule has been applied. | |
528 | Existing parsers can be bound to names and be used in the rules by importing | |
529 | them. Lines beginning with `::import` bind existing parsers to names. | |
530 | ||
531 | The result of a grammar definition is a parser which can be given to other | |
532 | parser combinators or be used directly. Given that grammars can import existing | |
533 | parsers and build new ones, they are parser combinators as well. | |
534 | ||
535 | [endsect] | |
536 | ||
537 | [endsect] | |
538 | ||
539 | [section Parsing based on `constexpr`] | |
540 | ||
541 | Metaparse is based on template metaprogramming, however, C++11 provides | |
542 | `constexpr`, which can be used for parsing at compile-time as well. While | |
543 | implementing parsers based on `constexpr` is easier for a C++ developer, since | |
544 | its syntax resembles the regular syntax of the language, the result of parsing | |
545 | has to be a `constexpr` value. Parsers based on template metaprogramming can | |
546 | build types as the result of parsing. These types may be boxed `constexpr` | |
547 | values but can be metafunction classes, classes with static functions which can | |
548 | be called at runtime, etc. | |
549 | ||
550 | When a parser built with Metaparse needs a sub-parser for processing a part of | |
551 | the input text and generating a `constexpr` value as the result of parsing, one | |
552 | can implement the sub-parser based on `constexpr` functions. Metaparse | |
553 | can be integrated with them and lift their results into C++ template | |
554 | metaprogramming. An example demonstrating this feature can be found among the | |
555 | examples (`constexpr_parser`). This capability makes it possible to integrate | |
556 | Metaparse with parsing libraries based on `constexpr`. | |
557 | ||
558 | [endsect] | |
559 | ||
560 | [section What types of grammars can be used?] | |
561 | ||
562 | It is possible to write parsers for ['context free grammars] using Metaparse. | |
563 | However, this is not the most general category of grammars that can be used. As | |
564 | Metaparse is a highly extendable framework, it is not clear what should be | |
565 | considered to be the limit of Metaparse itself. For example Metaparse provides | |
566 | the [link accept_when `accept_when`] [link parser_combinator parser combinator]. | |
567 | It can be used to provide arbitrary predicates for enabled/disabling a specific | |
568 | rule. One can go as far as providing the Turing machine (as a | |
569 | [link metafunction metafunction]) of the entire grammar as a predicate, so one | |
570 | can build parsers for ['unrestricted grammars] that can be parsed using a Turing | |
571 | machine. Note that such a parser would not be considered to be a parser built | |
572 | with Metaparse, however, it is not clear how far a solution might go and still | |
573 | be considered using Metaparse. | |
574 | ||
575 | Metaparse assumes that the parsers are ['deterministic], as they have only "one" | |
576 | result. It is of course possible to write parsers and combinators that return a | |
577 | set (or list or some other container) of results as that "one" result, but that | |
578 | can be considered building a new parser library. There is no clear boundary for | |
579 | Metaparse. | |
580 | ||
581 | Metaparse supports building ['top-down parsers] and ['left-recursion] is not | |
582 | supported as it would lead to infinite recursion. ['Right-recursion] is | |
583 | supported, however, in most cases the | |
584 | [link repetition iterative parser combinators] provide better alternatives. | |
585 | ||
586 | [endsect] | |
587 | ||
588 | [endsect] | |
589 |