[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer_quickstart2 Quickstart 2 - A better word counter using __lex__]

People familiar with __flex__ will probably complain that the example from the
section __sec_lex_quickstart_1__ is overly complex and does not leverage the
possibilities provided by this tool. In particular, the previous example did
not use lexer actions to count the lines, words, and characters directly. The
example in this step of the tutorial therefore shows how to use semantic
actions in __lex__. Even though this example still counts textual elements,
its purpose is to introduce new concepts and configuration options along the
way (for the full example code
see here: [@../../example/lex/word_count_lexer.cpp word_count_lexer.cpp]).

[import ../example/lex/word_count_lexer.cpp]


[heading Prerequisites]

In addition to the single `#include` required for /Spirit.Lex/, this example
needs a couple of header files from the __boost_phoenix__ library. The example
shows how to attach functors to token definitions, which could be done using
any C++ technique that yields a callable object. Using __boost_phoenix__ for
this task simplifies things and avoids adding dependencies on other libraries
(__boost_phoenix__ is already in use throughout __spirit__ anyway).

[wcl_includes]

To make all the code below more readable we introduce the following namespaces.

[wcl_namespaces]

To give a preview of what to expect from this example, here is the flex
program that was used as the starting point. The useful code is directly
included inside the actions associated with each token definition.

[wcl_flex_version]


[heading Semantic Actions in __lex__]

__lex__ uses a very similar way of associating actions with the token
definitions (which should look familiar to anybody acquainted with
__spirit__ as well): the operations to execute are specified inside a pair of
`[]` brackets. In order to be able to attach semantic actions to the token
definitions, an instance of `token_def<>` is defined for each of them.

[wcl_token_definition]

The semantics of the shown code is as follows. The code inside the `[]`
brackets will be executed whenever the corresponding token has been matched by
the lexical analyzer. This is very similar to __flex__, where the action code
associated with a token definition gets executed after the recognition of a
matching input sequence. The code above uses function objects constructed using
__boost_phoenix__, but it is possible to insert any C++ function or function
object as long as it exposes the proper interface. For more details please
refer to the section __sec_lex_semactions__.

[heading Associating Token Definitions with the Lexer]

If you compare this code to the code from __sec_lex_quickstart_1__ with regard
to the way token definitions are associated with the lexer, you will notice a
different syntax being used here. The previous example used the `self.add()`
style of the API, whereas here we directly assign the token definitions to
`self`, combining the different token definitions with the `|` operator. Here
is the code snippet again:

    this->self
        =   word  [++ref(w), ref(c) += distance(_1)]
        |   eol   [++ref(c), ++ref(l)]
        |   any   [++ref(c)]
        ;

This way we have a very powerful and natural way of building the lexical
analyzer. Translated into English, this may be read as: the lexical analyzer
will recognize ('`=`') tokens as defined by any of ('`|`') the token
definitions `word`, `eol`, and `any`.

A second difference to the previous example is that we do not explicitly
specify any token ids to use for the separate tokens. Using semantic actions
to trigger some useful work has freed us from the need to define them. To
ensure that every token gets assigned an id, the __lex__ library internally
assigns unique numbers to the token definitions, starting with the constant
defined by `boost::spirit::lex::min_token_id`.

[heading Pulling everything together]

In order to execute the code defined above we still need to instantiate an
instance of the lexer type, feed it from some input sequence, and create a
pair of iterators allowing us to iterate over the token sequence created by
the lexer. This code shows how to achieve these steps:

[wcl_main]


[endsect]