[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer_static_model The /Static/ Lexer Model]

So far, the documentation of __lex__ has mostly described the features of the
/dynamic/ model, where the tables needed for lexical analysis are generated
from the regular expressions at runtime. The big advantage of the dynamic model
is its flexibility and its integration with the __spirit__ library and the C++
host language. Its big disadvantage is the additional runtime needed to
generate the tables, which can be a limitation especially for larger lexical
analyzers. The /static/ model strives to build upon the smooth integration with
__spirit__ and C++, and reuses large parts of the __lex__ library as described
so far, while overcoming the additional runtime requirements by using
pre-generated tables and tokenizer routines. To make the code generation as
simple as possible, the static model reuses the token definition types
developed for the /dynamic/ model without any changes. As will be shown in this
section, building a code generator based on an existing token definition type
is a matter of writing three lines of code.

Assuming you have already built a dynamic lexer for your problem, two more
steps are needed to create a static lexical analyzer using __lex__:

# generating the C++ code for the static analyzer (including the tokenization
  function and corresponding tables), and
# modifying the dynamic lexical analyzer to use the generated code.

Both steps are described in more detail in the two sections below (for the full
source code used in this example see the code here:
[@../../example/lex/static_lexer/word_count_tokens.hpp the common token definition],
[@../../example/lex/static_lexer/word_count_generate.cpp the code generator],
[@../../example/lex/static_lexer/word_count_static.hpp the generated code], and
[@../../example/lex/static_lexer/word_count_static.cpp the static lexical analyzer]).

[import ../example/lex/static_lexer/word_count_tokens.hpp]
[import ../example/lex/static_lexer/word_count_static.cpp]
[import ../example/lex/static_lexer/word_count_generate.cpp]

But first we provide the code snippets needed to follow the descriptions
below. Both the definition of the token identifiers and the token definition
class used in this example are put into a separate header file to make these
available to the code generator and the static lexical analyzer.

[wc_static_tokenids]

The important point here is that the token definition class is no different
from a similar class to be used for a dynamic lexical analyzer. The library
has been designed in such a way that all components (the dynamic lexical
analyzer, the code generator, and the static lexical analyzer) can reuse the
very same token definition syntax.

[wc_static_tokendef]

The only thing changing between the three different use cases is the template
parameter used to instantiate a concrete token definition. For the dynamic
model and the code generator you will probably use the __class_lexertl_lexer__
template, whereas for the static model you will use the
__class_lexertl_static_lexer__ type as the template parameter.
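
The difference between the two instantiations can be sketched as follows. This
is an illustrative fragment modeled on the word count example, not a verbatim
copy of it: the header name `word_count_static.hpp` and the table type
`lexer_word_count` are assumptions that depend on the file name and suffix you
pass to the code generator.

```cpp
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/lex_static_lexertl.hpp>
#include "word_count_static.hpp"    // pre-generated tables (assumed file name)

namespace lex = boost::spirit::lex;

// token type shared by both models
typedef lex::lexertl::token<char const*> token_type;

// dynamic model (also used by the code generator): tables built at runtime
typedef lex::lexertl::lexer<token_type> dynamic_lexer_type;

// static model: tables taken from the pre-generated code; the name of the
// table type depends on the suffix given to the code generator
typedef lex::lexertl::static_lexer<
    token_type, lex::lexertl::static_::lexer_word_count
> static_lexer_type;

// the very same token definition class works with either lexer type
word_count_tokens<dynamic_lexer_type> dynamic_word_count;
word_count_tokens<static_lexer_type>  static_word_count;
```

Everything else, including the token definition class itself, stays untouched
when switching between the two models.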

This example not only shows how to build a static lexer, but additionally
demonstrates how such a lexer can be used for parsing in conjunction with a
__qi__ grammar. For completeness, we provide the simple grammar used in this
example. As you can see, this grammar does not have any dependencies on the
static lexical analyzer, and for this reason it is no different from a grammar
used either without a lexer or with a dynamic lexical analyzer as described
before.

[wc_static_grammar]
74 | ||
75 | ||
76 | [heading Generating the Static Analyzer] | |

The first additional step in creating a static lexical analyzer is to write a
small standalone program generating the lexer tables and the corresponding
tokenization function. For this purpose the __lex__ library exposes a special
API - the function __api_generate_static__. It implements the whole code
generator; no further code is needed. All it takes to invoke this function is
to supply a token definition instance, an output stream to generate the code
to, and an optional string to be used as a suffix for the name of the
generated function. All in all just a couple of lines of code.

[wc_static_generate_main]

The code generator shown above produces output which should be stored in a
file for later inclusion into the static lexical analyzer, as shown in the
next topic (the full generated code can be viewed
[@../../example/lex/static_lexer/word_count_static.hpp here]).
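
As a sketch, integrating this into a build amounts to compiling and running the
generator before compiling the static lexer itself. The compiler invocation,
include path, and the way the generator selects its output file are assumptions
about your environment; in particular, where the generated code ends up depends
entirely on the output stream your generator's `main` opens.

```shell
# build and run the code generator first (compiler and paths are hypothetical)
c++ -I/path/to/boost -o word_count_generate word_count_generate.cpp
./word_count_generate word_count_static.hpp   # output handling depends on your main()

# then compile the static lexical analyzer against the generated header
c++ -I/path/to/boost -o word_count_static word_count_static.cpp
```

Running the generator as a pre-build step of this kind keeps the generated
header up to date whenever the token definitions change.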

[note The generated code will have the version number of the current __lex__
library compiled in. This version number is used at compile time of your
static lexer object to ensure it is compiled with exactly the same version
of the __lex__ library as the one the lexer tables have been generated with.
If the versions do not match, you will see a compilation error mentioning an
`incompatible_static_lexer_version`.
]

[heading Modifying the Dynamic Analyzer]

The second step required to convert an existing dynamic lexer into a static
one is to change your main program in two places. First, you need to change
the type of the lexer used (that is, the template parameter used when
instantiating your token definition class). While in the dynamic model we
have been using the __class_lexertl_lexer__ template, we now need to change
that to the __class_lexertl_static_lexer__ type. The second change is tightly
related to the first one and involves correcting the corresponding `#include`
statement to:

[wc_static_include]

Otherwise the main program is no different from an equivalent program using
the dynamic model. This feature makes it easy to develop the lexer in dynamic
mode and to switch to the static mode after the code has stabilized. The
simple generator application shown above enables the integration of the code
generator into any existing build process. The following code snippet provides
the overall main function, highlighting the code to be changed.

[wc_static_main]

[important The generated code for the static lexer contains the token ids as
they have been assigned, either explicitly by the programmer or implicitly
during lexer construction. It is your responsibility to make sure that all
instances of a particular static lexer type use exactly the same token ids.
The constructor of the lexer object has a second (default) parameter allowing
you to designate a starting token id to be used when assigning ids to the
token definitions. The requirement above is fulfilled by default as long as
no `first_id` is specified during construction of the static lexer instances.
]


[endsect]