[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer_static_model The /Static/ Lexer Model]

So far, the documentation of __lex__ has mostly described the features of the
/dynamic/ model, where the tables needed for lexical analysis are generated
from the regular expressions at runtime. The big advantage of the dynamic model
is its flexibility and its integration with the __spirit__ library and the C++
host language. Its big disadvantage is the additional runtime needed to
generate the tables, which can be a limitation especially for larger lexical
analyzers. The /static/ model strives to build upon the smooth integration with
__spirit__ and C++, and reuses large parts of the __lex__ library as described
so far, while overcoming the additional runtime requirements by using
pre-generated tables and tokenizer routines. To make the code generation as
simple as possible, the static model reuses the token definition types
developed for the /dynamic/ model without any changes. As will be shown in this
section, building a code generator based on an existing token definition type
is a matter of writing three lines of code.

Assuming you have already built a dynamic lexer for your problem, two more
steps are needed to create a static lexical analyzer using __lex__:

# generating the C++ code for the static analyzer (including the tokenization
  function and corresponding tables), and
# modifying the dynamic lexical analyzer to use the generated code.

Both steps are described in more detail in the two sections below (for the full
source code used in this example see the code here:
[@../../example/lex/static_lexer/word_count_tokens.hpp the common token definition],
[@../../example/lex/static_lexer/word_count_generate.cpp the code generator],
[@../../example/lex/static_lexer/word_count_static.hpp the generated code], and
[@../../example/lex/static_lexer/word_count_static.cpp the static lexical analyzer]).

[import ../example/lex/static_lexer/word_count_tokens.hpp]
[import ../example/lex/static_lexer/word_count_static.cpp]
[import ../example/lex/static_lexer/word_count_generate.cpp]

But first we provide the code snippets needed to follow the descriptions
below. Both the definition of the token identifiers and the token definition
class used in this example are put into a separate header file to make these
available to the code generator and the static lexical analyzer.

[wc_static_tokenids]

The important point here is that the token definition class is no different
from a similar class to be used for a dynamic lexical analyzer. The library
has been designed in such a way that all components (the dynamic lexical
analyzer, the code generator, and the static lexical analyzer) can reuse the
very same token definition syntax.

[wc_static_tokendef]

The only thing changing between the three different use cases is the template
parameter used to instantiate a concrete token definition. For the dynamic
model and the code generator you will probably use the __class_lexertl_lexer__
template, whereas for the static model you will use the
__class_lexertl_static_lexer__ type as the template parameter.
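
The difference between the two instantiations can be sketched as follows. This
is an illustrative fragment modeled on the word count example, not a verbatim
copy of it: the header name `word_count_static.hpp` and the table type
`lexer_word_count` are assumptions that depend on the file name and suffix you
pass to the code generator.

```cpp
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/lex_static_lexertl.hpp>
#include "word_count_static.hpp"    // pre-generated tables (assumed file name)

namespace lex = boost::spirit::lex;

// token type shared by both models
typedef lex::lexertl::token<char const*> token_type;

// dynamic model (also used by the code generator): tables built at runtime
typedef lex::lexertl::lexer<token_type> dynamic_lexer_type;

// static model: tables taken from the pre-generated code; the name of the
// table type depends on the suffix given to the code generator
typedef lex::lexertl::static_lexer<
    token_type, lex::lexertl::static_::lexer_word_count
> static_lexer_type;

// the very same token definition class works with either lexer type
word_count_tokens<dynamic_lexer_type> dynamic_word_count;
word_count_tokens<static_lexer_type>  static_word_count;
```

Everything else, including the token definition class itself, stays untouched
when switching between the two models.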

This example not only shows how to build a static lexer, but additionally
demonstrates how such a lexer can be used for parsing in conjunction with a
__qi__ grammar. For completeness, we provide the simple grammar used in this
example. As you can see, this grammar does not have any dependencies on the
static lexical analyzer, and for this reason it is no different from a grammar
used either without a lexer or with a dynamic lexical analyzer as described
before.

[wc_static_grammar]
74 | ||
75 | ||
76 | [heading Generating the Static Analyzer] | |

The first additional step in creating a static lexical analyzer is to write a
small standalone program generating the lexer tables and the corresponding
tokenization function. For this purpose the __lex__ library exposes a special
API - the function __api_generate_static__. It implements the whole code
generator; no further code is needed. All it takes to invoke this function is
to supply a token definition instance, an output stream to generate the code
to, and an optional string to be used as a suffix for the name of the
generated function. All in all just a couple of lines of code.

[wc_static_generate_main]

The code generator shown above produces output which should be stored in a
file for later inclusion into the static lexical analyzer, as shown in the
next topic (the full generated code can be viewed
[@../../example/lex/static_lexer/word_count_static.hpp here]).
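
As a sketch, integrating this into a build amounts to compiling and running the
generator before compiling the static lexer itself. The compiler invocation,
include path, and the way the generator selects its output file are assumptions
about your environment; in particular, where the generated code ends up depends
entirely on the output stream your generator's `main` opens.

```shell
# build and run the code generator first (compiler and paths are hypothetical)
c++ -I/path/to/boost -o word_count_generate word_count_generate.cpp
./word_count_generate word_count_static.hpp   # output handling depends on your main()

# then compile the static lexical analyzer against the generated header
c++ -I/path/to/boost -o word_count_static word_count_static.cpp
```

Running the generator as a pre-build step of this kind keeps the generated
header up to date whenever the token definitions change.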

[note The generated code will have the version number of the current __lex__
library compiled in. This version number is used at compile time of your
static lexer object to ensure it is compiled with exactly the same version
of the __lex__ library as the one the lexer tables have been generated with.
If the versions do not match, you will see a compilation error mentioning an
`incompatible_static_lexer_version`.
]

[heading Modifying the Dynamic Analyzer]

The second step required to convert an existing dynamic lexer into a static
one is to change your main program in two places. First, you need to change
the type of the lexer used (that is, the template parameter used when
instantiating your token definition class). While in the dynamic model we
have been using the __class_lexertl_lexer__ template, we now need to change
that to the __class_lexertl_static_lexer__ type. The second change is tightly
related to the first one and involves correcting the corresponding `#include`
statement to:

[wc_static_include]

Otherwise the main program is no different from an equivalent program using
the dynamic model. This feature makes it easy to develop the lexer in dynamic
mode and to switch to the static mode after the code has stabilized. The
simple generator application shown above enables the integration of the code
generator into any existing build process. The following code snippet provides
the overall main function, highlighting the code to be changed.

[wc_static_main]

[important The generated code for the static lexer contains the token ids as
they have been assigned, either explicitly by the programmer or implicitly
during lexer construction. It is your responsibility to make sure that all
instances of a particular static lexer type use exactly the same token ids.
The constructor of the lexer object has a second (default) parameter allowing
you to designate a starting token id to be used when assigning ids to the
token definitions. The requirement above is fulfilled by default as long as
no `first_id` is specified during construction of the static lexer instances.
]


[endsect]