[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer_quickstart2 Quickstart 2 - A better word counter using __lex__]

People familiar with __flex__ will probably complain that the example from the
section __sec_lex_quickstart_1__ is overly complex and does not leverage the
possibilities provided by this tool. In particular, the previous example did
not use lexer actions to count the lines, words, and characters directly. The
example in this step of the tutorial therefore shows how to use semantic
actions in __lex__. Even though this example still counts textual elements,
its purpose is to introduce new concepts and configuration options along the
way (for the full example code
see here: [@../../example/lex/word_count_lexer.cpp word_count_lexer.cpp]).

[import ../example/lex/word_count_lexer.cpp]


[heading Prerequisites]

In addition to the single `#include` required for /Spirit.Lex/, this example
needs a couple of header files from the __boost_phoenix__ library. The example
shows how to attach functors to token definitions, which could be done using
any C++ technique that yields a callable object. Using __boost_phoenix__ for
this task simplifies things and avoids adding dependencies on other libraries
(__boost_phoenix__ is already in use throughout __spirit__ anyway).

[wcl_includes]

To make all the code below more readable we introduce the following namespaces.

[wcl_namespaces]

To give a preview of what to expect from this example, here is the flex
program that was used as the starting point. The useful code is directly
included inside the actions associated with each token definition.

[wcl_flex_version]


[heading Semantic Actions in __lex__]

__lex__ uses a very similar way of associating actions with the token
definitions (which should look familiar to anybody acquainted with
__spirit__ as well): the operations to execute are specified inside a pair of
`[]` brackets. In order to be able to attach semantic actions to the token
definitions, an instance of `token_def<>` is defined for each of them.

[wcl_token_definition]

The semantics of the shown code is as follows. The code inside the `[]`
brackets will be executed whenever the corresponding token has been matched by
the lexical analyzer. This is very similar to __flex__, where the action code
associated with a token definition gets executed after the recognition of a
matching input sequence. The code above uses function objects constructed using
__boost_phoenix__, but it is possible to insert any C++ function or function
object as long as it exposes the proper interface. For more details please
refer to the section __sec_lex_semactions__.

[heading Associating Token Definitions with the Lexer]

If you compare this code to the code from __sec_lex_quickstart_1__ with regard
to the way token definitions are associated with the lexer, you will notice a
different syntax being used here. The previous example used the `self.add()`
style of the API, whereas here we directly assign the token definitions to
`self`, combining the different token definitions with the `|` operator. Here
is the code snippet again:

    this->self
        =   word  [++ref(w), ref(c) += distance(_1)]
        |   eol   [++ref(c), ++ref(l)]
        |   any   [++ref(c)]
        ;

This way we have a very powerful and natural way of building the lexical
analyzer. Translated into English, this may be read as: the lexical analyzer
will recognize ('`=`') tokens as defined by any of ('`|`') the token
definitions `word`, `eol`, and `any`.

A second difference to the previous example is that we do not explicitly
specify any token ids to use for the separate tokens. Using semantic actions
to trigger some useful work has freed us from the need to define them. To
ensure that every token gets assigned an id, the __lex__ library internally
assigns unique numbers to the token definitions, starting with the constant
defined by `boost::spirit::lex::min_token_id`.

[heading Pulling everything together]

In order to execute the code defined above we still need to instantiate an
instance of the lexer type, feed it from some input sequence, and create a
pair of iterators allowing us to iterate over the token sequence created by
the lexer. This code shows how to achieve these steps:

[wcl_main]


[endsect]