[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer_quickstart1 Quickstart 1 - A word counter using __lex__]

__lex__ is very modular, following the general building principle of the
__spirit__ libraries: you never pay for features you don't use. It is nicely
integrated with the other parts of __spirit__ but can nevertheless be used
separately to build stand-alone lexical analyzers.
This first quickstart example describes such a stand-alone application:
counting characters, words, and lines in a file, very similar to what the
well-known Unix command `wc` does (for the full example code see
[@../../example/lex/word_count_functor.cpp word_count_functor.cpp]).

[import ../example/lex/word_count_functor.cpp]


[heading Prerequisites]

The only `#include` specific to /Spirit.Lex/ follows. It is a wrapper for all
definitions necessary to use /Spirit.Lex/ in a stand-alone fashion, on top of
the __lexertl__ library. Additionally, we `#include` two Boost headers to
define `boost::bind()` and `boost::ref()`.

[wcf_includes]

To make the code below more readable we introduce the following namespaces.

[wcf_namespaces]


[heading Defining Tokens]

The most important step in creating a lexer with __lex__ is to define the
tokens to be recognized in the input sequence. This is normally done by
defining regular expressions describing the matching character sequences,
and optionally their corresponding token ids. Additionally, the defined tokens
need to be associated with an instance of a lexer object as provided by the
library. The following code snippet shows how this can be done using __lex__.

[wcf_token_definition]


[heading Doing the Useful Work]

We will use a setup where the __lex__ library invokes a given function after
each of the generated tokens has been recognized. For this reason we need to
implement a functor taking at least the generated token as an argument and
returning a boolean value that allows stopping the tokenization process early.
The default token type used in this example carries a token value of type
__boost_iterator_range__`<BaseIterator>` pointing to the matched range in the
underlying input sequence.

[wcf_functor]

All that is left is to write some boilerplate code tying together the pieces
described so far. To simplify this example we call the `lex::tokenize()`
function implemented in __lex__ (for a more detailed description of this
function see here: __fixme__), even though we could have written a loop
iterating over the lexer iterators [`first`, `last`) as well.


[heading Pulling Everything Together]

[wcf_main]


[heading Comparing __lex__ with __flex__]

This example was deliberately chosen to be as similar as possible to the
equivalent __flex__ program (see below), which isn't too different from what
has to be written when using __lex__.

[note Interestingly enough, performance comparisons of lexical analyzers
      written using __lex__ with equivalent programs generated by
      __flex__ show that both have comparable execution speeds!
      Generally, thanks to the highly optimized __lexertl__ library and
      due to its carefully designed integration with __spirit__, the
      abstraction penalty to be paid for using __lex__ is negligible.
]

The remaining examples in this tutorial will use more sophisticated features
of __lex__, mainly to allow further simplification of the code to be written,
while maintaining the similarity with corresponding features of __flex__.
__lex__ has been designed to be as similar to __flex__ as possible. That
is why this documentation provides the corresponding __flex__ code for the
__lex__ examples almost everywhere. Consequently, here is the __flex__ code
corresponding to the example shown above.

[wcf_flex_version]

[endsect]