[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer_quickstart1 Quickstart 1 - A word counter using __lex__]

__lex__ is very modular, following the general building principle of the
__spirit__ libraries: you never pay for features you don't use. It is nicely
integrated with the other parts of __spirit__ but can nevertheless be used
separately to build stand-alone lexical analyzers.
This first quickstart example describes such a stand-alone application:
counting characters, words, and lines in a file, very similar to what the
well-known Unix command `wc` does (for the full example code see
[@../../example/lex/word_count_functor.cpp word_count_functor.cpp]).

[import ../example/lex/word_count_functor.cpp]


[heading Prerequisites]

The only `#include` specific to /Spirit.Lex/ follows. It is a wrapper for all
definitions necessary to use /Spirit.Lex/ in a stand-alone fashion, on top of
the __lexertl__ library. Additionally, we `#include` two Boost headers to
define `boost::bind()` and `boost::ref()`.

[wcf_includes]

To make the code below more readable we introduce the following namespaces.

[wcf_namespaces]


[heading Defining Tokens]

The most important step in creating a lexer with __lex__ is to define the
tokens to be recognized in the input sequence. This is normally done by
defining regular expressions describing the matching character sequences,
and optionally their corresponding token ids. Additionally, the defined tokens
need to be associated with an instance of a lexer object as provided by the
library. The following code snippet shows how this can be done using __lex__.

[wcf_token_definition]


[heading Doing the Useful Work]

We will use a setup where the __lex__ library invokes a given function after
each of the generated tokens has been recognized. For this reason we need to
implement a functor taking at least the generated token as an argument and
returning a boolean value that allows stopping the tokenization process early.
The default token type used in this example carries a token value of type
__boost_iterator_range__`<BaseIterator>` pointing to the matched range in the
underlying input sequence.

[wcf_functor]

All that is left is to write some boilerplate code tying together the pieces
described so far. To simplify this example we call the `lex::tokenize()`
function implemented in __lex__ (for a more detailed description of this
function see here: __fixme__), even though we could have written a loop
iterating over the lexer iterators [`first`, `last`) as well.


[heading Pulling Everything Together]

[wcf_main]


[heading Comparing __lex__ with __flex__]

This example was deliberately chosen to be as similar as possible to the
equivalent __flex__ program (see below), which isn't too different from what
has to be written when using __lex__.

[note Interestingly enough, performance comparisons of lexical analyzers
      written using __lex__ with equivalent programs generated by
      __flex__ show that both have comparable execution speeds!
      Generally, thanks to the highly optimized __lexertl__ library and
      due to its carefully designed integration with __spirit__, the
      abstraction penalty to be paid for using __lex__ is negligible.
]

The remaining examples in this tutorial will use more sophisticated features
of __lex__, mainly to allow further simplification of the code to be written,
while maintaining the similarity with corresponding features of __flex__.
__lex__ has been designed to be as similar to __flex__ as possible. That
is why this documentation provides the corresponding __flex__ code for the
__lex__ examples almost everywhere. Consequently, here is the __flex__ code
corresponding to the example shown above.

[wcf_flex_version]

[endsect]