[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer_tokenizing Tokenizing Input Data]

[heading The tokenize function]

The `tokenize()` function is a helper function simplifying the usage of a lexer
in a stand alone fashion. For instance, you may have a stand alone lexer where all
of the required functionality is implemented inside lexer semantic actions.
A good example of this is the [@../../example/lex/word_count_lexer.cpp word_count_lexer]
described in more detail in the section __sec_lex_quickstart_2__.

[wcl_token_definition]

Tokenizing the given input while discarding all generated tokens is a common
application of the lexer. For this reason __lex__ exposes an API function
`tokenize()` minimizing the code required:

    // Read input from the given file
    std::string str (read_from_file(1 == argc ? "word_count.input" : argv[1]));

    word_count_tokens<lexer_type> word_count_lexer;
    std::string::iterator first = str.begin();

    // Tokenize all the input, while discarding all generated tokens
    bool r = tokenize(first, str.end(), word_count_lexer);

This code is completely equivalent to the more verbose version shown in the
section __sec_lex_quickstart_2__. The function `tokenize()` returns either when
the end of the input has been reached (in this case the return value will be
`true`), or when the lexer could not match any of the token definitions in the
input (in this case the return value will be `false` and the iterator `first`
will point to the first unmatched character in the input sequence).
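
As a minimal sketch (continuing the snippet above, and not part of the
original example), the updated iterator `first` can be used to report where
tokenization failed:

    // Hypothetical error handling: on failure, 'first' points to the
    // first character the lexer could not match.
    if (!r) {
        std::string rest(first, str.end());
        std::cerr << "Lexing failed at: \"" << rest << "\"" << std::endl;
    }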

The prototype of this function is:

    template <typename Iterator, typename Lexer>
    bool tokenize(Iterator& first, Iterator last, Lexer const& lex
      , typename Lexer::char_type const* initial_state = 0);

[variablelist where:
    [[Iterator& first]  [The beginning of the input sequence to tokenize. The
                         value of this iterator will be updated by the
                         lexer, pointing to the first unmatched
                         character of the input after the function
                         returns.]]
    [[Iterator last]    [The end of the input sequence to tokenize.]]
    [[Lexer const& lex] [The lexer instance to use for tokenization.]]
    [[Lexer::char_type const* initial_state]
                        [This optional parameter can be used to specify
                         the initial lexer state for tokenization (see the
                         sketch below).]]
]
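
For illustration, here is a hypothetical call passing an explicit initial
state. The state name `"COMMENT"` is not part of the example above and would
have to be defined by the lexer (for instance via `self("COMMENT") = ...` in
its constructor); by default, tokenization starts in the state `"INITIAL"`:

    // Hypothetical: start tokenizing in the lexer state "COMMENT"
    // instead of the default state "INITIAL".
    bool r = tokenize(first, str.end(), word_count_lexer, "COMMENT");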

A second overload of the `tokenize()` function allows specifying an arbitrary
function or function object to be called for each of the generated tokens. For
some applications this is very useful, as it might avoid having lexer semantic
actions. For an example of how to use this function, please have a look at
[@../../example/lex/word_count_functor.cpp word_count_functor.cpp]:

[wcf_main]

Here is the prototype of this `tokenize()` function overload:

    template <typename Iterator, typename Lexer, typename F>
    bool tokenize(Iterator& first, Iterator last, Lexer const& lex, F f
      , typename Lexer::char_type const* initial_state = 0);

[variablelist where:
    [[Iterator& first]  [The beginning of the input sequence to tokenize. The
                         value of this iterator will be updated by the
                         lexer, pointing to the first unmatched
                         character of the input after the function
                         returns.]]
    [[Iterator last]    [The end of the input sequence to tokenize.]]
    [[Lexer const& lex] [The lexer instance to use for tokenization.]]
    [[F f]              [A function or function object to be called for
                         each matched token. This function is expected to
                         have the prototype: `bool f(Lexer::token_type);`.
                         The `tokenize()` function will return immediately if
                         `f` returns `false` (see the sketch below).]]
    [[Lexer::char_type const* initial_state]
                        [This optional parameter can be used to specify
                         the initial lexer state for tokenization.]]
]
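
To illustrate the shape of this callback, here is a minimal sketch (not taken
from the example above) of a function object that counts the tokens it sees:

    // Hypothetical token handler for the functor based tokenize() overload.
    struct token_counter
    {
        std::size_t& count;
        token_counter(std::size_t& count_) : count(count_) {}

        template <typename Token>
        bool operator()(Token const&) const
        {
            ++count;        // process the current token
            return true;    // returning false would abort tokenization
        }
    };

    // Usage:
    //     std::size_t n = 0;
    //     bool r = tokenize(first, str.end(), word_count_lexer,
    //         token_counter(n));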

[/heading The generate_static_dfa function]

[endsect]