[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer_semantic_actions Lexer Semantic Actions]

The main task of a lexer is normally to recognize tokens in the input.
Traditionally, this has been complemented with the ability to execute
arbitrary code whenever a certain token has been detected. __lex__ has been
designed to support this mode of operation as well. We borrow from the concept
of semantic actions for parsers (__qi__) and generators (__karma__). Lexer
semantic actions may be attached to any token definition. These are C++
functions or function objects that are called whenever a token definition
successfully recognizes a portion of the input. Given a token definition
`D` and a C++ function `f`, you can make the lexer call `f` whenever it
matches an input by attaching `f`:

    D[f]

The expression above links `f` to the token definition, `D`. The required
prototype of `f` is:

    void f (Iterator& start, Iterator& end, pass_flag& matched, Idtype& id, Context& ctx);

[variablelist where:
    [[`Iterator& start`]    [This is the iterator pointing to the beginning of
                            the matched range in the underlying input sequence.
                            The type of this iterator is the same as specified
                            while defining the type of the
                            `lexertl::actor_lexer<...>` (its first template
                            parameter). The semantic action is allowed to
                            change the value of this iterator, influencing the
                            matched input sequence.]]
    [[`Iterator& end`]      [This is the iterator pointing to the end of the
                            matched range in the underlying input sequence.
                            The type of this iterator is the same as specified
                            while defining the type of the
                            `lexertl::actor_lexer<...>` (its first template
                            parameter). The semantic action is allowed to
                            change the value of this iterator, influencing the
                            matched input sequence.]]
    [[`pass_flag& matched`] [This value is pre-initialized to `pass_normal`.
                            If the semantic action sets it to `pass_fail`, the
                            lexer behaves as if the token had not been matched
                            in the first place. If the semantic action sets it
                            to `pass_ignore`, the lexer ignores the current
                            token and tries to match the next token in the
                            input.]]
    [[`Idtype& id`]         [This is the token id of type Idtype (most of
                            the time this will be a `std::size_t`) for the
                            matched token. The semantic action is allowed to
                            change the value of this token id, influencing the
                            id of the created token.]]
    [[`Context& ctx`]       [This is a reference to a lexer specific,
                            unspecified type, providing the context for the
                            current lexer state. It can be used to access
                            different internal data items and is needed for
                            lexer state control from inside a semantic
                            action.]]
]

When using a plain C++ function as the semantic action, the following
prototypes are allowed as well:

    void f (Iterator& start, Iterator& end, pass_flag& matched, Idtype& id);
    void f (Iterator& start, Iterator& end, pass_flag& matched);
    void f (Iterator& start, Iterator& end);
    void f ();

[important In order to use lexer semantic actions, you need to use the type
           `lexertl::actor_lexer<>` as your lexer class (instead of the type
           `lexertl::lexer<>` as described in earlier examples).]

[heading The context of a lexer semantic action]

The last parameter passed to any lexer semantic action is a reference to an
unspecified type (see the `Context` type in the table above). This type is
unspecified because it depends on the token type returned by the lexer. It is
implemented in the internals of the iterator type exposed by the lexer.
Nevertheless, any context type is expected to expose a couple of functions
allowing the semantic action to influence the behavior of the lexer. The
following table gives an overview and a short description of the available
functionality.

[table Functions exposed by any context passed to a lexer semantic action
    [[Name]                                 [Description]]
    [[`Iterator const& get_eoi() const`]
     [The function `get_eoi()` may be used to access the end iterator of
      the input stream the lexer has been initialized with.]]
    [[`void more()`]
     [The function `more()` tells the lexer that the next time it matches a
      rule, the corresponding token should be appended onto the current token
      value rather than replacing it.]]
    [[`Iterator const& less(Iterator const& it, int n)`]
     [The function `less()` returns an iterator positioned to the nth input
      character beyond the current token start iterator (i.e. by passing the
      return value to the parameter `end` it is possible to return all but the
      first n characters of the current token back to the input stream).]]
    [[`bool lookahead(std::size_t id)`]
     [The function `lookahead()` can be used to implement lookahead for lexer
      engines not supporting constructs like flex' `a/b`
      (match `a`, but only when followed by `b`). It invokes the lexer on the
      input following the current token without actually moving forward in the
      input stream. The function returns whether the lexer was able to match a
      token with the given token id `id`.]]
    [[`std::size_t get_state() const` and `void set_state(std::size_t state)`]
     [The functions `get_state()` and `set_state()` may be used to introspect
      and change the current lexer state.]]
    [[`token_value_type get_value() const` and `void set_value(Value const&)`]
     [The functions `get_value()` and `set_value()` may be used to introspect
      and change the current token value.]]
]

[heading Lexer Semantic Actions Using Phoenix]

Even though it is possible to write your own function object implementations
(e.g. using Boost.Lambda or Boost.Bind), the preferred way of defining lexer
semantic actions is to use __boost_phoenix__. In this case you can access the
parameters described above by using the predefined __spirit__ placeholders:

[table Predefined Phoenix placeholders for lexer semantic actions
    [[Placeholder]      [Description]]
    [[`_start`]
     [Refers to the iterator pointing to the beginning of the matched input
      sequence. Any modifications to this iterator value will be reflected in
      the generated token.]]
    [[`_end`]
     [Refers to the iterator pointing past the end of the matched input
      sequence. Any modifications to this iterator value will be reflected in
      the generated token.]]
    [[`_pass`]
     [References the value signaling the outcome of the semantic action. This
      is pre-initialized to `lex::pass_flags::pass_normal`. If this is set to
      `lex::pass_flags::pass_fail`, the lexer will behave as if no token had
      been matched; if it is set to `lex::pass_flags::pass_ignore`, the lexer
      will ignore the current match and proceed trying to match tokens from
      the input.]]
    [[`_tokenid`]
     [Refers to the token id of the token to be generated. Any modifications
      to this value will be reflected in the generated token.]]
    [[`_val`]
     [Refers to the value the next token will be initialized from. Any
      modifications to this value will be reflected in the generated token.]]
    [[`_state`]
     [Refers to the lexer state the input has been matched in. Any
      modifications to this value will be reflected in the lexer itself (the
      next match will start in the new state). The currently generated token
      is not affected by changes to this variable.]]
    [[`_eoi`]
     [References the end iterator of the overall lexer input. This value
      cannot be changed.]]
]

The context object passed as the last parameter to any lexer semantic action
is not directly accessible while using __boost_phoenix__ expressions. Instead,
we provide predefined Phoenix functions allowing the invocation of the
different support functions mentioned above. The following table lists the
available support functions and describes their functionality:

[table Support functions usable from Phoenix expressions inside lexer semantic actions
    [[Plain function]   [Phoenix function]  [Description]]
    [[`ctx.more()`]
     [`more()`]
     [The function `more()` tells the lexer that the next time it matches a
      rule, the corresponding token should be appended onto the current token
      value rather than replacing it.]]
    [[`ctx.less()`]
     [`less(n)`]
     [The function `less()` takes a single integer parameter `n` and returns
      an iterator positioned to the nth input character beyond the current
      token start iterator (i.e. by assigning the return value to the
      placeholder `_end` it is possible to return all but the first `n`
      characters of the current token back to the input stream).]]
    [[`ctx.lookahead()`]
     [`lookahead(std::size_t)` or `lookahead(token_def)`]
     [The function `lookahead()` takes a single parameter specifying the token
      to match in the input. The function can be used, for instance, to
      implement lookahead for lexer engines not supporting constructs like
      flex' `a/b` (match `a`, but only when followed by `b`). It invokes the
      lexer on the input following the current token without actually moving
      forward in the input stream. The function returns whether the lexer was
      able to match the specified token.]]
]

[endsect]