ceph/src/boost/libs/spirit/doc/qi/roman.qbk

   1 [/==============================================================================
   2     Copyright (C) 2001-2011 Joel de Guzman
   3     Copyright (C) 2001-2011 Hartmut Kaiser
   4
   5     Distributed under the Boost Software License, Version 1.0. (See accompanying
   6     file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
   7 ===============================================================================/]
   8
   9 [section Roman Numerals]
  10
  11 This example demonstrates:
  12
  13 * symbol table
  14 * rule
  15 * grammar
  16
  17 [heading Symbol Table]
  18
  19 The symbol table holds a dictionary of symbols where each symbol is a sequence
  20 of characters (a `char`, `wchar_t`, `int`, enumeration etc.) . The template
  21 class, parameterized by the character type, can work efficiently with 8, 16, 32
  22 and even 64 bit characters. Mutable data of type T are associated with each
  23 symbol.
  24
  25 Traditionally, symbol table management is maintained separately outside the BNF
  26 grammar through semantic actions. Contrary to standard practice, the Spirit
  27 symbol table class `symbols` is a parser. An object of which may be used
  28 anywhere in the EBNF grammar specification. It is an example of a dynamic
  29 parser. A dynamic parser is characterized by its ability to modify its behavior
  30 at run time. Initially, an empty symbols object matches nothing. At any time,
  31 symbols may be added or removed, thus, dynamically altering its behavior.
  32
  33 Each entry in a symbol table has an associated mutable data slot. In this
  34 regard, one can view the symbol table as an associative container (or map) of
  35 key-value pairs where the keys are strings.
  36
  37 The symbols class expects two template parameters. The first parameter specifies
  38 the character type of the symbols. The second specifies the data type associated
  39 with each symbol: its attribute.
  40
  41 Here's a parser for roman hundreds (100..900) using the symbol table. Keep in
  42 mind that the data associated with each slot is the parser's attribute (which is
  43 passed to attached semantic actions).
  44
  45 [import ../../example/qi/roman.cpp]
  46
  47 [tutorial_roman_hundreds]
  48
  49 Here's a parser for roman tens (10..90):
  50
  51 [tutorial_roman_tens]
  52
  53 and, finally, for ones (1..9):
  54
  55 [tutorial_roman_ones]
  56
  57 Now we can use `hundreds`, `tens` and `ones` anywhere in our parser expressions.
  58 They are all parsers.
  59
  60 [heading Rules]
  61
  62 Up until now, we've been inlining our parser expressions, passing them directly
  63 to the `phrase_parse` function. The expression evaluates into a temporary,
  64 unnamed parser which is passed into the `phrase_parse` function, used, and then
  65 destroyed. This is fine for small parsers. When the expressions get complicated,
  66 you'd want to break the expressions into smaller easier-to-understand pieces,
  67 name them, and refer to them from other parser expressions by name.
  68
  69 A parser expression can be assigned to what is called a "rule". There are
  70 various ways to declare rules. The simplest form is:
  71
  72     rule<Iterator> r;
  73
  74 At the very least, the rule needs to know the iterator type it will be working
  75 on. This rule cannot be used with `phrase_parse`. It can only be used with the
  76 `parse` function -- a version that does not do white space skipping (does not
  77 have the skipper argument). If you want to have it skip white spaces, you need
  78 to pass in the type skip parser, as in the next form:
  79
  80     rule<Iterator, Skipper> r;
  81
  82 Example:
  83
  84     rule<std::string::iterator, space_type> r;
  85
  86 This type of rule can be used for both `phrase_parse` and `parse`.
  87
  88 For our next example, there's one more rule form you should know about:
  89
  90     rule<Iterator, Signature> r;
  91
  92 or
  93
  94     rule<Iterator, Signature, Skipper> r;
  95
  96 [tip All rule template arguments after Iterator can be supplied in any order.]
  97
  98 The Signature specifies the attributes of the rule. You've seen that our parsers
  99 can have an attribute. Recall that the `double_` parser has an attribute of
 100 `double`. To be precise, these are /synthesized/ attributes. The parser
 101 "synthesizes" the attribute value. Think of them as function return values.
 102
 103 There's another type of attribute called "inherited" attribute. We won't need
 104 them for now, but it's good that you be aware of such attributes. You can think
 105 of them as function arguments. And, rightly so, the rule signature is a function
 106 signature of the form:
 107
 108     result(argN, argN,..., argN)
 109
 110 After having declared a rule, you can now assign any parser expression to it.
 111 Example:
 112
 113     r = double_ >> *(',' >> double_);
 114
 115 [heading Grammars]
 116
 117 A grammar encapsulates one or more rules. It has the same template parameters as
 118 the rule. You declare a grammar by:
 119
 120 # deriving a struct (or class) from the `grammar` class template
 121 # declare one or more rules as member variables
 122 # initialize the base grammar class by giving it the start rule (its the first
 123   rule that gets called when the grammar starts parsing)
 124 # initialize your rules in your constructor
 125
 126 The roman numeral grammar is a very nice and simple example of a grammar:
 127
 128 [tutorial_roman_grammar]
 129
 130 Things to take notice of:
 131
 132 * The grammar and start rule signature is `unsigned()`. It has a synthesized
 133   attribute (return value) of type `unsigned` with no inherited attributes
 134   (arguments).
 135
 136 * We did not specify a skip-parser. We don't want to skip in between the
 137   numerals.
 138
 139 * `roman::base_type` is a typedef for `grammar<Iterator, unsigned()>`. If
 140    `roman` was not a template, you could simply write: base_type(start)
 141
 142 * It's best to make your grammar templates such that they can be reused
 143   for different iterator types.
 144
 145 * `_val` is another __phoenix__ placeholder representing the rule's synthesized
 146   attribute.
 147
 148 * `eps` is a special spirit parser that consumes no input but is always
 149   successful. We use it to initialize `_val`, the rule's synthesized
 150   attribute, to zero before anything else. The actual parser starts at
 151   `+lit('M')`, parsing roman thousands. Using `eps` this way is good
 152   for doing pre and post initializations.
 153
 154 * The expression `a || b` reads: match a or b and in sequence. That is, if both
 155   `a` and `b` match, it must be in sequence; this is equivalent to `a >> -b | b`,
 156   but more efficient.
 157
 158 [heading Let's Parse!]
 159
 160 [tutorial_roman_grammar_parse]
 161
 162 `roman_parser` is an object of type `roman`, our roman numeral parser. This time
 163 around we are using the no-skipping version of the parse functions. We do not
 164 want to skip any spaces! We are also passing in an attribute, `unsigned result`,
 165 which will receive the parsed value.
 166
 167 The full cpp file for this example can be found here: [@../../example/qi/roman.cpp]
 168
 169
 170 [endsect]