[/==============================================================================
    Copyright (C) 2001-2015 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section Roman Numerals]

This example demonstrates:

* The Symbol Table
* Non-terminal rules

[heading Symbol Table]

The symbol table holds a dictionary of symbols where each symbol is a sequence
of characters. The template class can work efficiently with 8, 16, 32 and even
64 bit characters. Mutable data of type `T` are associated with each symbol.

Traditionally, symbol table management is maintained separately outside the BNF
grammar through semantic actions. Contrary to standard practice, the Spirit
symbol table class `symbols` is a parser, an object of which may be used
anywhere in the EBNF grammar specification. It is an example of a dynamic
parser. A dynamic parser is characterized by its ability to modify its behavior
at run time. Initially, an empty symbols object matches nothing. At any time,
symbols may be added or removed, thus dynamically altering its behavior.

Each entry in a symbol table may have an associated mutable data slot. In this
regard, one can view the symbol table as an associative container (or map) of
key-value pairs where the keys are strings.
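
To make the dynamic behavior concrete without pulling in X3, here is a minimal plain-C++ sketch of a table that acts like a parser. It is not X3's `symbols` class: `symbol_table_model` and its member names are hypothetical, and the sketch assumes that the longest matching symbol wins.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a symbol table; not X3's `symbols` class.
template <typename T>
class symbol_table_model
{
public:
    // Symbols can be added or removed at any time, which changes what
    // `parse` matches from then on.
    void add(std::string const& key, T const& value) { table_[key] = value; }
    void remove(std::string const& key) { table_.erase(key); }

    // Tries to match a symbol at input[pos] (requires pos <= input.size()).
    // On success, stores the associated data in `attr`, advances `pos` past
    // the symbol, and returns true. The longest matching symbol wins.
    bool parse(std::string const& input, std::size_t& pos, T& attr) const
    {
        std::string const* best_key = nullptr;
        T const* best_value = nullptr;
        for (auto const& entry : table_)
        {
            std::string const& key = entry.first;
            bool const matches = input.compare(pos, key.size(), key) == 0;
            if (matches && (!best_key || key.size() > best_key->size()))
            {
                best_key = &entry.first;
                best_value = &entry.second;
            }
        }
        if (!best_key)
            return false;   // an empty table never gets past this point
        attr = *best_value;
        pos += best_key->size();
        return true;
    }

private:
    std::map<std::string, T> table_;
};
```

With an empty table, `parse` fails on any input. After adding `"C"` and `"CC"`, parsing `"CCX"` from position 0 yields 200 and leaves the position at 2; after removing `"CC"`, the same input yields 100.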

The `symbols` class expects one template parameter to specify the data type
associated with each symbol: its attribute. X3 provides several namespaces
with versions of the `symbols` class for handling different character
encodings, including `ascii`, `standard`, `standard_wide`, `iso8859_1`, and
`unicode`. The default symbol parser type in the main `x3` namespace is
`standard`.

Here's a parser for roman hundreds (100..900) using the symbol table. Keep in
mind that the data associated with each slot is the parser's attribute (which is
passed to attached semantic actions).

    struct hundreds_ : x3::symbols<unsigned>
    {
        hundreds_()
        {
            add
                ("C"    , 100)
                ("CC"   , 200)
                ("CCC"  , 300)
                ("CD"   , 400)
                ("D"    , 500)
                ("DC"   , 600)
                ("DCC"  , 700)
                ("DCCC" , 800)
                ("CM"   , 900)
            ;
        }

    } hundreds;

Here's a parser for roman tens (10..90):

    struct tens_ : x3::symbols<unsigned>
    {
        tens_()
        {
            add
                ("X"    , 10)
                ("XX"   , 20)
                ("XXX"  , 30)
                ("XL"   , 40)
                ("L"    , 50)
                ("LX"   , 60)
                ("LXX"  , 70)
                ("LXXX" , 80)
                ("XC"   , 90)
            ;
        }

    } tens;

and, finally, for ones (1..9):

    struct ones_ : x3::symbols<unsigned>
    {
        ones_()
        {
            add
                ("I"    , 1)
                ("II"   , 2)
                ("III"  , 3)
                ("IV"   , 4)
                ("V"    , 5)
                ("VI"   , 6)
                ("VII"  , 7)
                ("VIII" , 8)
                ("IX"   , 9)
            ;
        }

    } ones;

Now we can use `hundreds`, `tens` and `ones` anywhere in our parser expressions.
They are all parsers.

[heading Rules]

Up until now, we've been inlining our parser expressions, passing them directly
to the `phrase_parse` function. The expression evaluates into a temporary,
unnamed parser which is passed into the `phrase_parse` function, used, and then
destroyed. This is fine for small parsers. When the expressions get complicated,
you'd want to break the expressions into smaller easier-to-understand pieces,
name them, and refer to them from other parser expressions by name.

A parser expression can be assigned to what is called a "rule". There are
various ways to declare rules. The simplest form is:

    rule<ID> const r = "some-name";

At the very least, the rule needs an identification tag. This ID can be any
struct or class type and need not be defined. Forward declaration would suffice.
The name is optional, but is useful for debugging and error handling, as we'll
see later. Notice that rule `r` is declared `const`. Rules are immutable and are
best declared as `const`.

[note Unlike Qi (Spirit V2), X3 rules can be used with both `phrase_parse` and
`parse` without having to specify the skip parser]

For our next example, there's one more rule form you should know about:

    rule<ID, Attribute> const r = "some-name";

The `Attribute` parameter specifies the attribute type of the rule. You've seen
that our parsers can have an attribute. Recall that the `double_` parser has an
attribute of `double`. To be precise, these are /synthesized/ attributes. The
parser "synthesizes" the attribute value. Think of them as function return
values.

After having declared a rule, you need a definition for the rule. Example:

    auto const r_def = double_ >> *(',' >> double_);

By convention, rule definitions have a `_def` suffix. Like rules, rule
definitions are immutable and are best declared as `const`. Now that we have a
rule and its definition, we tie the rule to its definition using the
`BOOST_SPIRIT_DEFINE` macro:

    BOOST_SPIRIT_DEFINE(r);

The macro finds the definition by appending `_def` to the rule's name, so `r`
is paired with `r_def`.

[note `BOOST_SPIRIT_DEFINE` is variadic and may be used for one or more rules.
Example: `BOOST_SPIRIT_DEFINE(r1, r2, r3);`]
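
Since a synthesized attribute behaves like a return value, the definition above can be read as an ordinary function. The following is a hand-written plain-C++ sketch of what `double_ >> *(',' >> double_)` computes, not how X3 implements it. The name `parse_double_list_model` is hypothetical, and `std::strtod` stands in for `double_`, so it accepts a few inputs (leading whitespace, `inf`, hexadecimal floats) that `double_` may treat differently.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical model of `double_ >> *(',' >> double_)` starting at input[pos]
// (requires pos <= input.size()). On success, appends the parsed numbers to
// `attr` (the "synthesized attribute"), advances `pos` past the matched text,
// and returns true. On failure, `pos` and `attr` are left untouched.
bool parse_double_list_model(std::string const& input, std::size_t& pos,
                             std::vector<double>& attr)
{
    char const* const begin = input.c_str();
    char const* p = begin + pos;
    char* next = nullptr;

    // double_ : the first number is mandatory.
    double d = std::strtod(p, &next);
    if (next == p)
        return false;
    attr.push_back(d);
    p = next;

    // *(',' >> double_) : zero or more repetitions. A repetition that does
    // not match completely consumes nothing, so a trailing ',' is left alone.
    while (*p == ',')
    {
        d = std::strtod(p + 1, &next);
        if (next == p + 1)
            break;
        attr.push_back(d);
        p = next;
    }

    pos = static_cast<std::size_t>(p - begin);
    return true;
}
```

For the input `1.5,2,3` the sketch collects three numbers and consumes all seven characters. For `1,2,` it collects two numbers and stops before the trailing comma, which is how a Kleene star over a sequence is expected to behave.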

[heading Grammars]

Unlike Qi (Spirit V2), X3 discards the notion of a grammar as a concrete
entity for encapsulating rules. In X3, a grammar is simply a logical group of
rules that work together, typically with a single top-level start rule which
serves as the main entry point. X3 grammars are grouped using namespaces.
The roman numeral grammar is a very nice and simple example of a grammar:

    namespace parser
    {
        using x3::eps;
        using x3::lit;
        using x3::_val;
        using x3::_attr;
        using ascii::char_;

        // Semantic actions. They capture nothing: a capture-default such as
        // [&] is not allowed on a lambda declared at namespace scope.
        auto set_zero = [](auto& ctx){ _val(ctx) = 0; };
        auto add1000 = [](auto& ctx){ _val(ctx) += 1000; };
        auto add = [](auto& ctx){ _val(ctx) += _attr(ctx); };

        x3::rule<class roman, unsigned> const roman = "roman";

        auto const roman_def =
            eps                 [set_zero]
            >>
            (
                -(+lit('M')     [add1000])
                >>  -hundreds   [add]
                >>  -tens       [add]
                >>  -ones       [add]
            )
        ;

        BOOST_SPIRIT_DEFINE(roman);
    }

Things to take notice of:

* The start rule's attribute is `unsigned`.

* `_val(ctx)` gets a reference to the rule's synthesized attribute.

* `_attr(ctx)` gets a reference to the synthesized attribute of the parser
  the action is attached to, for example the value stored with the symbol
  that `hundreds` matched.

* `eps` is a special Spirit parser that consumes no input but is always
  successful. We use it to initialize the rule's synthesized
  attribute to zero before anything else. The actual parser starts at
  `+lit('M')`, parsing roman thousands. Using `eps` this way is good
  for doing pre and post initializations.

* The rule `roman` and the definition `roman_def` are const objects.
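
Read procedurally, the grammar starts at zero, counts thousands, and then tries each symbol table once. The plain-C++ sketch below mirrors those steps so the control flow is easy to follow; it does not use X3. The names `table`, `optional_symbol` and `parse_roman_model` are hypothetical, and the lookup is assumed to prefer the longest matching key.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an x3::symbols<unsigned> table.
using table = std::vector<std::pair<std::string, unsigned>>;

// Models `-sym[add]`: if some key matches at input[pos], add the value of the
// longest one to `val` and advance `pos`. Otherwise do nothing; an optional
// parser still succeeds when its subject fails.
void optional_symbol(table const& t, std::string const& input,
                     std::size_t& pos, unsigned& val)
{
    std::size_t best_len = 0;
    unsigned best_value = 0;
    for (auto const& entry : t)
    {
        std::string const& key = entry.first;
        if (key.size() > best_len && input.compare(pos, key.size(), key) == 0)
        {
            best_len = key.size();
            best_value = entry.second;
        }
    }
    val += best_value;
    pos += best_len;
}

// Models the `roman` rule. Stores the computed value in `result` and returns
// true only if the whole input was consumed.
bool parse_roman_model(std::string const& input, unsigned& result)
{
    static table const hundreds = {
        {"C", 100}, {"CC", 200}, {"CCC", 300}, {"CD", 400}, {"D", 500},
        {"DC", 600}, {"DCC", 700}, {"DCCC", 800}, {"CM", 900}};
    static table const tens = {
        {"X", 10}, {"XX", 20}, {"XXX", 30}, {"XL", 40}, {"L", 50},
        {"LX", 60}, {"LXX", 70}, {"LXXX", 80}, {"XC", 90}};
    static table const ones = {
        {"I", 1}, {"II", 2}, {"III", 3}, {"IV", 4}, {"V", 5},
        {"VI", 6}, {"VII", 7}, {"VIII", 8}, {"IX", 9}};

    std::size_t pos = 0;
    unsigned val = 0;                                // eps[set_zero]

    while (pos < input.size() && input[pos] == 'M')  // -(+lit('M')[add1000])
    {
        val += 1000;
        ++pos;
    }
    optional_symbol(hundreds, input, pos, val);      // -hundreds[add]
    optional_symbol(tens, input, pos, val);          // -tens[add]
    optional_symbol(ones, input, pos, val);          // -ones[add]

    result = val;
    return pos == input.size();
}
```

For example, `MCMXCIV` gives 1994, while `IIII` is rejected because only `III` is consumed and one `I` is left over. Because every part of the grammar is optional, leftover input is the only way this model can fail.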

[heading Let's Parse!]

    bool r = parse(iter, end, roman, result);

    if (r && iter == end)
    {
        std::cout << "-------------------------\n";
        std::cout << "Parsing succeeded\n";
        std::cout << "result = " << result << std::endl;
        std::cout << "-------------------------\n";
    }
    else
    {
        std::string rest(iter, end);
        std::cout << "-------------------------\n";
        std::cout << "Parsing failed\n";
        std::cout << "stopped at: \"" << rest << "\"\n";
        std::cout << "-------------------------\n";
    }

`roman` is our roman numeral parser. This time around we are using the
no-skipping version of the parse functions. We do not want to skip any spaces!
We are also passing in an attribute, `unsigned result`, which will receive the
parsed value.

The full cpp file for this example can be found here: [@../../../example/x3/roman.cpp]

[endsect]