[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section Mini XML - ASTs!]

Stop and think about it... We've come very close to generating an AST
(abstract syntax tree) in our last example. We parsed a single structure and
generated an in-memory representation of it in the form of a struct: the
`struct employee`. If we changed the implementation to parse one or more
employees, the result would be a `std::vector<employee>`. We can go on and add
more hierarchy: teams, departments, corporations. Then we'll have an AST
representation of it all.

In this example (actually two examples), we'll now explore how to create
ASTs. We will parse a minimalistic XML-like language and compile the results
into our data structures in the form of a tree.

Along the way, we'll see new features:

* Inherited attributes
* Variant attributes
* Local variables
* Not predicate
* Lazy lit

The full C++ source files for these examples can be found here:
[@../../example/qi/mini_xml1.cpp] and here: [@../../example/qi/mini_xml2.cpp]

There are a couple of sample toy-xml files in the mini_xml_samples subdirectory:
[@../../example/qi/mini_xml_samples/1.toyxml],
[@../../example/qi/mini_xml_samples/2.toyxml], and
[@../../example/qi/mini_xml_samples/3.toyxml] for testing purposes.
The example [@../../example/qi/mini_xml_samples/4.toyxml] contains an error.

[import ../../example/qi/mini_xml1.cpp]
[import ../../example/qi/mini_xml2.cpp]

[heading First Cut]

Without further delay, here's the first version of the XML grammar:

[tutorial_xml1_grammar]

Going bottom up, let's examine the `text` rule:

    rule<Iterator, std::string(), space_type> text;

and its definition:

    text = lexeme[+(char_ - '<')        [_val += _1]];

The semantic action collects the chars and appends them (via +=) to the
`std::string` attribute of the rule (represented by the placeholder `_val`).
`lexeme` turns off the skipper inside its brackets (after one initial
pre-skip), so whitespace within the text is kept rather than skipped.
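
For readers who want to see the mechanics without Spirit, here is a plain C++17
sketch of roughly what the `text` rule does. It is an illustrative analogue,
not Spirit code: `skip_spaces` and `parse_text`, and their index-based
interface, are hypothetical names chosen for this sketch, and `std::isspace`
stands in for the `space_type` skipper.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper standing in for the space_type skipper.
void skip_spaces(const std::string& in, std::size_t& pos) {
    while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos])))
        ++pos;
}

// Rough analogue of: text = lexeme[+(char_ - '<') [_val += _1]];
// On success it appends to `out` and advances `pos`; on failure both are untouched.
bool parse_text(const std::string& in, std::size_t& pos, std::string& out) {
    std::size_t p = pos;
    skip_spaces(in, p);                    // lexeme[] pre-skips once...
    const std::size_t start = p;
    while (p < in.size() && in[p] != '<')  // ...then keeps every char except '<'
        ++p;
    if (p == start) return false;          // '+' demands at least one char
    out.append(in, start, p - start);      // the "_val += _1" step, done in one go
    pos = p;
    return true;
}
```

With the input `"  hello world<b>"`, the leading spaces are consumed by the
pre-skip, the inner space survives, and parsing stops just before `'<'`.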

[heading Alternates]

    rule<Iterator, mini_xml_node(), space_type> node;

and its definition:

    node = (xml | text)                 [_val = _1];

We'll see a `mini_xml_node` structure later. Looking at the rule
definition, we see some alternation going on here. An xml `node` is
either an `xml` OR `text`. Hmmm... hold on to that thought...

    rule<Iterator, std::string(), space_type> start_tag;

Again, with an attribute of `std::string`. Then, its definition:

    start_tag =
            '<'
        >>  !char_('/')
        >>  lexeme[+(char_ - '>')       [_val += _1]]
        >>  '>'
    ;

[heading Not Predicate]

`start_tag` is similar to the `text` rule apart from the added `'<'` and `'>'`.
But wait, to make sure that the `start_tag` does not parse `end_tag`s too, we
add: `!char_('/')`. This is a "Not Predicate":

    !p

It will try the parser, `p`. If it is successful, fail; otherwise, pass. In
other words, it negates the result of `p`. Like `eps`, though, it does not
consume any input: it always rewinds the iterator position to where it was
upon entry. So, the expression:

    !char_('/')

basically says: we should not have a `'/'` at this point.
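
The same idea for `start_tag`, again as a plain C++17 analogue rather than
Spirit code (the names `not_char` and `parse_start_tag` are hypothetical). The
point to notice is that `not_char` receives the position by value, so, like
`!p`, it can only look ahead and never consumes input.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

void skip_spaces(const std::string& in, std::size_t& pos) {
    while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos])))
        ++pos;
}

// Analogue of !char_(c): succeeds iff the next char is NOT `c`.
// `pos` is taken by value, so the caller's position never moves.
bool not_char(const std::string& in, std::size_t pos, char c) {
    return pos >= in.size() || in[pos] != c;
}

// Rough analogue of:
//   start_tag = '<' >> !char_('/') >> lexeme[+(char_ - '>') [_val += _1]] >> '>';
bool parse_start_tag(const std::string& in, std::size_t& pos, std::string& name) {
    std::size_t p = pos;
    skip_spaces(in, p);
    if (p >= in.size() || in[p] != '<') return false;   // '<'
    ++p;
    skip_spaces(in, p);
    if (!not_char(in, p, '/')) return false;            // !char_('/'): not an end tag
    const std::size_t start = p;
    while (p < in.size() && in[p] != '>') ++p;          // +(char_ - '>')
    if (p == start || p >= in.size()) return false;     // need a name and a closing '>'
    name.assign(in, start, p - start);
    pos = p + 1;                                        // consume '>'
    return true;
}
```

A failed parse of `"</foo>"` leaves the position at 0, so an enclosing
alternative can retry from the same place.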

[heading Inherited Attribute]

The `end_tag`:

    rule<Iterator, void(std::string), space_type> end_tag;

Ohh! Now we see an inherited attribute there: `std::string`. In a rule's
signature, the return type is the synthesized attribute and the parameters are
the inherited attributes, so `void(std::string)` means that `end_tag` does not
have a synthesized attribute but takes one `std::string`. Let's see its
definition:

    end_tag =
            "</"
        >>  lit(_r1)
        >>  '>'
    ;

`_r1` is yet another __phoenix__ placeholder for the first inherited attribute
(we have only one; use `_r2`, `_r3`, etc. if you have more).

[heading A Lazy Lit]

Check out how we used `lit` here: this time not with a literal string, but
with the value of the first inherited attribute, which is specified as
`std::string` in our rule declaration. The string to match is therefore only
known when `end_tag` is invoked, not when the grammar is defined.
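
In plain C++ terms, an inherited attribute behaves much like a function
parameter, and the lazy `lit(_r1)` is a comparison against whatever string that
parameter holds at call time. A sketch under that analogy (not Spirit code;
`parse_end_tag` is a hypothetical name):

```cpp
#include <cctype>
#include <cstddef>
#include <string>

void skip_spaces(const std::string& in, std::size_t& pos) {
    while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos])))
        ++pos;
}

// Rough analogue of:
//   rule<Iterator, void(std::string), space_type> end_tag;
//   end_tag = "</" >> lit(_r1) >> '>';
// The inherited attribute _r1 becomes the parameter `expected`; the bool only
// reports success, mirroring the absent (void) synthesized attribute.
bool parse_end_tag(const std::string& in, std::size_t& pos, const std::string& expected) {
    std::size_t p = pos;
    skip_spaces(in, p);
    if (in.compare(p, 2, "</") != 0) return false;                    // "</"
    p += 2;
    skip_spaces(in, p);
    if (in.compare(p, expected.size(), expected) != 0) return false;  // lit(_r1)
    p += expected.size();
    skip_spaces(in, p);
    if (p >= in.size() || in[p] != '>') return false;                 // '>'
    pos = p + 1;
    return true;
}
```

Note that `"</foobar>"` is rejected when `"foo"` is expected: the name matches
as a prefix, but the following `'>'` does not.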

Finally, our `xml` rule:

    rule<Iterator, mini_xml(), space_type> xml;

`mini_xml` is our attribute here. We'll see later what it is. Let's see its
definition:

    xml =
            start_tag                   [at_c<0>(_val) = _1]
        >>  *node                       [push_back(at_c<1>(_val), _1)]
        >>  end_tag(at_c<0>(_val))
    ;

Those who know __fusion__ will now notice `at_c<0>` and `at_c<1>`. This gives us
a hint that `mini_xml` is a sort of tuple: a fusion sequence. `at_c<N>` here
is a lazy version of the tuple accessors, provided by __phoenix__.
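
A loose plain C++ analogy for what "lazy" means here: `at_c<0>(_val) = _1`
assigns nothing when the grammar is defined; it builds a function object that
is called later, each time `start_tag` succeeds. In the sketch below,
`std::tuple` stands in for the Fusion sequence and lambdas stand in for the
Phoenix expressions; `xml_attr`, `set_name` and `add_child` are hypothetical
names, and the children are plain strings to keep it short.

```cpp
#include <string>
#include <tuple>
#include <vector>

// Stand-in for the rule's attribute: element 0 = tag name, element 1 = children.
using xml_attr = std::tuple<std::string, std::vector<std::string>>;

// Like [at_c<0>(_val) = _1]: defining it does nothing; calling it with the
// rule's attribute (_val) and the parser's result (_1) performs the assignment.
const auto set_name = [](xml_attr& val, const std::string& arg) {
    std::get<0>(val) = arg;
};

// Like [push_back(at_c<1>(_val), _1)]: invoked once per parsed node.
const auto add_child = [](xml_attr& val, const std::string& arg) {
    std::get<1>(val).push_back(arg);
};
```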

[heading How it all works]

So, what's happening?

# Upon parsing `start_tag`, the parsed start-tag string is placed in
  `at_c<0>(_val)`.

# Then we parse zero or more `node`s. At each step, we `push_back` the result
  into `at_c<1>(_val)`.

# Finally, we parse the `end_tag` giving it an inherited attribute: `at_c<0>(_val)`.
  This is the string we obtained from the `start_tag`. Look back at `end_tag` above:
  it will fail to parse if it gets something different from what we got from the
  `start_tag`. This ensures that our tags are balanced.

To shed some more light on the last item, what happens is this:

    end_tag(at_c<0>(_val))

calls:

    end_tag =
            "</"
        >>  lit(_r1)
        >>  '>'
    ;

passing in `at_c<0>(_val)`, the string from the start tag. This is referred to in the
`end_tag` body as `_r1`.
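
Putting the three steps together, here is a plain C++17 recursive-descent
analogue of the first grammar. It is a sketch under stated assumptions, not
Spirit code: `std::variant` replaces `boost::variant`, and because C++17
`std::vector` accepts an incomplete element type, the sketch needs no
`recursive_wrapper`. All function names are hypothetical; the comments in `xml`
mark the three steps listed above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

struct mini_xml_node;                           // completed below
struct mini_xml {
    std::string name;                           // at_c<0>
    std::vector<mini_xml_node> children;        // at_c<1>
};
struct mini_xml_node {
    std::variant<mini_xml, std::string> value;  // xml | text
};

void skip_spaces(const std::string& in, std::size_t& p) {
    while (p < in.size() && std::isspace(static_cast<unsigned char>(in[p]))) ++p;
}

// '<' >> !char_('/') >> lexeme[+(char_ - '>')] >> '>'
bool start_tag(const std::string& in, std::size_t& pos, std::string& name) {
    std::size_t p = pos;
    skip_spaces(in, p);
    if (p >= in.size() || in[p] != '<') return false;
    ++p;
    skip_spaces(in, p);
    if (p < in.size() && in[p] == '/') return false;   // the not-predicate
    const std::size_t b = p;
    while (p < in.size() && in[p] != '>') ++p;
    if (p == b || p >= in.size()) return false;
    name.assign(in, b, p - b);
    pos = p + 1;
    return true;
}

// "</" >> lit(expected) >> '>'
bool end_tag(const std::string& in, std::size_t& pos, const std::string& expected) {
    std::size_t p = pos;
    skip_spaces(in, p);
    if (in.compare(p, 2, "</") != 0) return false;
    p += 2;
    skip_spaces(in, p);
    if (in.compare(p, expected.size(), expected) != 0) return false;
    p += expected.size();
    skip_spaces(in, p);
    if (p >= in.size() || in[p] != '>') return false;
    pos = p + 1;
    return true;
}

// lexeme[+(char_ - '<')]
bool text(const std::string& in, std::size_t& pos, std::string& out) {
    std::size_t p = pos;
    skip_spaces(in, p);
    const std::size_t b = p;
    while (p < in.size() && in[p] != '<') ++p;
    if (p == b) return false;
    out.assign(in, b, p - b);
    pos = p;
    return true;
}

bool xml(const std::string& in, std::size_t& pos, mini_xml& out);

// xml | text: a failed xml leaves pos untouched, so text starts from the same place.
bool node(const std::string& in, std::size_t& pos, mini_xml_node& out) {
    mini_xml element;
    if (xml(in, pos, element)) { out.value = std::move(element); return true; }
    std::string s;
    if (text(in, pos, s)) { out.value = std::move(s); return true; }
    return false;
}

bool xml(const std::string& in, std::size_t& pos, mini_xml& out) {
    std::size_t p = pos;
    mini_xml result;
    if (!start_tag(in, p, result.name)) return false;  // 1. tag name -> name
    for (;;) {                                         // 2. zero or more nodes
        mini_xml_node child;
        if (!node(in, p, child)) break;
        result.children.push_back(std::move(child));
    }
    if (!end_tag(in, p, result.name)) return false;    // 3. must match the name
    out = std::move(result);
    pos = p;
    return true;
}
```

The unbalanced input `<a>oops</b>` is rejected by step 3, and, as with a failed
Spirit rule, the position is left where it was.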

[heading The Structures]

Let's see our structures. They will definitely be hierarchical: xml is
hierarchical. They will also be recursive: xml is recursive.

[tutorial_xml1_structures]

[heading Of Alternates and Variants]

So that's what a `mini_xml_node` looks like. We had a hint that it is either
a `string` or a `mini_xml`. For this, we use __boost_variant__. `boost::recursive_wrapper`
lets the variant hold a `mini_xml` even though a `mini_xml` itself contains
`mini_xml_node`s, making it a recursive data structure.

Yep, you got that right: the attribute of an alternate:

    a | b

is a

    boost::variant<A, B>

where `A` is the attribute of `a` and `B` is the attribute of `b`.
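
The same attribute rule can be mimicked in plain C++17: an ordered choice
between a parser yielding `A` and one yielding `B` naturally returns
`std::variant<A, B>`. In this hypothetical sketch, `a` reads an unsigned
integer and `b` reads a word; the names, and the use of `std::optional` to
signal failure, are assumptions of the example, not Spirit's interface.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Hypothetical parser `a`: one or more digits, attribute int (no overflow check).
std::optional<int> parse_a(const std::string& in, std::size_t& pos) {
    std::size_t p = pos;
    int value = 0;
    while (p < in.size() && std::isdigit(static_cast<unsigned char>(in[p])))
        value = value * 10 + (in[p++] - '0');
    if (p == pos) return std::nullopt;
    pos = p;
    return value;
}

// Hypothetical parser `b`: one or more letters, attribute std::string.
std::optional<std::string> parse_b(const std::string& in, std::size_t& pos) {
    std::size_t p = pos;
    while (p < in.size() && std::isalpha(static_cast<unsigned char>(in[p]))) ++p;
    if (p == pos) return std::nullopt;
    std::string word = in.substr(pos, p - pos);
    pos = p;
    return word;
}

// a | b: try `a` first; only if it fails (leaving pos untouched) try `b`.
// Either attribute may come back, hence std::variant<int, std::string>.
using a_or_b = std::variant<int, std::string>;

std::optional<a_or_b> parse_a_or_b(const std::string& in, std::size_t& pos) {
    if (auto x = parse_a(in, pos)) return a_or_b(std::in_place_index<0>, *x);
    if (auto y = parse_b(in, pos)) return a_or_b(std::in_place_index<1>, std::move(*y));
    return std::nullopt;
}
```

In the tutorial's grammar, `A` is `mini_xml` (wrapped in `recursive_wrapper`)
and `B` is `std::string`, which is exactly the shape of `mini_xml_node`.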

[heading Adapting structs again]

`mini_xml` is a no-brainer. It is a plain ol' struct. But as we've seen in our
employee example, we can adapt that to be a __fusion__ sequence:

[tutorial_xml1_adapt_structures]
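
Without Fusion, a rough standard-library counterpart of adapting a struct is a
function that exposes its members, in declaration order, as a tuple of
references. The hypothetical `as_tuple` helper below is a hand-written
illustration of the idea, not what `BOOST_FUSION_ADAPT_STRUCT` generates, and
`children` is simplified to strings.

```cpp
#include <string>
#include <tuple>
#include <vector>

// Simplified stand-in for mini_xml.
struct mini_xml {
    std::string name;
    std::vector<std::string> children;
};

// Hand-written "adaptation": element 0 is `name`, element 1 is `children`,
// so std::get<N> plays roughly the role that at_c<N> plays for the Fusion sequence.
auto as_tuple(mini_xml& x) {
    return std::tie(x.name, x.children);
}
```

Because the tuple holds references, writing through `std::get<0>(as_tuple(x))`
updates `x.name` itself, much as `at_c<0>(_val) = _1` fills in a plain
`mini_xml` once the struct is adapted.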

[heading One More Take]

Here's another version. The AST structure remains the same, but this time,
you'll see that we make use of auto-rules, making the grammar
semantic-action-less. Here it is:

[tutorial_xml2_grammar]

This one shouldn't be any more difficult to understand after going through the
first xml parser example. The rules are almost the same, except that we got rid
of semantic actions and used auto-rules (see the employee example if you missed
that). There is some new stuff though. It's all in the `xml` rule:

[heading Local Variables]

    rule<Iterator, mini_xml(), locals<std::string>, space_type> xml;

Wow, we have four template parameters now. What's that `locals` guy doing there?
Well, it declares that the rule `xml` will have one local variable: a `string`.
Let's see how this is used in action:

    xml %=
            start_tag[_a = _1]
        >>  *node
        >>  end_tag(_a)
    ;

# Upon parsing `start_tag`, the parsed start-tag string is placed in
  the local variable specified by (yet another) __phoenix__ placeholder:
  `_a`. We have only one local variable. If we had more, these are designated
  by `_b`..`_j`.

# Then we parse zero or more `node`s.

# Finally, we parse the `end_tag` giving it an inherited attribute: `_a`, our
  local variable.

Each invocation of `xml`, including every nested one, gets its own copy of
`_a`, so a child element cannot overwrite its parent's tag name.

Apart from `[_a = _1]`, which only fills the local variable, there are no
actions involved in stuffing data into our `xml` attribute. It's all taken care
of thanks to the auto-rule: with `%=`, the attribute is still propagated even
though the rule contains a semantic action.

[endsect]