]>
Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | [/============================================================================== |
2 | Copyright (C) 2015 Joel de Guzman | |
3 | ||
4 | Distributed under the Boost Software License, Version 1.0. (See accompanying | |
5 | file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) | |
6 | ===============================================================================/] | |
7 | ||
8 | [section:rexpr RExpressions - ASTs!] | |
9 | ||
10 | Stop and think about it... We're actually generating ASTs (abstract | |
11 | syntax trees) in our previoius examples. We parsed a single structure and | |
12 | generated an in-memory representation of it in the form of a struct: the struct | |
13 | employee. If we changed the implementation to parse one or more employees, the | |
14 | result would be a std::vector<employee>. We can go on and add more hierarchy: | |
15 | teams, departments, corporations. Then we'll have an AST representation of it | |
16 | all. | |
17 | ||
18 | In this example, we'll explore more on how to create ASTs. We will parse a | |
19 | minimalistic JSON-like language and compile the results into our data structures | |
20 | in the form of a tree. | |
21 | ||
22 | /rexpr/ is a parser for RExpressions, a language resembling a minimal subset | |
23 | of json, limited to a dictionary (composed of key=value pairs) where the value | |
24 | can itself be a string or a recursive dictionary. | |
25 | ||
26 | Here's an Example: | |
27 | ||
28 | { | |
29 | "color" = "blue" | |
30 | "size" = "29 cm." | |
31 | "position" = { | |
32 | "x" = "123" | |
33 | "y" = "456" | |
34 | } | |
35 | } | |
36 | ||
37 | This simple parser for X3 is intended as a minimal starting point. It is | |
38 | minimal, yet complete with all things necessary parts that make up rules (and | |
39 | grammars), except error handling and reporting. | |
40 | ||
41 | The example can be found here: [@../../../example/x3/rexpr/rexpr_min/rexpr.cpp] | |
42 | ||
43 | The same parser, but complete with error handling and reporting, better file | |
44 | organization, separate compilation of grammars, and a full test harness, can | |
45 | be found in this directory: | |
46 | ||
47 | [@../../../example/x3/rexpr/rexpr_full/] | |
48 | ||
49 | [note rexpr_full is the canonical structure proposed as best practice on how | |
50 | parsers are written using Spirit X3.] | |
51 | ||
52 | [heading The AST] | |
53 | ||
54 | Here's the AST: | |
55 | ||
56 | namespace client { namespace ast | |
57 | { | |
58 | namespace x3 = boost::spirit::x3; | |
59 | ||
60 | struct rexpr; | |
61 | ||
62 | struct rexpr_value : x3::variant< | |
63 | std::string | |
64 | , x3::forward_ast<rexpr> | |
65 | > | |
66 | { | |
67 | using base_type::base_type; | |
68 | using base_type::operator=; | |
69 | }; | |
70 | ||
71 | typedef std::map<std::string, rexpr_value> rexpr_map; | |
72 | typedef std::pair<std::string, rexpr_value> rexpr_key_value; | |
73 | ||
74 | struct rexpr | |
75 | { | |
76 | rexpr_map entries; | |
77 | }; | |
78 | }} | |
79 | ||
80 | `x3::variant` is a support utility in Spirit X3 that extends __boost_variant__. | |
81 | Typically, you use __boost_variant__ right out of the box and refer to a | |
82 | particular template instantiation using a typedef. For example: | |
83 | ||
84 | typedef boost::variant<std::string, int> my_variant; | |
85 | ||
86 | Instead of doing that, we create a `class` (or `struct` in the case) that | |
87 | subclasses from `x3::variant`. By making the variant a subclass, you have a | |
88 | distinct type in your namespace. You also have control of the constructors | |
89 | and assignment operators, as well as giving you the freedom to add more | |
90 | functionality. | |
91 | ||
92 | `rexpr_value` has two variant elements. It can either be a `std::string` or a | |
93 | `rexpr`. Since `rexpr` recursively contains a `rexpr_value`, it has to be | |
94 | forward declared. This recursive data structure requires `rexpr` to be wrapped | |
95 | in a `x3::forward_ast` as shown in the declaration. | |
96 | ||
97 | We need to tell fusion about our `rexpr` struct to make it a first-class fusion | |
98 | citizen: | |
99 | ||
100 | BOOST_FUSION_ADAPT_STRUCT( | |
101 | client::ast::rexpr, | |
102 | entries | |
103 | ) | |
104 | ||
105 | So essentially, a `rexpr_value` value is either a `std::string` or a | |
106 | `std::map<std::string, rexpr_value>`. | |
107 | ||
108 | [heading Walking the AST] | |
109 | ||
110 | We add a utility to print out the AST: | |
111 | ||
112 | int const tabsize = 4; | |
113 | ||
114 | struct rexpr_printer | |
115 | { | |
116 | typedef void result_type; | |
117 | ||
118 | rexpr_printer(int indent = 0) | |
119 | : indent(indent) {} | |
120 | ||
121 | void operator()(rexpr const& ast) const | |
122 | { | |
123 | std::cout << '{' << std::endl; | |
124 | for (auto const& entry : ast.entries) | |
125 | { | |
126 | tab(indent+tabsize); | |
127 | std::cout << '"' << entry.first << "\" = "; | |
128 | boost::apply_visitor(rexpr_printer(indent+tabsize), entry.second); | |
129 | } | |
130 | tab(indent); | |
131 | std::cout << '}' << std::endl; | |
132 | } | |
133 | ||
134 | void operator()(std::string const& text) const | |
135 | { | |
136 | std::cout << '"' << text << '"' << std::endl; | |
137 | } | |
138 | ||
139 | void tab(int spaces) const | |
140 | { | |
141 | for (int i = 0; i < spaces; ++i) | |
142 | std::cout << ' '; | |
143 | } | |
144 | ||
145 | int indent; | |
146 | }; | |
147 | ||
148 | Traversing the AST is a recursive excercise. `rexpr_printer` is a function | |
149 | object and the main entry point is `void operator()(rexpr const& ast)`. Notice | |
150 | how it recursively calls an instance of itself for each of the key value pairs, | |
151 | using `boost::apply_visitor`. Before and after iterating through all the | |
152 | elements in the map, the braces are printed to surround the entire body, taking | |
153 | care of correct indentation at each point in the recursive traversal. | |
154 | ||
155 | The `operator()` for `std::string` should be self explanatory. It simply prints | |
156 | the text inside double quotes. | |
157 | ||
158 | [heading The Grammar] | |
159 | ||
160 | namespace client { namespace parser | |
161 | { | |
162 | namespace x3 = boost::spirit::x3; | |
163 | namespace ascii = boost::spirit::x3::ascii; | |
164 | ||
165 | using x3::lit; | |
166 | using x3::lexeme; | |
167 | ||
168 | using ascii::char_; | |
169 | using ascii::string; | |
170 | ||
171 | x3::rule<class rexpr_value, ast::rexpr_value> | |
172 | rexpr_value = "rexpr_value"; | |
173 | ||
174 | x3::rule<class rexpr, ast::rexpr> | |
175 | rexpr = "rexpr"; | |
176 | ||
177 | x3::rule<class rexpr_key_value, ast::rexpr_key_value> | |
178 | rexpr_key_value = "rexpr_key_value"; | |
179 | ||
180 | auto const quoted_string = | |
181 | lexeme['"' >> *(char_ - '"') >> '"']; | |
182 | ||
183 | auto const rexpr_value_def = | |
184 | quoted_string | rexpr; | |
185 | ||
186 | auto const rexpr_key_value_def = | |
187 | quoted_string >> '=' >> rexpr_value; | |
188 | ||
189 | auto const rexpr_def = | |
190 | '{' >> *rexpr_key_value >> '}'; | |
191 | ||
192 | BOOST_SPIRIT_DEFINE(rexpr_value, rexpr, rexpr_key_value); | |
193 | }} | |
194 | ||
195 | We declare three rules `rexpr_value`, `rexpr` and `rexpr_key_value`. Same deal | |
196 | as before. The attributes declared for each rule are the same structures we | |
197 | declared in our AST above. These are the structures our rules will produce. | |
198 | ||
199 | So, going top-down (from the start rule), `rexpr` is defined as zero or more | |
200 | `rexpr_key_value`s enclosed inside the curly braces. That's the kleene star in | |
201 | action there: `*rexpr_key_value`. | |
202 | ||
203 | [note Take note of the convention: we use `rexpr` name as the rule name and | |
204 | `rexpr_def` as the rule definition name.] | |
205 | ||
206 | To make the grammar clean, we capture the `quoted_string` production using | |
207 | C++ `auto`. | |
208 | ||
209 | `rexpr_value` is a `quoted_string` followed by the assignment operator `'='`, | |
210 | followed by a `rexpr_value`. `rexpr_value` is a `quoted_string` /OR/ a `rexpr`. | |
211 | This uses the alternative operator `|`, and maps nicely to the variant AST the | |
212 | rule is associated with. | |
213 | ||
214 | Finally, we associate the rules and their definitions: | |
215 | ||
216 | BOOST_SPIRIT_DEFINE(rexpr_value, rexpr, rexpr_key_value); | |
217 | ||
218 | [endsect] |