Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | [/ |
2 | / Copyright (c) 2008 Marcin Kalicinski (kalita <at> poczta dot onet dot pl) | |
3 | / Copyright (c) 2009 Sebastian Redl (sebastian dot redl <at> getdesigned dot at) | |
4 | / | |
5 | / Distributed under the Boost Software License, Version 1.0. (See accompanying | |
6 | / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) | |
7 | /] | |
8 | [section XML Parser] | |
9 | [def __xml__ [@http://en.wikipedia.org/wiki/XML XML format]] | |
10 | [def __xml_parser.hpp__ [headerref boost/property_tree/xml_parser.hpp xml_parser.hpp]] | |
11 | [def __RapidXML__ [@http://rapidxml.sourceforge.net/ RapidXML]] | |
12 | [def __boost__ [@http://www.boost.org Boost]] | |
13 | The __xml__ is an industry standard for storing information in textual | |
14 | form. Unfortunately, there is no XML parser in __boost__ as of the | |
15 | time of this writing. The library therefore contains the fast and tiny | |
16 | __RapidXML__ parser (currently in version 1.13) to provide XML parsing support. | |
17 | RapidXML does not fully support the XML standard; it is not capable of parsing | |
18 | DTDs and therefore cannot do full entity substitution. | |
19 | ||
20 | By default, the parser will preserve most whitespace, but remove element content | |
21 | that consists only of whitespace. Encoded whitespace (e.g. &#32;) does not | |
22 | count as whitespace in this regard. You can pass the trim_whitespace flag if you | |
23 | want all leading and trailing whitespace trimmed and all contiguous whitespace | |
24 | collapsed into a single space. | |
25 | ||
26 | Please note that RapidXML does not interpret the encoding declaration. If | |
27 | you pass it a character buffer, it assumes the data is already correctly | |
28 | encoded; if you pass it a filename, it will read the file using the character | |
29 | conversion of the locale you give it (or the global locale if you give it none). | |
30 | This means that, in order to parse a UTF-8-encoded XML file into a wptree, you | |
31 | have to supply an alternate locale, either directly or by replacing the global | |
32 | one. | |
33 | ||
34 | XML / property tree conversion schema (__read_xml__ and __write_xml__): | |
35 | ||
36 | * Each XML element corresponds to a property tree node. The child elements | |
37 | correspond to the children of the node. | |
38 | * The attributes of an XML element are stored in the subkey [^<xmlattr>]. There | |
39 | is one child node per attribute in the attribute node. Existence of the | |
40 | [^<xmlattr>] node is not guaranteed or necessary when there are no attributes. | |
41 | * XML comments are stored in nodes named [^<xmlcomment>], unless comment | |
42 | ignoring is enabled via the flags. | |
43 | * Text content is stored in one of two ways, depending on the flags. The default | |
44 | way concatenates all text nodes and stores them as the data of the element | |
45 | node. This way, the entire content can be conveniently read, but the | |
46 | relative ordering of text and child elements is lost. The other way stores | |
47 | each piece of text content as a separate child node, each named [^<xmltext>]. | |
48 | ||
49 | The XML storage encoding does not round-trip perfectly. A read-write cycle loses | |
50 | trimmed whitespace, low-level formatting information, and the distinction | |
51 | between normal data and CDATA nodes. Comments are only preserved when enabled. | |
52 | A write-read cycle loses trimmed whitespace; that is, if the original tree has | |
53 | string data that starts or ends with whitespace, that whitespace is lost. | |
54 | [endsect] [/xml_parser] |