<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<title>Introduction and Overview</title>
<link rel="stylesheet" href="../../../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.77.1">
<link rel="home" href="../index.html" title="Boost.Regex 5.1.2">
<link rel="up" href="../index.html" title="Boost.Regex 5.1.2">
<link rel="prev" href="install.html" title="Building and Installing the Library">
<link rel="next" href="unicode.html" title="Unicode and Boost.Regex">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../boost.png"></td>
<td align="center"><a href="../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="install.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="unicode.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="boost_regex.introduction_and_overview"></a><a class="link" href="introduction_and_overview.html" title="Introduction and Overview">Introduction and
Overview</a>
</h2></div></div></div>
<p>
Regular expressions are a form of pattern matching that is often used in text
processing; many users will be familiar with the Unix utilities grep, sed and
awk, and the programming language Perl, each of which makes extensive use of
regular expressions. Traditionally, C++ users have been limited to the POSIX
C APIs for manipulating regular expressions, and while Boost.Regex does provide
these APIs, they do not represent the best way to use the library. For example,
Boost.Regex can cope with wide character strings, and with search and replace
operations (in a manner analogous to either sed or Perl), something that
traditional C libraries cannot do.
</p>
<p>
The class <a class="link" href="ref/basic_regex.html" title="basic_regex"><code class="computeroutput"><span class="identifier">basic_regex</span></code></a>
is the key class in this library; it represents a "machine readable"
regular expression, and is very closely modeled on <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span></code>:
think of it as a string plus the actual state machine required by the regular
expression algorithms. As with <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">basic_string</span></code>,
there are two typedefs that are almost always the means by which this class
is referenced:
</p>
<pre class="programlisting"><span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">{</span>

<span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">class</span> <span class="identifier">charT</span><span class="special">,</span>
          <span class="keyword">class</span> <span class="identifier">traits</span> <span class="special">=</span> <span class="identifier">regex_traits</span><span class="special">&lt;</span><span class="identifier">charT</span><span class="special">&gt;</span> <span class="special">&gt;</span>
<span class="keyword">class</span> <span class="identifier">basic_regex</span><span class="special">;</span>

<span class="keyword">typedef</span> <span class="identifier">basic_regex</span><span class="special">&lt;</span><span class="keyword">char</span><span class="special">&gt;</span> <span class="identifier">regex</span><span class="special">;</span>
<span class="keyword">typedef</span> <span class="identifier">basic_regex</span><span class="special">&lt;</span><span class="keyword">wchar_t</span><span class="special">&gt;</span> <span class="identifier">wregex</span><span class="special">;</span>

<span class="special">}</span>
</pre>
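<p>
As a rough illustration of the two typedefs, the sketch below overloads one
hypothetical helper, <code class="computeroutput">is_number</code>, for narrow and wide strings. It is
written against the standard library's <code class="computeroutput">std::regex</code> and
<code class="computeroutput">std::wregex</code>, which descend from this interface, so that it builds
without Boost; the assumption is that <code class="computeroutput">boost::regex</code> and
<code class="computeroutput">boost::wregex</code> can be substituted directly.
</p>

```cpp
#include <regex>
#include <string>

// Narrow-character text is matched with regex, i.e. basic_regex<char>.
bool is_number(const std::string& s)
{
    static const std::regex e("\\d+");
    return std::regex_match(s, e);
}

// Wide-character text is matched with wregex, i.e. basic_regex<wchar_t>;
// the pattern itself is then a wide string literal.
bool is_number(const std::wstring& s)
{
    static const std::wregex e(L"\\d+");
    return std::regex_match(s, e);
}
```

<p>
Apart from the character type, the two overloads are identical, which is the
point of routing both typedefs through a single class template.
</p>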
<p>
To see how this library can be used, imagine that we are writing a credit card
processing application. Credit card numbers generally come as a string of 16 digits,
separated into groups of 4 digits by either a space or a hyphen.
Before storing a credit card number in a database (not necessarily something
your customers will appreciate!), we may want to verify that the number is
in the correct format. To match any digit we could use the regular expression
[0-9]; however, ranges of characters like this are actually locale dependent.
Instead we should use the POSIX standard form [[:digit:]], or the Boost.Regex
and Perl shorthand for this, \d (note that many older libraries tended to be
hard-coded to the C locale, so this was not an issue for them). That
leaves us with the following regular expression to validate credit card number
formats:
</p>
<pre class="programlisting">(\d{4}[- ]){3}\d{4}</pre>
<p>
Here the parentheses act to group (and mark for future reference) sub-expressions,
and the {4} means "repeat exactly 4 times". This is an example of
the extended regular expression syntax used by Perl, awk and egrep. Boost.Regex
also supports the older "basic" syntax used by sed and grep, but
this is generally less useful unless you already have some basic regular expressions
that you need to reuse.
</p>
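<p>
The points above about digit classes and the two syntax families can be sketched
as follows. The helper <code class="computeroutput">matches</code> and the constant names are hypothetical,
and the sketch uses <code class="computeroutput">std::regex</code> rather than Boost.Regex so that it compiles
with the standard library alone; <code class="computeroutput">std::regex::extended</code> and
<code class="computeroutput">std::regex::basic</code> are assumed to correspond to the Boost syntax options
of the same names.
</p>

```cpp
#include <regex>
#include <string>

// True if the whole of s matches pattern under the given syntax option.
bool matches(const std::string& s, const std::string& pattern,
             std::regex::flag_type syntax = std::regex::ECMAScript)
{
    return std::regex_match(s, std::regex(pattern, syntax));
}

// Three spellings of "a single digit" in the default (Perl-like) syntax.
const std::string digit_range     = "[0-9]";
const std::string digit_posix     = "[[:digit:]]";
const std::string digit_shorthand = "\\d";

// "Exactly two repeats of ab": in the extended syntax the grouping and
// repeat operators are written bare, in the basic syntax they need backslashes.
const std::string twice_extended = "(ab){2}";
const std::string twice_basic    = "\\(ab\\)\\{2\\}";
```

<p>
With these definitions, <code class="computeroutput">matches("abab", twice_basic, std::regex::basic)</code>
and <code class="computeroutput">matches("abab", twice_extended, std::regex::extended)</code> accept the
same text.
</p>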
<p>
Now let's take that expression and place it in some C++ code to validate the
format of a credit card number:
</p>
<pre class="programlisting"><span class="keyword">bool</span> <span class="identifier">validate_card_format</span><span class="special">(</span><span class="keyword">const</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">&amp;</span> <span class="identifier">s</span><span class="special">)</span>
<span class="special">{</span>
   <span class="keyword">static</span> <span class="keyword">const</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e</span><span class="special">(</span><span class="string">"(\\d{4}[- ]){3}\\d{4}"</span><span class="special">);</span>
   <span class="keyword">return</span> <span class="identifier">regex_match</span><span class="special">(</span><span class="identifier">s</span><span class="special">,</span> <span class="identifier">e</span><span class="special">);</span>
<span class="special">}</span>
</pre>
<p>
Note how we had to add some extra escapes to the expression: remember that
the escape is seen once by the C++ compiler before it gets to be seen by the
regular expression engine; consequently, escapes in regular expressions have
to be doubled up when embedding them in C/C++ code. Also note that all the
examples assume that your compiler supports argument-dependent lookup; if yours
doesn't (for example VC6), then you will have to add some <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span></code> prefixes to some of the function calls in
the examples.
</p>
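<p>
Where a C++11 compiler is available, a raw string literal is one way to avoid
the doubled escapes. The sketch below is not part of the original examples: the
two function names are hypothetical, and it uses <code class="computeroutput">std::regex</code> so that it
builds without Boost, on the assumption that <code class="computeroutput">boost::regex</code> accepts the
same pattern text.
</p>

```cpp
#include <regex>
#include <string>

// Ordinary string literal: every backslash meant for the regular expression
// engine is written twice, because the compiler consumes one of them.
bool validate_escaped(const std::string& s)
{
    static const std::regex e("(\\d{4}[- ]){3}\\d{4}");
    return std::regex_match(s, e);
}

// Raw string literal: the text between R"( and )" reaches the engine
// unchanged, so the pattern reads exactly as it would outside C++.
bool validate_raw(const std::string& s)
{
    static const std::regex e(R"((\d{4}[- ]){3}\d{4})");
    return std::regex_match(s, e);
}
```

<p>
Both functions compile the same expression; only the spelling in the source
file differs.
</p>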
<p>
Those of you who are familiar with credit card processing will have realized
that while the format used above is suitable for human-readable card numbers,
it does not represent the format required by online credit card systems; these
require the number as a string of 16 (or possibly 15) digits, without any intervening
spaces. What we need is a means to convert easily between the two formats,
and this is where search and replace comes in. Those who are familiar with
the utilities sed and Perl will already be ahead here; we need two strings:
one a regular expression, the other a "format string" that provides
a description of the text to replace the match with. In Boost.Regex this search
and replace operation is performed with the algorithm <a class="link" href="ref/regex_replace.html" title="regex_replace"><code class="computeroutput"><span class="identifier">regex_replace</span></code></a>; for our credit card
example we can write two algorithms like this to provide the format conversions:
</p>
<pre class="programlisting"><span class="comment">// match any format with the regular expression:</span>
<span class="keyword">const</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">regex</span> <span class="identifier">e</span><span class="special">(</span><span class="string">"\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z"</span><span class="special">);</span>
<span class="keyword">const</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">machine_format</span><span class="special">(</span><span class="string">"\\1\\2\\3\\4"</span><span class="special">);</span>
<span class="keyword">const</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">human_format</span><span class="special">(</span><span class="string">"\\1-\\2-\\3-\\4"</span><span class="special">);</span>

<span class="comment">// strip the separators, leaving only the digits:</span>
<span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">machine_readable_card_number</span><span class="special">(</span><span class="keyword">const</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">&amp;</span> <span class="identifier">s</span><span class="special">)</span>
<span class="special">{</span>
   <span class="keyword">return</span> <span class="identifier">regex_replace</span><span class="special">(</span><span class="identifier">s</span><span class="special">,</span> <span class="identifier">e</span><span class="special">,</span> <span class="identifier">machine_format</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">match_default</span> <span class="special">|</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">format_sed</span><span class="special">);</span>
<span class="special">}</span>

<span class="comment">// rejoin the four groups with hyphens:</span>
<span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span> <span class="identifier">human_readable_card_number</span><span class="special">(</span><span class="keyword">const</span> <span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">&amp;</span> <span class="identifier">s</span><span class="special">)</span>
<span class="special">{</span>
   <span class="keyword">return</span> <span class="identifier">regex_replace</span><span class="special">(</span><span class="identifier">s</span><span class="special">,</span> <span class="identifier">e</span><span class="special">,</span> <span class="identifier">human_format</span><span class="special">,</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">match_default</span> <span class="special">|</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">format_sed</span><span class="special">);</span>
<span class="special">}</span>
</pre>
<p>
Here we've used marked sub-expressions in the regular expression to split out
the four parts of the card number as separate fields; the format string then
uses the sed-like syntax to replace the matched text with the reformatted version.
</p>
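<p>
For readers who want to try the conversion, here is a self-contained sketch of the
same idea using the standard library's <code class="computeroutput">std::regex</code> in place of
Boost.Regex. Two substitutions are assumptions of this sketch rather than part
of the example above: <code class="computeroutput">^</code> and <code class="computeroutput">$</code> stand in for
<code class="computeroutput">\A</code> and <code class="computeroutput">\z</code>, which the standard library's default
grammar lacks, and <code class="computeroutput">std::regex_constants::format_sed</code> stands in for
<code class="computeroutput">boost::format_sed</code>. The name <code class="computeroutput">card_expression</code> is hypothetical.
</p>

```cpp
#include <regex>
#include <string>

// Four marked sub-expressions, one per group of digits; the separators
// between the groups are optional so both formats are accepted.
const std::regex card_expression(
    "^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");
const std::string machine_format("\\1\\2\\3\\4");
const std::string human_format("\\1-\\2-\\3-\\4");

// Rewrites a card number as an unbroken run of digits.
std::string machine_readable_card_number(const std::string& s)
{
    return std::regex_replace(s, card_expression, machine_format,
        std::regex_constants::match_default | std::regex_constants::format_sed);
}

// Rewrites a card number as hyphen-separated groups.
std::string human_readable_card_number(const std::string& s)
{
    return std::regex_replace(s, card_expression, human_format,
        std::regex_constants::match_default | std::regex_constants::format_sed);
}
```

<p>
Under these assumptions <code class="computeroutput">machine_readable_card_number("1234-5678-9012-3456")</code>
yields <code class="computeroutput">"1234567890123456"</code>, and text that does not match the expression
is returned unchanged.
</p>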
<p>
In the examples above, we haven't directly manipulated the results of a regular
expression match; in general, however, the result of a match contains a number
of sub-expression matches in addition to the overall match. When the library
needs to report a regular expression match it does so using an instance of
the class <a class="link" href="ref/match_results.html" title="match_results"><code class="computeroutput"><span class="identifier">match_results</span></code></a>;
as before, there are typedefs of this class for the most common cases:
</p>
<pre class="programlisting"><span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">{</span>

<span class="keyword">typedef</span> <span class="identifier">match_results</span><span class="special">&lt;</span><span class="keyword">const</span> <span class="keyword">char</span><span class="special">*&gt;</span> <span class="identifier">cmatch</span><span class="special">;</span>
<span class="keyword">typedef</span> <span class="identifier">match_results</span><span class="special">&lt;</span><span class="keyword">const</span> <span class="keyword">wchar_t</span><span class="special">*&gt;</span> <span class="identifier">wcmatch</span><span class="special">;</span>
<span class="keyword">typedef</span> <span class="identifier">match_results</span><span class="special">&lt;</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">string</span><span class="special">::</span><span class="identifier">const_iterator</span><span class="special">&gt;</span> <span class="identifier">smatch</span><span class="special">;</span>
<span class="keyword">typedef</span> <span class="identifier">match_results</span><span class="special">&lt;</span><span class="identifier">std</span><span class="special">::</span><span class="identifier">wstring</span><span class="special">::</span><span class="identifier">const_iterator</span><span class="special">&gt;</span> <span class="identifier">wsmatch</span><span class="special">;</span>

<span class="special">}</span>
</pre>
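<p>
To make the relationship between the overall match and its sub-expression
matches concrete, the following sketch reads the four groups of a card number
out of an <code class="computeroutput">smatch</code>. The function <code class="computeroutput">card_groups</code> is hypothetical,
and <code class="computeroutput">std::smatch</code> is used here on the assumption that it behaves like
<code class="computeroutput">boost::smatch</code> for this purpose.
</p>

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the four digit groups of a card number, or an empty vector
// if the text is not in the expected format.
std::vector<std::string> card_groups(const std::string& s)
{
    static const std::regex e("(\\d{4})[- ](\\d{4})[- ](\\d{4})[- ](\\d{4})");
    std::smatch what;
    std::vector<std::string> groups;
    if (std::regex_match(s, what, e))
    {
        // what[0] is the overall match; what[1] to what[4] are the
        // marked sub-expressions, in the order their parentheses open.
        for (std::size_t i = 1; i < what.size(); ++i)
            groups.push_back(what[i].str());
    }
    return groups;
}
```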
<p>
The algorithms <a class="link" href="ref/regex_search.html" title="regex_search"><code class="computeroutput"><span class="identifier">regex_search</span></code></a>
and <a class="link" href="ref/regex_match.html" title="regex_match"><code class="computeroutput"><span class="identifier">regex_match</span></code></a>
make use of <a class="link" href="ref/match_results.html" title="match_results"><code class="computeroutput"><span class="identifier">match_results</span></code></a>
to report what matched; the difference between these algorithms is that <a class="link" href="ref/regex_match.html" title="regex_match"><code class="computeroutput"><span class="identifier">regex_match</span></code></a>
will only find matches that consume <span class="emphasis"><em>all</em></span> of the input text,
whereas <a class="link" href="ref/regex_search.html" title="regex_search"><code class="computeroutput"><span class="identifier">regex_search</span></code></a>
will search for a match anywhere within the text being matched.
</p>
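<p>
The difference can be seen by running one expression through both algorithms.
This is a minimal sketch with hypothetical helper names, written against the
standard library's <code class="computeroutput">std::regex_match</code> and <code class="computeroutput">std::regex_search</code>
on the assumption that the Boost algorithms of the same names behave alike here.
</p>

```cpp
#include <regex>
#include <string>

// regex_match succeeds only if the expression consumes the whole text.
bool is_exactly_one_group(const std::string& s)
{
    static const std::regex e("\\d{4}");
    return std::regex_match(s, e);
}

// regex_search succeeds if the expression matches anywhere in the text;
// the match_results object reports where and what.
std::string first_group(const std::string& s)
{
    static const std::regex e("\\d{4}");
    std::smatch what;
    if (std::regex_search(s, what, e))
        return what[0].str();
    return std::string();
}
```

<p>
Given the text <code class="computeroutput">"card 1234-5678"</code>, the first function reports no match
because of the surrounding characters, while the second finds <code class="computeroutput">"1234"</code>.
</p>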
<p>
Note that these algorithms are not restricted to searching regular C-strings:
any bidirectional iterator type can be searched, allowing for the possibility
of seamlessly searching almost any kind of data.
</p>
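<p>
As one possible illustration, the sketch below searches a
<code class="computeroutput">std::list&lt;char&gt;</code>, whose iterators are bidirectional but not random
access. The helper <code class="computeroutput">first_word</code> is hypothetical, and the code uses
<code class="computeroutput">std::regex_search</code> with <code class="computeroutput">std::match_results</code> on the assumption
that the Boost equivalents accept the same iterator types.
</p>

```cpp
#include <list>
#include <regex>
#include <string>

// Finds the first run of lower-case letters in a list of characters.
std::string first_word(const std::list<char>& text)
{
    typedef std::list<char>::const_iterator iterator_type;
    static const std::regex e("[a-z]+");
    // match_results is parameterised on the iterator type being searched.
    std::match_results<iterator_type> what;
    if (std::regex_search(text.begin(), text.end(), what, e))
        return what[0].str();
    return std::string();
}
```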
<p>
For search and replace operations, in addition to the algorithm <a class="link" href="ref/regex_replace.html" title="regex_replace"><code class="computeroutput"><span class="identifier">regex_replace</span></code></a> that we have already
seen, the <a class="link" href="ref/match_results.html" title="match_results"><code class="computeroutput"><span class="identifier">match_results</span></code></a>
class has a <code class="computeroutput"><span class="identifier">format</span></code> member that
takes the result of a match and a format string, and produces a new string
by merging the two.
</p>
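<p>
A small sketch of the <code class="computeroutput">format</code> member follows. It masks all but the last
group of a card number; the function name <code class="computeroutput">masked_card_number</code> is
hypothetical, and the code relies on <code class="computeroutput">std::smatch::format</code> with its
default <code class="computeroutput">$n</code> placeholder syntax, which is assumed to correspond to the
Boost member of the same name.
</p>

```cpp
#include <regex>
#include <string>

// Returns the card number with every group except the last replaced by
// asterisks, or an empty string if the text is not a card number.
std::string masked_card_number(const std::string& s)
{
    static const std::regex e("(\\d{4})[- ](\\d{4})[- ](\\d{4})[- ](\\d{4})");
    std::smatch what;
    if (!std::regex_match(s, what, e))
        return std::string();
    // $4 is replaced by the text of the fourth marked sub-expression;
    // everything else in the format string is copied literally.
    return what.format("****-****-****-$4");
}
```

<p>
Unlike <code class="computeroutput">regex_replace</code>, <code class="computeroutput">format</code> works on a match that has
already been found, so the search and the formatting can be separated.
</p>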
<p>
For iterating through all occurrences of an expression within a text, there
are two iterator types: <a class="link" href="ref/regex_iterator.html" title="regex_iterator"><code class="computeroutput"><span class="identifier">regex_iterator</span></code></a> will enumerate over
the <a class="link" href="ref/match_results.html" title="match_results"><code class="computeroutput"><span class="identifier">match_results</span></code></a>
objects found, while <a class="link" href="ref/regex_token_iterator.html" title="regex_token_iterator"><code class="computeroutput"><span class="identifier">regex_token_iterator</span></code></a> will enumerate
a series of strings (similar to Perl-style split operations).
</p>
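<p>
The two iterator types can be sketched side by side. The helpers
<code class="computeroutput">all_numbers</code> and <code class="computeroutput">split_fields</code> are hypothetical, and the code
uses the standard library's <code class="computeroutput">std::sregex_iterator</code> and
<code class="computeroutput">std::sregex_token_iterator</code> on the assumption that the Boost iterators
of the same names are used in the same way.
</p>

```cpp
#include <regex>
#include <string>
#include <vector>

// regex_iterator: each dereference yields the match_results for one match.
std::vector<std::string> all_numbers(const std::string& text)
{
    static const std::regex e("\\d+");
    std::vector<std::string> found;
    // A default-constructed iterator marks the end of the sequence.
    for (std::sregex_iterator it(text.begin(), text.end(), e), end; it != end; ++it)
        found.push_back(it->str());
    return found;
}

// regex_token_iterator with sub-match index -1 yields the text between
// matches, which gives a split on the separator expression.
std::vector<std::string> split_fields(const std::string& text)
{
    static const std::regex separator("[- ]");
    std::vector<std::string> fields;
    for (std::sregex_token_iterator it(text.begin(), text.end(), separator, -1), end;
         it != end; ++it)
        fields.push_back(it->str());
    return fields;
}
```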
<p>
For those who dislike templates, there is a high-level wrapper class <a class="link" href="ref/deprecated_interfaces/old_regex.html" title="High Level Class RegEx (Deprecated)"><code class="computeroutput"><span class="identifier">RegEx</span></code></a>
that is an encapsulation of the lower-level template code: it provides a simplified
interface for those who don't need the full power of the library, and supports
only narrow characters and the "extended" regular expression syntax.
This class is now deprecated as it does not form part of the regular expressions
C++ standard library proposal.
</p>
<p>
The POSIX API functions <a class="link" href="ref/posix.html#boost_regex.ref.posix.regcomp"><code class="computeroutput"><span class="identifier">regcomp</span></code></a>, <a class="link" href="ref/posix.html#boost_regex.ref.posix.regexec"><code class="computeroutput"><span class="identifier">regexec</span></code></a>, <a class="link" href="ref/posix.html#boost_regex.ref.posix.regfree"><code class="computeroutput"><span class="identifier">regfree</span></code></a> and <code class="computeroutput"><span class="identifier">regerror</span></code> are available
in both narrow character and Unicode versions, and are provided for those who
need compatibility with these APIs.
</p>
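<p>
For comparison, the card format check looks roughly like this when written in
the POSIX style. This sketch includes the operating system's own
<code class="computeroutput">&lt;regex.h&gt;</code> rather than the Boost header, so it assumes a POSIX
platform; the function name <code class="computeroutput">posix_validate_card_format</code> is hypothetical.
</p>

```cpp
#include <regex.h>   // the system's POSIX header, used here instead of Boost's
#include <string>

// Returns true if s is four groups of four digits separated by a space
// or a hyphen.
bool posix_validate_card_format(const std::string& s)
{
    regex_t re;
    // POSIX extended syntax has no \d, so [[:digit:]] is used instead; the
    // ^ and $ anchors are needed because regexec searches within the text
    // rather than requiring a whole-text match.
    if (regcomp(&re, "^([[:digit:]]{4}[- ]){3}[[:digit:]]{4}$",
                REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    const int result = regexec(&re, s.c_str(), 0, nullptr, 0);
    regfree(&re);   // the compiled expression must be released explicitly
    return result == 0;
}
```

<p>
The explicit compile, execute and free steps are what the class-based interface
described earlier wraps up in a single object.
</p>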
<p>
Finally, note that the library now has <a class="link" href="background_information/locale.html" title="Localization">run-time
localization support</a>, and recognizes the full POSIX regular expression
syntax - including advanced features like multi-character collating elements
and equivalence classes - as well as providing compatibility with other regular
expression libraries including GNU and BSD4 regex packages, PCRE and Perl 5.
</p>
</div>
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
<td align="left"></td>
<td align="right"><div class="copyright-footer">Copyright &#169; 1998-2013 John Maddock<p>
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
</p>
</div></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="install.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="unicode.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>