<meta http-equiv=
"Content-Type" content=
"text/html; charset=US-ASCII">
<title>Unicode and Boost.Regex</title>
<link rel=
"stylesheet" href=
"../../../../../doc/src/boostbook.css" type=
"text/css">
<meta name=
"generator" content=
"DocBook XSL Stylesheets V1.77.1">
<link rel=
"home" href=
"../index.html" title=
"Boost.Regex 5.1.2">
<link rel=
"up" href=
"../index.html" title=
"Boost.Regex 5.1.2">
<link rel=
"prev" href=
"introduction_and_overview.html" title=
"Introduction and Overview">
<link rel=
"next" href=
"captures.html" title=
"Understanding Marked Sub-Expressions and Captures">
<body bgcolor=
"white" text=
"black" link=
"#0000FF" vlink=
"#840084" alink=
"#0000FF">
<table cellpadding=
"2" width=
"100%"><tr>
<td valign=
"top"><img alt=
"Boost C++ Libraries" width=
"277" height=
"86" src=
"../../../../../boost.png"></td>
<td align=
"center"><a href=
"../../../../../index.html">Home
</a></td>
<td align=
"center"><a href=
"../../../../../libs/libraries.htm">Libraries
</a></td>
<td align=
"center"><a href=
"http://www.boost.org/users/people.html">People
</a></td>
<td align=
"center"><a href=
"http://www.boost.org/users/faq.html">FAQ
</a></td>
<td align=
"center"><a href=
"../../../../../more/index.htm">More
</a></td>
<div class=
"spirit-nav">
<a accesskey=
"p" href=
"introduction_and_overview.html"><img src=
"../../../../../doc/src/images/prev.png" alt=
"Prev"></a><a accesskey=
"u" href=
"../index.html"><img src=
"../../../../../doc/src/images/up.png" alt=
"Up"></a><a accesskey=
"h" href=
"../index.html"><img src=
"../../../../../doc/src/images/home.png" alt=
"Home"></a><a accesskey=
"n" href=
"captures.html"><img src=
"../../../../../doc/src/images/next.png" alt=
"Next"></a>
<div class=
"titlepage"><div><div><h2 class=
"title" style=
"clear: both">
<a name=
"boost_regex.unicode"></a><a class=
"link" href=
"unicode.html" title=
"Unicode and Boost.Regex">Unicode and Boost.Regex
</a>
</h2></div></div></div>
There are two ways to use Boost.Regex with Unicode strings:
<a name=
"boost_regex.unicode.h0"></a>
<span class=
"phrase"><a name=
"boost_regex.unicode.rely_on_wchar_t"></a></span><a class=
"link" href=
"unicode.html#boost_regex.unicode.rely_on_wchar_t">Rely
on wchar_t
</a>
If your platform's
<code class=
"computeroutput"><span class=
"keyword">wchar_t
</span></code> type
can hold Unicode strings, and your platform's C/C++ runtime correctly handles
wide character constants (when passed to
<code class=
"computeroutput"><span class=
"identifier">std
</span><span class=
"special">::
</span><span class=
"identifier">iswspace
</span></code>,
<code class=
"computeroutput"><span class=
"identifier">std
</span><span class=
"special">::
</span><span class=
"identifier">iswlower
</span></code>, etc.), then you can use
<code class=
"computeroutput"><span class=
"identifier">boost
</span><span class=
"special">::
</span><span class=
"identifier">wregex
</span></code>
to process Unicode. However, there are several disadvantages to this approach:
<div class=
"itemizedlist"><ul class=
"itemizedlist" style=
"list-style-type: disc; ">
It's not portable: there's no guarantee on the width of
<code class=
"computeroutput"><span class=
"keyword">wchar_t
</span></code>,
or even on whether the runtime treats wide characters as Unicode at all. Most
Windows compilers do so, but many Unix systems do not.
There's no support for Unicode-specific character classes such as
<code class=
"computeroutput"><span class=
"special">[[:
</span><span class=
"identifier">Nd
</span><span class=
"special">:]]
</span></code> or
<code class=
"computeroutput"><span class=
"special">[[:
</span><span class=
"identifier">Po
</span><span class=
"special">:]]
</span></code>.
You can only search strings that are encoded as sequences of wide characters;
it is not possible to search UTF-8, or even UTF-16, on many platforms.
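The portability concerns listed above can be probed on a given platform before relying on wide-character regular expressions. The following is a minimal sketch that uses only the C++ standard library; the helper names (wchar_bits, wchar_can_hold, runtime_treats_as_space) are hypothetical and are not part of Boost.Regex.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cwctype>
#include <limits>

// Hypothetical helpers for illustration; none of these are Boost.Regex APIs.

// Width of wchar_t in bits on this platform.
inline int wchar_bits()
{
    return static_cast<int>(sizeof(wchar_t) * CHAR_BIT);
}

// True if a single wchar_t can represent the given Unicode code point.
inline bool wchar_can_hold(char32_t code_point)
{
    // Code points above U+10FFFF are not valid Unicode.
    assert(code_point <= 0x10FFFF);
    const auto max_value =
        static_cast<std::uint32_t>(std::numeric_limits<wchar_t>::max());
    return static_cast<std::uint32_t>(code_point) <= max_value;
}

// Asks the C runtime (in the current locale) whether it classifies c as whitespace.
inline bool runtime_treats_as_space(wchar_t c)
{
    return std::iswspace(static_cast<std::wint_t>(c)) != 0;
}
```

On a platform with a 16-bit wchar_t, wchar_can_hold(0x1F600) returns false, so characters outside the Basic Multilingual Plane cannot be matched as single wide characters there. The result of runtime_treats_as_space for non-ASCII characters depends on the runtime and the current locale, which is exactly the behaviour that varies between platforms.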
<a name=
"boost_regex.unicode.h1"></a>
<span class=
"phrase"><a name=
"boost_regex.unicode.use_a_unicode_aware_regular_expr"></a></span><a class=
"link" href=
"unicode.html#boost_regex.unicode.use_a_unicode_aware_regular_expr">Use
a Unicode Aware Regular Expression Type.
</a>
If you have the
<a href=
"http://www.ibm.com/software/globalization/icu/" target=
"_top">ICU
library
</a>, then Boost.Regex can be
<a class=
"link" href=
"install.html#boost_regex.install.building_with_unicode_and_icu_su">configured
to make use of it
</a> and provide a distinct regular expression type (boost::u32regex)
that supports both Unicode-specific character properties and the searching
of text that is encoded in UTF-8, UTF-16, or UTF-32. See:
<a class=
"link" href=
"ref/non_std_strings/icu.html" title=
"Working With Unicode and ICU String Types">ICU
string class support
</a>.
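As an illustration of the second approach, the sketch below matches UTF-8 text against a Unicode property class. It assumes Boost.Regex has been configured with ICU support and that the program is linked against the ICU libraries; the function name is_all_decimal_digits is hypothetical, while boost::make_u32regex and boost::u32regex_match are declared in boost/regex/icu.hpp.

```cpp
#include <string>
#include <boost/regex/icu.hpp>  // assumes Boost.Regex configured with ICU support

// Hypothetical helper: returns true if the whole UTF-8 encoded string consists
// of Unicode decimal digits (general category Nd), in any script.
inline bool is_all_decimal_digits(const std::string& utf8_text)
{
    // make_u32regex builds a boost::u32regex from a narrow (UTF-8) pattern string.
    static const boost::u32regex digits = boost::make_u32regex("[[:Nd:]]+");

    // For a std::string argument, u32regex_match interprets the bytes as UTF-8.
    return boost::u32regex_match(utf8_text, digits);
}
```

A call such as is_all_decimal_digits("2024") exercises the ASCII case. Passing UTF-8 encoded digits from another script, for example Devanagari, exercises the property-based matching that boost::wregex does not offer.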
<table xmlns:rev=
"http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width=
"100%"><tr>
<td align=
"left"></td>
<td align=
"right"><div class=
"copyright-footer">Copyright © 1998-2013 John Maddock
<p>
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at
<a href=
"http://www.boost.org/LICENSE_1_0.txt" target=
"_top">http://www.boost.org/LICENSE_1_0.txt</a>)