<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<title>Unicode and Boost.Regex</title>
<link rel="stylesheet" href="../../../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.77.1">
<link rel="home" href="../index.html" title="Boost.Regex 5.1.2">
<link rel="up" href="../index.html" title="Boost.Regex 5.1.2">
<link rel="prev" href="introduction_and_overview.html" title="Introduction and Overview">
<link rel="next" href="captures.html" title="Understanding Marked Sub-Expressions and Captures">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../boost.png"></td>
<td align="center"><a href="../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="introduction_and_overview.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="captures.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="boost_regex.unicode"></a><a class="link" href="unicode.html" title="Unicode and Boost.Regex">Unicode and Boost.Regex</a>
</h2></div></div></div>
<p>
There are two ways to use Boost.Regex with Unicode strings:
</p>
<h5>
<a name="boost_regex.unicode.h0"></a>
<span class="phrase"><a name="boost_regex.unicode.rely_on_wchar_t"></a></span><a class="link" href="unicode.html#boost_regex.unicode.rely_on_wchar_t">Rely
on wchar_t</a>
</h5>
<p>
If your platform's <code class="computeroutput"><span class="keyword">wchar_t</span></code> type
can represent Unicode characters, and your platform's C/C++ runtime correctly handles
wide characters when they are passed to functions such as <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">iswspace</span></code>
and <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">iswlower</span></code>, then you can use <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">wregex</span></code>
to process Unicode. However, this approach has several disadvantages:
</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
It's not portable: there's no guarantee about the width of <code class="computeroutput"><span class="keyword">wchar_t</span></code>,
or even about whether the runtime treats wide characters as Unicode at all. Most
Windows compilers do so, but many Unix systems do not.
</li>
<li class="listitem">
There's no support for Unicode-specific character classes: <code class="computeroutput"><span class="special">[[:</span><span class="identifier">Nd</span><span class="special">:]]</span></code>, <code class="computeroutput"><span class="special">[[:</span><span class="identifier">Po</span><span class="special">:]]</span></code>,
etc.
</li>
<li class="listitem">
You can only search strings that are encoded as sequences of wide characters;
it is not possible to search UTF-8, or even UTF-16 on many platforms.
</li>
</ul></div>
<h5>
<a name="boost_regex.unicode.h1"></a>
<span class="phrase"><a name="boost_regex.unicode.use_a_unicode_aware_regular_expr"></a></span><a class="link" href="unicode.html#boost_regex.unicode.use_a_unicode_aware_regular_expr">Use
a Unicode-Aware Regular Expression Type</a>
</h5>
<p>
If you have the <a href="http://www.ibm.com/software/globalization/icu/" target="_top">ICU
library</a>, then Boost.Regex can be <a class="link" href="install.html#boost_regex.install.building_with_unicode_and_icu_su">configured
to use it</a>, and then provides a distinct regular expression type, <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">u32regex</span></code>,
that supports both Unicode-specific character properties and searching
text encoded as UTF-8, UTF-16, or UTF-32. See <a class="link" href="ref/non_std_strings/icu.html" title="Working With Unicode and ICU String Types">ICU
string class support</a>.
</p>
</div>
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
<td align="left"></td>
<td align="right"><div class="copyright-footer">Copyright &#169; 1998-2013 John Maddock<p>
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
</p>
</div></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="introduction_and_overview.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="captures.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>