[/
  Copyright 2006-2007 John Maddock.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]


[section:unicode Unicode and Boost.Regex]

There are two ways to use Boost.Regex with Unicode strings:

[h4 Rely on wchar_t]

If your platform's `wchar_t` type can hold Unicode strings, and your
platform's C/C++ runtime correctly handles wide character constants
(when passed to `std::iswspace`, `std::iswlower`, etc.), then you can use
`boost::wregex` to process Unicode. However, there are several
disadvantages to this approach:

* It's not portable: there's no guarantee on the width of `wchar_t`, or
even whether the runtime treats wide characters as Unicode at all.
Most Windows compilers do so, but many Unix systems do not.
* There's no support for Unicode-specific character classes: `[[:Nd:]]`, `[[:Po:]]`, etc.
* You can only search strings that are encoded as sequences of wide
characters; it is not possible to search UTF-8, or even UTF-16 on many platforms.
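
The portability problem in the first bullet can be made concrete with a small sketch that uses only the standard library. The helper names `wchar_units_for` and `to_wide` are hypothetical and not part of Boost.Regex. The code assumes that a 16-bit `wchar_t` holds UTF-16, which is typical of Windows but not guaranteed anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Boost.Regex API): number of wchar_t units
// needed to store one Unicode code point on the current platform.
constexpr std::size_t wchar_units_for(char32_t cp) {
    return (sizeof(wchar_t) >= 4 || cp < 0x10000) ? 1 : 2;
}

// Hypothetical helper: store one code point in a wide string.
// ASSUMPTION: a wchar_t narrower than 32 bits holds UTF-16 code units.
std::wstring to_wide(char32_t cp) {
    assert(cp <= 0x10FFFF);  // largest valid Unicode code point
    std::wstring out;
    if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
        // The code point fits in a single wchar_t.
        out.push_back(static_cast<wchar_t>(cp));
    } else {
        // Split into a UTF-16 surrogate pair.
        cp -= 0x10000;
        out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}
```

With a 32-bit `wchar_t`, `to_wide(0x1F600)` holds one element; with a 16-bit `wchar_t` it holds a surrogate pair. Because `boost::wregex` works on individual `wchar_t` values, each half of such a pair would be seen as a separate character.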

[h4 Use a Unicode Aware Regular Expression Type.]

If you have the
[@http://www.ibm.com/software/globalization/icu/ ICU library], then
Boost.Regex can be
[link boost_regex.install.building_with_unicode_and_icu_su
configured to make use
of it]. This provides a distinct regular expression type (`boost::u32regex`)
that supports both Unicode-specific character properties and the searching
of text encoded in UTF-8, UTF-16, or UTF-32. See
[link boost_regex.ref.non_std_strings.icu
ICU string class support].
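
To illustrate what searching UTF-8 text character by character involves, the sketch below decodes a UTF-8 byte string into UTF-32 code points using only the standard library. It is not how Boost.Regex or ICU implement this. `decode_utf8` is a hypothetical helper, and its validation is deliberately minimal: overlong forms and encoded surrogates are not rejected.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not a Boost.Regex or ICU API): decode UTF-8 bytes
// into UTF-32 code points. Only lead and continuation bytes are checked.
std::vector<char32_t> decode_utf8(const std::string& bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        char32_t cp = 0;
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > bytes.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append six payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For example, the three bytes `"A\xD9\xA3"` decode to two code points, U+0041 and U+0663 (ARABIC-INDIC DIGIT THREE). The latter is the kind of character that a class such as `[[:Nd:]]` is meant to match.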

[endsect]