[/
  Copyright 2006-2007 John Maddock.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]

[section:unicode Unicode and Boost.Regex]

There are two ways to use Boost.Regex with Unicode strings:

[h4 Rely on wchar_t]

If your platform's `wchar_t` type can hold Unicode characters, and your
platform's C/C++ runtime correctly handles wide characters (when they are
passed to `std::iswspace`, `std::iswlower`, etc.), then you can use
`boost::wregex` to process Unicode. However, this approach has several
disadvantages:

* It's not portable: there's no guarantee about the width of `wchar_t`, or
even about whether the runtime treats wide characters as Unicode at all.
Most Windows compilers do so, but many Unix systems do not.
* There's no support for Unicode-specific character classes: `[[:Nd:]]`, `[[:Po:]]`, etc.
* You can only search strings that are encoded as sequences of wide
characters; it is not possible to search UTF-8, or on many platforms even UTF-16.
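
Before relying on `boost::wregex` for Unicode, it may help to check the first of these points on the target platform. The following is a minimal sketch, not part of Boost.Regex, and the helper names are hypothetical. It only tests whether `wchar_t` is wide enough to hold any code point and whether the implementation predefines `__STDC_ISO_10646__`, the standard macro that promises `wchar_t` values are Unicode code points. It cannot detect how the runtime's `std::iswspace`, `std::iswlower`, etc. actually classify those values.

```cpp
#include <limits>  // std::numeric_limits

// Hypothetical helper (not part of Boost.Regex): true if one wchar_t can
// store every Unicode code point, U+0000 to U+10FFFF. This checks width
// only; it says nothing about how std::iswspace, std::iswlower, etc.
// classify those values.
constexpr bool wchar_t_holds_any_code_point()
{
    return static_cast<unsigned long>(std::numeric_limits<wchar_t>::max())
           >= 0x10FFFFul;
}

// Hypothetical helper: true if the implementation predefines
// __STDC_ISO_10646__, which promises that wchar_t values are ISO/IEC 10646
// (Unicode) code points. Some implementations do not define it.
constexpr bool implementation_declares_iso10646()
{
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}

// A conservative combined check: true only if a single wchar_t can hold any
// code point AND the implementation promises Unicode values. It still cannot
// verify the runtime's character classification functions.
constexpr bool wregex_unicode_plausible()
{
    return wchar_t_holds_any_code_point() && implementation_declares_iso10646();
}
```

On a typical Linux toolchain, where `wchar_t` is 32 bits wide, the width check passes; where `wchar_t` is 16 bits, as with Microsoft's compilers, it fails, because code points above U+FFFF then need two UTF-16 code units.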

[h4 Use a Unicode Aware Regular Expression Type.]

If you have the
[@http://www.ibm.com/software/globalization/icu/ ICU library], then
Boost.Regex can be
[link boost_regex.install.building_with_unicode_and_icu_su
configured to make use of it]. It then provides a distinct regular
expression type, `boost::u32regex`, that supports both Unicode-specific
character properties and the searching of text encoded in UTF-8, UTF-16,
or UTF-32. For details, see
[link boost_regex.ref.non_std_strings.icu ICU string class support].
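
The reason a separate type matters for UTF-8 is that a narrow, byte-oriented regular expression sees code units, not characters. The minimal sketch below illustrates this without Boost or ICU: `std::regex` stands in for a narrow `boost::regex` (we assume the two behave the same in this respect), and the helper names are hypothetical. A pattern that matches exactly one character rejects U+00E9 (é), which UTF-8 encodes as the two bytes `0xC3 0xA9`, even though the string holds a single code point.

```cpp
#include <cstddef>  // std::size_t
#include <regex>    // std::regex, std::regex_match
#include <string>

// std::regex stands in for a narrow (char-based) boost::regex so that this
// sketch builds without Boost; both treat each char of a std::string, i.e.
// each byte of UTF-8 text, as one character.

// Hypothetical helper: true if the whole string matches exactly one
// "any character" (.) as the regex engine sees characters.
bool is_single_character(const std::string& text)
{
    static const std::regex any_one_char(".");
    return std::regex_match(text, any_one_char);
}

// Hypothetical helper: counts code points in well-formed UTF-8 by counting
// the bytes that are not continuation bytes (bit pattern 10xxxxxx).
std::size_t utf8_code_points(const std::string& text)
{
    std::size_t count = 0;
    for (unsigned char byte : text)
        if ((byte & 0xC0) != 0x80)
            ++count;
    return count;
}
```

Here `is_single_character("\xC3\xA9")` returns false while `utf8_code_points("\xC3\xA9")` returns 1. The ICU-based `boost::u32regex` searches UTF-8 text directly, so its patterns are applied to decoded code points rather than to individual bytes.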

[endsect]