
[/ 
  Copyright 2006-2007 John Maddock.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]


[section:unicode Unicode and Boost.Regex]

There are two ways to use Boost.Regex with Unicode strings:

[h4 Rely on wchar_t]

If your platform's `wchar_t` type can hold Unicode characters, and your 
platform's C/C++ runtime correctly classifies wide characters 
(when they are passed to `std::iswspace`, `std::iswlower`, etc.), then you can use 
`boost::wregex` to process Unicode.  However, this approach has several 
disadvantages:

* It's not portable: there's no guarantee on the width of `wchar_t`, or 
even that the runtime treats wide characters as Unicode at all. 
Most Windows compilers do so, but many Unix systems do not.
* There's no support for Unicode-specific character classes such as `[[:Nd:]]` and `[[:Po:]]`.
* You can only search strings that are encoded as sequences of wide 
characters; it is not possible to search UTF-8, or even UTF-16 on many platforms.

[h4 Use a Unicode-Aware Regular Expression Type]

If you have the 
[@http://www.ibm.com/software/globalization/icu/ ICU library], then 
Boost.Regex can be 
[link boost_regex.install.building_with_unicode_and_icu_su 
configured to make use 
of it].  It then provides a distinct regular expression type, `boost::u32regex`, 
that supports both Unicode-specific character properties and the searching 
of text that is encoded in UTF-8, UTF-16, or UTF-32.  See 
[link boost_regex.ref.non_std_strings.icu 
ICU string class support].

[endsect]