
//
//  Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
//
//  Distributed under the Boost Software License, Version 1.0. (See
//  accompanying file LICENSE_1_0.txt or copy at
//  http://www.boost.org/LICENSE_1_0.txt)
//

// vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 filetype=cpp.doxygen
/*!
\page conversions Text Conversions

Boost.Locale provides a set of functions that perform basic string conversions:
upper, lower and \ref term_title_case "title case" conversions, \ref term_case_folding "case folding"
and Unicode \ref term_normalization "normalization". These are \ref boost::locale::to_upper "to_upper", \ref boost::locale::to_lower "to_lower", \ref boost::locale::to_title "to_title", \ref boost::locale::fold_case "fold_case" and \ref boost::locale::normalize "normalize".

All of these functions take a \c std::locale object as a parameter, or use the global locale by default.

The global locale is used in all examples below.

\section conversions_case Case Handling

For example:
\code
    std::string grussen = "grüßEN";
    std::cout   <<"Upper "<< boost::locale::to_upper(grussen) << std::endl
                <<"Lower "<< boost::locale::to_lower(grussen) << std::endl
                <<"Title "<< boost::locale::to_title(grussen) << std::endl
                <<"Fold  "<< boost::locale::fold_case(grussen) << std::endl;
\endcode

Would print:

\verbatim
Upper GRÜSSEN
Lower grüßen
Title Grüßen
Fold  grüssen
\endverbatim

You may notice that the Boost.StringAlgo library already provides \c to_upper and \c to_lower functions.
The difference is that the Boost.Locale functions operate on the entire string, instead of performing incorrect character-by-character conversions.

For example:

\code
    std::wstring grussen = L"grüßen";
    std::wcout << boost::algorithm::to_upper_copy(grussen) << " " << boost::locale::to_upper(grussen) << std::endl;
\endcode

Would print:

\verbatim
GRÜßEN GRÜSSEN
\endverbatim

Here the letter "ß" was not converted to "SS" in the first case because of a limitation of the \c std::ctype facet, which maps each character to exactly one character.
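
The limitation can be seen without any Boost code. The sketch below uses only the standard library; the helper name \c upper_per_char is an assumption. Because \c std::ctype::toupper takes one character and returns one character, the result can never be longer than the input.

```cpp
#include <locale>
#include <string>

// Upper-case a wide string one character at a time through std::ctype.
// Each call maps one wchar_t to one wchar_t, so the length cannot change
// and "ß" can never expand to "SS".
std::wstring upper_per_char(const std::wstring& in,
                            const std::locale& loc = std::locale::classic())
{
    const std::ctype<wchar_t>& facet = std::use_facet<std::ctype<wchar_t> >(loc);
    std::wstring out = in;
    for (wchar_t& c : out) {
        c = facet.toupper(c);
    }
    return out;
}
```

Whatever locale is passed in, the returned string has exactly as many characters as the input.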

This is even more problematic with UTF-8 encoding, where non-US-ASCII characters are not converted at all.
For example, this code:

\code
    std::string grussen = "grüßen";
    std::cout << boost::algorithm::to_upper_copy(grussen) << " " << boost::locale::to_upper(grussen) << std::endl;
\endcode

would modify only the ASCII characters:

\verbatim
GRüßEN GRÜSSEN
\endverbatim
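
The same effect can be reproduced with the standard library alone. In the sketch below (the helper name \c upper_per_byte is an assumption), each byte of a UTF-8 string is converted separately in the classic "C" locale, so the bytes of multi-byte sequences pass through unchanged.

```cpp
#include <locale>
#include <string>

// Upper-case a narrow string one byte at a time in the classic "C" locale.
// Bytes that belong to multi-byte UTF-8 sequences are not ASCII letters,
// so they are left untouched.
std::string upper_per_byte(const std::string& in)
{
    const std::locale& c_locale = std::locale::classic();
    std::string out = in;
    for (char& c : out) {
        c = std::toupper(c, c_locale);
    }
    return out;
}
```

For the UTF-8 bytes of "grüßen", only "g", "r", "e" and "n" change, which matches the first word of the output above.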

\section conversions_normalization Unicode Normalization

Unicode normalization is the process of converting strings to a standard form, suitable for text processing and
comparison. For example, the character "ü" can be represented by a single code point or by the character "u" followed by the
combining diaeresis "¨". Normalization is an important part of Unicode text processing.

Unicode defines four normalization forms. Each form is selected by a flag passed
to the \ref boost::locale::normalize() "normalize" function:

- NFD - Canonical decomposition - boost::locale::norm_nfd
- NFC - Canonical decomposition followed by canonical composition - boost::locale::norm_nfc or boost::locale::norm_default
- NFKD - Compatibility decomposition - boost::locale::norm_nfkd
- NFKC - Compatibility decomposition followed by canonical composition - boost::locale::norm_nfkc

For more details on normalization forms, read <a href="http://unicode.org/reports/tr15/#Norm_Forms">this article</a>.

\section conversions_notes Notes

-   \ref boost::locale::normalize() "normalize" operates only on Unicode-encoded strings, i.e. UTF-8, UTF-16 or UTF-32, depending on the
    character width. Be careful when using non-UTF encodings, as they may be handled incorrectly.
-   \ref boost::locale::fold_case() "fold_case" is generally a locale-independent operation, but it takes a locale as a parameter to
    determine the 8-bit encoding.
-   All of these functions accept an STL string, a NUL-terminated string, or a range defined by two pointers. They always
    return a newly created STL string.
-   The length of the string may change: in the example above, upper-casing "grüßen" yields "GRÜSSEN", which is one character longer.
*/
100