]>
Commit | Line | Data |
---|---|---|
7eb75bcc DM |
1 | To generate or modify mapping headers\r |
2 | -------------------------------------\r | |
3 | Mapping headers are imported from CJKCodecs in pre-generated form.\r | |
4 | If you need to tweak or add something to them, please look at the\r | |
5 | tools/ subdirectory of the CJKCodecs distribution.\r | |
6 | \r | |
7 | \r | |
8 | \r | |
9 | Notes on implementation characteristics of each codec\r | |
10 | -----------------------------------------------------\r | |
11 | \r | |
12 | 1) Big5 codec\r | |
13 | \r | |
14 | The big5 codec maps the following characters as cp950 does, rather\r | |
15 | than following Unicode.org's mapping, which maps them to U+FFFD.\r | |
16 | \r | |
17 | BIG5 Unicode Description\r | |
18 | \r | |
19 | 0xA15A 0x2574 SPACING UNDERSCORE\r | |
20 | 0xA1C3 0xFFE3 SPACING HEAVY OVERSCORE\r | |
21 | 0xA1C5 0x02CD SPACING HEAVY UNDERSCORE\r | |
22 | 0xA1FE 0xFF0F LT DIAG UP RIGHT TO LOW LEFT\r | |
23 | 0xA240 0xFF3C LT DIAG UP LEFT TO LOW RIGHT\r | |
24 | 0xA2CC 0x5341 HANGZHOU NUMERAL TEN\r | |
25 | 0xA2CE 0x5345 HANGZHOU NUMERAL THIRTY\r | |
26 | \r | |
27 | Because Unicode 0x5341, 0x5345, 0xFF0F and 0xFF3C are already mapped\r | |
28 | to other big5 codes, round-trip compatibility is not guaranteed for\r | |
29 | them.\r | |
30 | \r | |
31 | \r | |
32 | 2) cp932 codec\r | |
33 | \r | |
34 | To conform to Windows's real mapping, the cp932 codec maps the\r | |
35 | following code points in addition to the official cp932 mapping.\r | |
36 | \r | |
37 | CP932 Unicode Description\r | |
38 | \r | |
39 | 0x80 0x80 UNDEFINED\r | |
40 | 0xA0 0xF8F0 UNDEFINED\r | |
41 | 0xFD 0xF8F1 UNDEFINED\r | |
42 | 0xFE 0xF8F2 UNDEFINED\r | |
43 | 0xFF 0xF8F3 UNDEFINED\r | |
44 | \r | |
45 | \r | |
46 | 3) euc-jisx0213 codec\r | |
47 | \r | |
48 | The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 to\r | |
49 | Unicode U+FF3C instead of U+005C as in unicode.org's mapping.\r | |
50 | Because euc-jisx0213 already has REVERSE SOLIDUS at 0x5c and A1C0\r | |
51 | is shown as a full-width character, mapping to U+FF3C makes\r | |
52 | more sense.\r | |
53 | \r | |
54 | The euc-jisx0213 codec can also decode JIS X 0212 codes on\r | |
55 | codeset 2. Because JIS X 0212 and JIS X 0213 Plane 2 do not\r | |
56 | overlap each other, this does not break standard conformance\r | |
57 | (and JIS X 0213 Plane 2 is intended to be used this way). When\r | |
58 | encoding, the codec tries to encode kanji characters in this\r | |
59 | order:\r | |
60 | \r | |
61 | JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212\r | |
62 | \r | |
63 | \r | |
64 | 4) euc-jp codec\r | |
65 | \r | |
66 | The euc-jp codec departs from the standard mapping for compatibility on these points:\r | |
67 | - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP A1C0 (and vice versa)\r | |
68 | - U+00A5 YEN SIGN is mapped to EUC-JP 0x5c. (one way)\r | |
69 | - U+203E OVERLINE is mapped to EUC-JP 0x7e. (one way)\r | |
70 | \r | |
71 | \r | |
72 | 5) shift-jis codec\r | |
73 | \r | |
74 | The shift-jis codec maps the 0x20-0x7e area directly to U+20-U+7E\r | |
75 | instead of using JIS X 0201, for compatibility. The differences are:\r | |
76 | - U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5c.\r | |
77 | - U+007E TILDE is mapped to SHIFT-JIS 0x7e.\r | |
78 | - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 815f.\r | |
79 | \r |
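
The cp932, euc-jp and shift-jis notes above can be spot-checked from Python. The following is a minimal sketch, not part of CJKCodecs itself. It assumes that the interpreter's built-in codecs named "cp932", "euc_jp" and "shift_jis" are built from these sources in their default (non-strict) configuration. The helper `roundtrip` and the variable names are hypothetical, introduced only for illustration.

```python
# Sketch: check a few of the mappings described in the notes above.
# Assumption: the built-in "cp932", "euc_jp" and "shift_jis" codecs
# behave as this README describes (default, non-strict build).


def roundtrip(text, codec):
    """Encode text with codec, decode the bytes back, return (bytes, str)."""
    raw = text.encode(codec)
    return raw, raw.decode(codec)


# cp932: single bytes outside the official mapping, with the code points
# listed in the cp932 table above.
cp932_expected = {
    b"\x80": "\x80",
    b"\xa0": "\uf8f0",
    b"\xfd": "\uf8f1",
    b"\xfe": "\uf8f2",
    b"\xff": "\uf8f3",
}
cp932_decoded = {raw: raw.decode("cp932") for raw in cp932_expected}

# euc-jp: FULLWIDTH REVERSE SOLIDUS goes to A1C0 and back again ...
eucjp_fullwidth = roundtrip("\uff3c", "euc_jp")
# ... while YEN SIGN and OVERLINE are one-way and come back as ASCII.
eucjp_yen = roundtrip("\u00a5", "euc_jp")
eucjp_overline = roundtrip("\u203e", "euc_jp")

# shift-jis: 0x5c and 0x7e are plain ASCII; the full-width form uses 815F.
sjis_backslash = roundtrip("\\", "shift_jis")
sjis_tilde = roundtrip("~", "shift_jis")
sjis_fullwidth = roundtrip("\uff3c", "shift_jis")

if __name__ == "__main__":
    for raw, text in cp932_decoded.items():
        print("cp932", raw, "->", "U+%04X" % ord(text))
    for label, (raw, text) in [
        ("euc-jp U+FF3C", eucjp_fullwidth),
        ("euc-jp U+00A5", eucjp_yen),
        ("euc-jp U+203E", eucjp_overline),
        ("shift-jis U+005C", sjis_backslash),
        ("shift-jis U+007E", sjis_tilde),
        ("shift-jis U+FF3C", sjis_fullwidth),
    ]:
        print(label, "->", raw, "->", "U+%04X" % ord(text))
```

Comparing the decoded character with the original input is a simple way to see which mappings are one-way: for the euc-jp YEN SIGN and OVERLINE cases the round trip returns a different character than the one that was encoded.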